Previous Chapter: Getting Oriented
Structured Knowledge & Simple Queries
In the last chapter, we learned how we can think of the Atomspace as a database of documents. Each Atom is like a document that can contain a set of key-value pairs. But document-store database solutions are already plentiful and the value of the Atomspace comes from other capabilities.
So now we’ll begin to explore the Atomspace as a knowledge representation format, or Knowledge Base, often abbreviated as KB.
KBs hold a collection of concepts and define them in relationship to other concepts. KBs are a relatively old idea, not unique to the Atomspace, and there is a Wikipedia page covering the basics: https://en.wikipedia.org/wiki/Knowledge_base
This guide and nearly every other piece of documentation on KBs I’ve encountered describe the primitives of knowledge representation using concrete examples from the real world, such as “A dog is an animal.” This is helpful for learning the particulars of the KB because the fundamental conceptual relationships are already familiar to most people. However, I believe a large part of the value of KBs and ontologies comes from the ability to reason within an abstract system of precise relationships, where that precision has been imposed elsewhere, at another level, or where ambiguity never existed to begin with. For example, a KB could be used to reason about the behavior of a computer program, given a hypothetical set of inputs.
In general, strictly regular KBs have many limitations when representing data from the real world. For example, the InheritanceLink documentation points out the difference between extensional and intensional inheritance. I’d argue this is just the tip of the iceberg when trying to formalize knowledge about the fuzzy real world into a crisp ontology. That being said, the Atomspace is better than most other KB formats, because of the tools it provides for expressing nuance, partial truth and uncertainty. All the same, in my opinion, a symbolic KB alone is not sufficient for many aspects of real-world knowledge representation. Regardless, I’ll attempt to refrain from injecting too many of my own opinions and conclusions into this guide dedicated to understanding the Atomspace.
Ultimately, any formal language is only as precise as the axioms and definitions it is built upon, and you will have to define your own meanings and maybe even your own grammar. Personally I view the Atomspace less as a language and instead as the building-blocks out of which a language can be created. Some people have used the Atomspace to represent tokens and grammar from a natural language such as English, while others have used it to represent interactions between proteins and genes. In its purest form, the Atomspace is a system where data and rules about how the data can interact can be described side-by-side, and then queried and simulated.
PredicateNode & Links to make statements
We touched on PredicateNode atoms in the last chapter, when we used them as keys for values we associated with other atoms.
In grammar, a statement is divided into subject(s) and a predicate. In the statement “The dog barks”, “barks” is the predicate.
In the statement “The dog is happy”, “is happy” is the predicate.
It’s just like that Schoolhouse Rock video <https://www.youtube.com/watch?v=CLV3eMvW73g> sings it. The video is unlikely to further elucidate anything at all, but if you’re a child of the 80s like me, it’s a fun diversion down memory lane.
The important thing about predicates is that they allow assertions to be made. Concepts by themselves don’t say anything; they merely exist (or don’t exist). Predicates allow for statements.
In the Atomspace, a PredicateNode provides a label for these predicate concepts.
A more formal description of a PredicateNode in the Atomspace is here: https://wiki.opencog.org/w/PredicateNode
Formally, a PredicateNode is a node that can be evaluated to create a TruthValue. In the next chapter we’ll cover exactly how predicates are evaluated, but for now it’s ok if this idea is a little vague.
As seen in the previous chapter, the Scheme snippet below is one possible way to tell the Atomspace that “Fido the Dog’s weight is 12.5kg”.
(cog-set-value! (Concept "Fido the Dog")
(Predicate "weight_in_kg") (FloatValue 12.5))
Now, I can retrieve the weight_in_kg value using the cog-value command, as demonstrated previously, but say I want to search for all dogs in the Atomspace whose weight is under 15kg.
Values associated with atoms are not indexed, so a query like that would be inefficient.
If I have the atom, I can get a value attached to it in constant time. Searching for atoms whose associated value meets some criterion, however, is generally a bad idea, because the time required for the search grows with the number of atoms being examined.
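To make that cost concrete, here is a rough sketch of what such a brute-force search could look like in plain Scheme. The helper names first-float and scan-for-dogs-lighter-than are made up for this illustration, and the sketch assumes that every dog is a ConceptNode carrying a FloatValue under the weight_in_kg key. It also assumes cog-value returns either #f or an empty list when the key is missing, which may differ between AtomSpace versions.

```scheme
(use-modules (opencog))

; Sample data, stored the same way as in the snippet above.
(cog-set-value! (Concept "Fido the Dog")
  (Predicate "weight_in_kg") (FloatValue 12.5))

; Hypothetical helper: the first number stored under KEY on ATOM,
; or #f when ATOM has nothing stored under that key.
(define (first-float atom key)
  (let ((v (cog-value atom key)))
    (if (or (not v) (null? v))
        #f
        (car (cog-value->list v)))))

; Hypothetical helper: visit every ConceptNode and keep the ones
; whose stored weight is below LIMIT kilograms.
(define (scan-for-dogs-lighter-than limit)
  (filter
    (lambda (atom)
      (let ((w (first-float atom (Predicate "weight_in_kg"))))
        (and w (< w limit))))
    (cog-get-atoms 'ConceptNode)))

(scan-for-dogs-lighter-than 15)
```

Every ConceptNode gets visited, whether it is a dog or not, which is exactly the scaling problem described above.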
Here is another way to express the statement “Fido the Dog’s weight in kg is 12.5”.
(StateLink
(ListLink
(Concept "Fido the Dog")
(Predicate "weight_in_kg")
)
(NumberNode 12.5)
)
Let’s break down exactly what we just did. We created two new link atoms and a new NumberNode node atom.
Here is a graphical representation of the atom relationships we just expressed.
In the inner part of the expression, we created a ListLink atom that references (Concept "Fido the Dog") and (Predicate "weight_in_kg").
This ListLink is just a simple association between the other two node atoms.
The formal description of ListLink tells us we should think of this as an argument list, and from a programming language perspective this makes sense.
Personally, however, I prefer to think about it from a natural language perspective.
By definition, this particular ListLink atom is the ListLink atom that references (Concept "Fido the Dog") and (Predicate "weight_in_kg"), in that order.
Therefore, in this context, you can think of it as the atom that means “Fido the Dog’s weight in kg”.
Basically, a single atom, i.e. our new link atom, is able to represent a compound concept created by combining two other atoms.
The documentation for ListLink is here: https://wiki.opencog.org/w/ListLink, if you want to understand it more precisely.
Moving on, the outer part of the expression creates a StateLink. The StateLink atom that we just made references our newly-created ListLink and a newly-created NumberNode that has the “label” of “12.5”.
A StateLink is like a ListLink insofar as it also references other atoms and provides a way to reference this newly combined concept as an atom itself.
The main feature of a StateLink is that there can be only one StateLink for each referent in position 0 (zero) of the StateLink’s outgoing set.
So, referring back to our example, “Fido the Dog’s weight in kg” can only have one StateLink that points to it as the link’s first referenced atom.
In plain English, “Fido the Dog’s weight in kg” can only be one thing at a time. His weight can’t simultaneously be 12.5kg and 15kg. Setting it to 15kg will replace the StateLink atom that’s already there, rather than adding a second StateLink atom alongside it.
The documentation for StateLink is here: https://wiki.opencog.org/w/StateLink.
In addition, more documentation and examples along these lines can be found in these OpenCog examples: https://github.com/opencog/atomspace/blob/master/examples/atomspace/state.scm & https://github.com/opencog/atomspace/blob/master/examples/atomspace/property.scm
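The one-at-a-time behavior is easy to check for yourself. The sketch below is only an illustration: the names fidos-weight and weight-states are my own, and it assumes cog-incoming-by-type is available in your build. If StateLink behaves as described, asserting a second weight should leave exactly one StateLink pointing at the ListLink.

```scheme
(use-modules (opencog))

; The compound concept "Fido the Dog's weight in kg".
(define fidos-weight
  (ListLink
    (Concept "Fido the Dog")
    (Predicate "weight_in_kg")))

; Assert one weight, then a different one.
(StateLink fidos-weight (NumberNode 12.5))
(StateLink fidos-weight (NumberNode 15))

; Collect every StateLink that references the ListLink.
(define weight-states
  (cog-incoming-by-type fidos-weight 'StateLink))

(length weight-states)
```

Swapping StateLink for ListLink in the two assertions would be a useful contrast, since an ordinary link has no such uniqueness rule and both links would remain.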
Executing Atoms
Atoms in the Atomspace can represent both data as well as the transformations and operations that can be done to the data. The code and the data exist side-by-side. From the perspective of the Atomspace, they’re all just atoms.
We just saw how we can use a link atom to create a compound concept, i.e. “Fido the Dog’s weight_in_kg”. Take a look at another compound concept formed with a link:
(PlusLink
(NumberNode 2)
(NumberNode 3)
)
In English, those 3 atoms would be interpreted as the sentence fragment “The sum of 2 and 3”. If you paste the above Scheme snippet into the Guile interpreter, it just puts those 3 atoms into the Atomspace. Boring!
But PlusLink has a special property: it is an Active or “executable” atom type.
So far, the atoms we’ve seen, like the ListLink we used above, have been declarative, but Active atoms can be executed.
Executing an atom means some computational operation is performed. The behavior varies from one atom type to another, and the effects can range from synthesizing a new Value to creating new atoms in the Atomspace or even deleting existing atoms.
Some Link types may be Active as well as declarative, and which operation occurs depends on the context in which the link is accessed.
We execute an atom with the cog-execute! OpenCog function call.
(cog-execute!
(PlusLink
(NumberNode 2)
(NumberNode 3)
)
)
If you just ran the Scheme snippet above, you probably noticed that it returned (NumberNode 5). And if you were being very thorough, you may also have noticed that the (NumberNode 5) atom was created and added to the Atomspace.
When the output of cog-execute! is an atom, it will be added to the Atomspace.
Remember, Atoms can’t exist outside the Atomspace, so even atoms that are created for a temporary operation are added to the Atomspace and remain there until something explicitly removes them. Sometimes this is desirable. Sometimes this is annoying. For now, it’s just something to be aware of.
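For the annoying case, one possible approach is to copy what you need out of the result and then remove the leftover atom yourself. This is a sketch rather than a recommendation: the names sum and sum-as-number are my own, and it assumes cog-extract! is available in your build and that nothing else references the result atom.

```scheme
(use-modules (opencog) (opencog exec))

; Execute the PlusLink and keep a handle to the resulting atom.
(define sum
  (cog-execute!
    (PlusLink
      (NumberNode 2)
      (NumberNode 3))))

; Copy the number out into ordinary Scheme first.
(define sum-as-number (cog-number sum))

; Remove the result atom from the Atomspace. After this call the
; handle in `sum` should no longer be used.
(cog-extract! sum)
```

Note that the PlusLink and its two NumberNode arguments are still in the Atomspace afterwards. Only the result atom was removed.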
A Basic Query with MeetLink & VariableNode
Back to Fido the Dog. Now that we’ve told the Atomspace that “Fido the Dog’s weight in kg is 12.5”, how can we retrieve that information? How do we ask “What is Fido the Dog’s weight in kg?”
Like this:
(cog-execute!
(MeetLink
(StateLink
(ListLink
(Concept "Fido the Dog")
(Predicate "weight_in_kg")
)
(VariableNode "$v1")
)
)
)
We’ll go through what we just did, step by step. But first, I want to rewrite the above statement so our code can be a little less verbose and we can focus on what really matters.
(define fidos_weight_link (List
(Concept "Fido the Dog")
(Predicate "weight_in_kg")))
Since Fido’s weight is something we’re referencing often, we can use Scheme’s define feature to create a single token to refer to it.
Now our query looks like this:
(cog-execute!
(Meet
(State
fidos_weight_link
(Variable "$v1")
)
)
)
Just like we abbreviated ConceptNode and PredicateNode earlier, we can abbreviate ListLink as just List and StateLink as State.
Now that I’ve introduced them, I’ll also start abbreviating MeetLink as Meet, VariableNode as Variable, etc. You get the idea, so I won’t explicitly explain abbreviations from here onward.
Anyway, let’s get to the meat of what we just did. (No pun! I swear it.) MeetLink is one of the Active, aka executable, link types.
Executing a MeetLink performs a query in the Atomspace, and returns the atoms found by the query.
Let’s look at the atom that our MeetLink is referencing. This atom is our query:
(State
fidos_weight_link
(Variable "$v1")
)
This can be thought of as a “Match Expression”, because executing the MeetLink will search the Atomspace for all atoms that match this atom we provided.
The VariableNode can then be thought of as the wildcard. The wildcard can match any other atom.
If you are familiar with Regular Expressions, this is the same principle.
So, you might interpret this query expression as saying “Find all the StateLink atoms that connect fidos_weight_link to something. What are all the somethings that you found?”
When we execute our query, it should return:
(QueueValue (NumberNode "12.5"))
You probably spotted our (NumberNode "12.5") atom. It’s here because it was matched by the VariableNode in the query, but what’s with the QueueValue?
A QueueValue is a list of atoms or other values.
cog-execute! returns a QueueValue instead of a “naked” node atom because, in the general case, a query may match more than one atom and there is no way to know in advance how many results will be found.
QueryLink to Utilize Query Results
QueryLink is another way to execute a query. It is just like the MeetLink atom that we used in the previous examples, except that QueryLink allows us to specify what we want to do with the query results.
Last chapter, we used Scheme to add 50 to Fido’s weight. Now let’s do it with Atoms alone.
(cog-execute!
(QueryLink
(StateLink
fidos_weight_link
(VariableNode "fidos_weight_number_node")
)
(PlusLink
(VariableNode "fidos_weight_number_node")
(Number 50)
)
)
)
QueryLink takes two arguments: the first is the query atom, in exactly the same format as MeetLink, and the second atom is the operation to perform on each query result.
So, in our example, the first atom supplied to the QueryLink matches the MeetLink example above, and the second atom is a variant of the PlusLink example.
Note
The query atom in the MeetLink example named the VariableNode as (Variable "$v1") while the QueryLink example uses (VariableNode "fidos_weight_number_node").
These are just different labels for the VariableNode atom. There is a convention in some documentation to prepend variable names with the ‘$’ sigil, but I find the sigil unnecessary, and I prefer a descriptive name to the cryptic “v1”.
You can think of QueryLink as performing two operations in sequence. First, it performs a query to search for matching atoms, and then it performs a subsequent atom execution to format each result.
You’ve probably noticed the VariableNode appears in both the query atom and the result output format atom.
Personally, I think of this as the variable acquiring its meaning in the query. The usual terms for this are binding or grounding. If this were another programming language I’d say the variable gets a value, but in the Atomspace the word Value is already taken.
And thus the VariableNode refers to a concrete atom when it is used in the output format atom.
Note
Much of the documentation and examples are written to feature GetLink instead of MeetLink, and BindLink instead of QueryLink.
The only semantic difference between these is that MeetLink and QueryLink return results as a QueueValue, which is transient, while GetLink and BindLink return a SetLink, which will become part of the Atomspace until it is deleted.
To avoid cluttering up the Atomspace and the performance costs associated with that, the link types that return a QueueValue are the better choice.
The OpenCog examples covering BindLink and GetLink apply equally well to QueryLink and MeetLink.
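To see the difference side by side, the sketch below runs the same pattern through both link types. The names weight-pattern, as-queue, as-set and set-members are my own, and the cleanup step assumes cog-extract! is available in your build.

```scheme
(use-modules (opencog) (opencog exec))

; Make sure there is something to find.
(StateLink
  (ListLink (Concept "Fido the Dog") (Predicate "weight_in_kg"))
  (NumberNode 12.5))

; One pattern, shared by both queries.
(define weight-pattern
  (State
    (List (Concept "Fido the Dog") (Predicate "weight_in_kg"))
    (Variable "$v1")))

; MeetLink hands back a transient QueueValue.
(define as-queue (cog-execute! (Meet weight-pattern)))

; GetLink hands back a SetLink atom that now lives in the Atomspace.
(define as-set (cog-execute! (Get weight-pattern)))
(define set-members (cog-outgoing-set as-set))

; The SetLink stays around until we remove it ourselves.
(cog-extract! as-set)
```

The matched atoms are the same either way. The only thing that changes is the container, and whether you have to clean it up afterwards.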
Lastly, let’s get our query result back into Scheme. Let’s use the Scheme snippet below to multiply the new value by 10.
(define fidos_weight_plus_50_query
(QueryLink
(StateLink
fidos_weight_link
(VariableNode "Fidos_weight_number_node")
)
(PlusLink
(VariableNode "Fidos_weight_number_node")
(Number 50)
)
)
)
(*
(cog-number
(car
(cog-value->list
(cog-execute! fidos_weight_plus_50_query)
)
)
)
10
)
Because cog-execute! returns a QueueValue to us, we must get the first element of the QueueValue, which will be a NumberNode. We can then extract the numerical value from that NumberNode.
We use the cog-value->list OpenCog function to convert the QueueValue into a Scheme list, and then use Scheme’s car to extract the first element of that list.
Finally, we can use the cog-number OpenCog function to convert the NumberNode into a Scheme number, before performing the arithmetic in Scheme.
Note
QUESTION for someone smarter than me. Why does (cog-value-ref) give me “index out of range” errors on QueueValues?? Conceptually, it seems like this should be something that should work. If not, what are the preferred semantics (most efficient) for dequeueing an element from a QueueValue?
That’s probably enough on this simple query. If you want a more complete explanation, the documentation for VariableNode is here: https://wiki.opencog.org/w/VariableNode, the documentation for MeetLink is here: https://wiki.opencog.org/w/MeetLink and the documentation for QueryLink is here: https://wiki.opencog.org/w/QueryLink
More Elaborate Queries with other Link Types
This is a good place to introduce the concepts of Grounded vs Ungrounded expressions. These terms come from formal logic, which you can read about on Wikipedia here: https://en.wikipedia.org/wiki/Ground_expression
The formal definition is that ungrounded expressions contain one or more free VariableNode atoms, while grounded expressions don’t contain any.
Personally, the way I think about it is that grounded expressions are statements and ungrounded expressions are questions.
Just as in English, questions and statements can take a similar grammatical form. Consider this example. Statement: “The man is running.” Question: “Who is running?” Answer: “The man”.
The question-word “Who” in this example is like a VariableNode.
When the question is matched against the statement, the relative grammatical position of the word “Who” indicates which part of the statement will appropriately answer the question.
So, another intuition for MeetLink and QueryLink is that they take an ungrounded expression and produce a grounded expression.
Or, said another way, they take a question and return an answer.
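Here is the running-man example written out as atoms, as a small sketch. The way the sentence is encoded, a ListLink pairing a subject with a predicate, is just a choice I made for this illustration and not an established convention. The names statement, question and answers are likewise my own.

```scheme
(use-modules (opencog) (opencog exec))

; Grounded expression, i.e. the statement "The man is running."
(define statement
  (List
    (Concept "the man")
    (Predicate "is running")))

; Ungrounded expression, i.e. the question "Who is running?"
; The VariableNode sits where the answer belongs.
(define question
  (List
    (Variable "who")
    (Predicate "is running")))

; Matching the question against the Atomspace yields the answers.
(define answers
  (cog-value->list (cog-execute! (Meet question))))
```

The statement and the question have the same shape. The only difference is that one slot holds a VariableNode.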
So let’s flip our previous question inside out. Consider this query:
(cog-execute!
(Meet
(State
(Variable "$v1")
(Number 12.5)
)
)
)
Our previous question was: “What is Fido the Dog’s weight in kg?” Now our question is Jeopardy style: “Blank has a value of 12.5.”
Executing that snippet should return our ListLink that represents Fido’s weight.
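If what you really want is the dog rather than the ListLink, you can unpack each result in Scheme. The sketch below assumes every match is a two-element ListLink with the dog in position 0, which holds for the data in this chapter but is not something the query itself guarantees. The names matches and owners are my own.

```scheme
(use-modules (opencog) (opencog exec))

; Make sure there is something to find.
(StateLink
  (ListLink (Concept "Fido the Dog") (Predicate "weight_in_kg"))
  (NumberNode 12.5))

; Everything whose state is 12.5.
(define matches
  (cog-value->list
    (cog-execute!
      (Meet
        (State
          (Variable "$v1")
          (Number 12.5))))))

; Take the first atom out of each matched ListLink.
(define owners
  (map (lambda (link) (cog-outgoing-atom link 0)) matches))
```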
Note
Often we’ll want to compose compound questions. Sometimes a compound question has one unknown and more than one criterion. For example, the English question “What cities in Germany are on the river Danube?” is a compound question because it has two parts, “in Germany” and “on the river Danube”.
However, it is also possible to use multiple VariableNode atoms within the query, and that’s the situation we’re about to cover.
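For completeness, the single-unknown case could be sketched like this. The encoding is an assumption made for illustration: each criterion is modeled as membership in a named set using MemberLink, and the two criteria are combined with AndLink, both of which are covered later in this chapter. The set names and the name german-danube-cities are made up.

```scheme
(use-modules (opencog) (opencog exec))

; A few facts, encoded as set membership.
(Member (Concept "Ulm") (Concept "cities in Germany"))
(Member (Concept "Ulm") (Concept "cities on the Danube"))
(Member (Concept "Vienna") (Concept "cities on the Danube"))
(Member (Concept "Berlin") (Concept "cities in Germany"))

; One unknown, two criteria: the same VariableNode must satisfy both.
(define german-danube-cities
  (cog-value->list
    (cog-execute!
      (Meet
        (And
          (Member (Variable "city") (Concept "cities in Germany"))
          (Member (Variable "city") (Concept "cities on the Danube")))))))
```

Vienna satisfies only the river criterion and Berlin only the country criterion, so neither should appear in the result.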
Now, I want to ask the Atomspace to find the dogs that have a weight over 10kg. My query looks like this:
(cog-execute!
(QueryLink
(And
(State
(List
(Variable "dog_node")
(Predicate "weight_in_kg")
)
(Variable "dogs_weight_node")
)
(GreaterThan
(Variable "dogs_weight_node")
(Number 10)
)
)
(Variable "dog_node")
)
)
We found Fido!
Now, let’s go over the Links we just used, and I’ll explain the query along the way.
QueryLink to Format Query Results
Last time we encountered QueryLink we used it as a way to execute additional operations on our query result.
Here we are using it to specify which portion of the query results we are interested in.
To understand this better, try this nearly identical version of the query using MeetLink instead of QueryLink.
(cog-execute!
(Meet
(And
(State
(List
(Variable "dog_node")
(Predicate "weight_in_kg")
)
(Variable "dogs_weight_node")
)
(GreaterThan
(Variable "dogs_weight_node")
(Number 10)
)
)
)
)
As you can see, it also returns (ConceptNode "Fido the Dog"). But unlike the QueryLink version, the result is a bit more cluttered.
The MeetLink version returns:
(QueueValue (ListLink
(ConceptNode "Fido the Dog")
(NumberNode "12.5")))
While the QueryLink version returns just:
(QueueValue (ConceptNode "Fido the Dog"))
That is because we explicitly told the QueryLink atom that we were interested in (Variable "dog_node") as our result. On the other hand, the MeetLink atom created a ListLink referencing all of the VariableNode atoms in our query.
AndLink for Multiple Query Criteria
AndLink is a link atom type for performing the logical “And” operation. You probably guessed that from its name.
So, for a query to match, both sides of the AndLink must be satisfied.
Back to our example:
(And
(State
(List
(Variable "dog_node")
(Predicate "weight_in_kg")
)
(Variable "dogs_weight_node")
)
(GreaterThan
(Variable "dogs_weight_node")
(Number 10)
)
)
This query’s use of And is essentially saying “Find an atom connected to the weight_in_kg atom with a ListLink that itself is connected to another atom by a StateLink AND the numerical value of that other atom is greater than 10.”
Let’s try experimenting a bit with this query. For example, we’ll give Fido a friend by executing the Scheme snippet here:
(StateLink
(ListLink
(Concept "Fluffy the Dog")
(Predicate "weight_in_kg")
)
(NumberNode 9)
)
Now, if we change the query to compare against (Number 8) instead of (Number 10), we’ll find the query returns both Fido and Fluffy.
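One way to run that experiment without retyping the query is to wrap it in a Scheme function that takes the threshold as an argument. The function name dogs-heavier-than is my own. Everything inside it is the same query as before, with the threshold swapped in.

```scheme
(use-modules (opencog) (opencog exec))

; Both dogs, as asserted earlier in the chapter.
(State (List (Concept "Fido the Dog") (Predicate "weight_in_kg")) (Number 12.5))
(State (List (Concept "Fluffy the Dog") (Predicate "weight_in_kg")) (Number 9))

; Hypothetical helper: the dogs whose weight exceeds KG, as a Scheme list.
(define (dogs-heavier-than kg)
  (cog-value->list
    (cog-execute!
      (QueryLink
        (And
          (State
            (List
              (Variable "dog_node")
              (Predicate "weight_in_kg"))
            (Variable "dogs_weight_node"))
          (GreaterThan
            (Variable "dogs_weight_node")
            (Number kg)))
        (Variable "dog_node")))))

(dogs-heavier-than 10)
(dogs-heavier-than 8)
```

Keep in mind that each call builds a new QueryLink atom, and a new NumberNode for any threshold not seen before, and those atoms stay in the Atomspace afterwards.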
Moving on, notice that the (Variable "dogs_weight_node") atom appears on both sides of the And expression. This is important.
Echoing what I said above, and speaking as somebody with a strong background in procedural programming, the way I think about this is that the Variable node is “defined”, or temporarily given a concrete meaning, by the first side of the And expression, and then that concrete atom is used when evaluating the second side.
However, if your intuition comes from databases, you may want to think of the operation as an “INNER JOIN” from SQL. These mental models are functionally equivalent.
If you’re curious, the Atomspace has an OrLink along with some other logical link types. However, if your intention is to perform an “OUTER JOIN”, you probably want to use ChoiceLink instead of OrLink.
“And” expressions narrow the Satisfying Set while “Or” expressions expand it. Therefore you may need to be careful using Variable nodes on both sides of an “Or” expression and expecting them to be consistent. The behavior may not be what you intend.
There is certainly more that could be said on this topic, but it feels like a rat hole at this point in the guide.
GreaterThanLink to Filter by Numeric Value
As the name suggests, GreaterThanLink compares two NumberNode atoms using the “>” operator.
In the section above covering AndLink, we already explained how the (Variable "dogs_weight_node") atom gets its meaning from the other side of the AndLink expression.
So this comparison evaluates to true if the numeric value of the atom matched by the Variable node is greater than (Number 10). All pretty self-explanatory so far.
I’ll take this opportunity to introduce other link types along the same lines:
EqualLink: Determines whether two atoms are actually the same atom, or whether they become the same atom when they are evaluated.
NotLink: The logical “Not” operator. Evaluates to true if the atom it references evaluates to false and vice-versa. Things get a little more complicated when considering non-binary TruthValues, but that’s a topic we’ll cover later.
PlusLink: The arithmetic operator for addition. It references two NumberNode atoms, and creates a third with the value of the sum of the other two.
MinusLink: The arithmetic operator for subtraction. It references two NumberNode atoms, and creates a third with the value of the first minus the second.
TimesLink: The arithmetic operator for multiplication. It references two NumberNode atoms, and creates a third with the value of the product of the other two.
DivideLink: The arithmetic operator for division. I’m sure you’ve spotted the pattern by now.
You may have noticed that “LessThanLink” is absent. The less-than operator itself is just syntactic sugar, because swapping the argument order to GreaterThanLink implements a logically identical “LessThanLink”. Personally, I’ve often wondered why more programming languages don’t economize on the less-than operator this way. Presumably the cost of keeping it is tiny compared with the improved code readability.
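As a sketch of that argument swap, here is the earlier “dogs under some weight” question phrased with GreaterThanLink. The function name dogs-lighter-than is my own. The only change from the heavier-than query is that the threshold now comes first.

```scheme
(use-modules (opencog) (opencog exec))

; Both dogs, as asserted earlier in the chapter.
(State (List (Concept "Fido the Dog") (Predicate "weight_in_kg")) (Number 12.5))
(State (List (Concept "Fluffy the Dog") (Predicate "weight_in_kg")) (Number 9))

; Hypothetical helper: "weight < KG" written as "KG > weight".
(define (dogs-lighter-than kg)
  (cog-value->list
    (cog-execute!
      (QueryLink
        (And
          (State
            (List
              (Variable "dog_node")
              (Predicate "weight_in_kg"))
            (Variable "dogs_weight_node"))
          (GreaterThan
            (Number kg)
            (Variable "dogs_weight_node")))
        (Variable "dog_node")))))

(dogs-lighter-than 10)
```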
Note
Be careful! Naked Values, e.g. FloatValue, can’t be part of expressions in the Atomspace. For example, using EqualLink to compare (FloatValue 2.0) with (NumberNode 1.0) will evaluate to true! This happens because the FloatValue vanishes when constructing the expression, and thus the NumberNode is compared to itself!
Note
QUESTION for someone smarter than me. How does one check for numerical equality? In other words, is there a link or other operator that can successfully compare a NumberNode with a numerical value? Also, I saw the note about the absence of (IntValue) etc., but equality for IEEE floats is problematic for many applications, because values that are not exactly representable with the mantissa bits get approximated, leading to all kinds of unintended behavior.
ValueOfLink and SetValueLink
Coming full circle, let’s revisit values associated with atoms.
If you recall, last chapter, we used cog-set-value! to associate a value with an atom.
There is an Atomese equivalent: SetValueLink.
SetValueLink behaves exactly like cog-set-value!, except that it can be incorporated into other Atomese expressions.
This will become important when we begin discussing Atomese programs in the next chapter.
We might use cog-set-value! to associate an age with Fido, invoking the OpenCog command below:
(cog-set-value! (Concept "Fido the Dog")
(Predicate "age") (FloatValue 3))
The Atomese equivalent to that would be:
(cog-execute!
(SetValueLink (Concept "Fido the Dog")
(Predicate "age") (NumberNode 3)
)
)
These are almost the same. However, there is one important thing to remember: “Naked” Values can’t exist in the Atomspace.
Notice that (FloatValue 3) is not exactly the same thing as (NumberNode 3).
The latter is a reference to the NumberNode atom standing in for the numeric value, while the former is the actual floating point number itself.
There is no way for the FloatValue to be part of the Atomese expression we want to execute, because it can’t exist in the Atomspace at all.
Note
QUESTION FOR SOMEONE SMARTER THAN ME. Is this actually right??? This doesn’t feel right that there is no mechanism to make this work.
Conversely, we can access values associated with atoms in Atomese using ValueOfLink.
ValueOfLink is the Atomese equivalent to the cog-value command we introduced last chapter.
So the Scheme snippet below:
(cog-execute!
(ValueOfLink
(Concept "Fido the Dog") (Predicate "age")
)
)
is equivalent to directly accessing the value like this:
(cog-value (Concept "Fido the Dog") (Predicate "age"))
Thinking About Performance Querying Values
Last chapter, I told you that values can’t be used as search criteria. I should have said that values can’t be searched efficiently.
It turns out you actually can query the Atomspace for atoms with values that meet your query criteria. You just need to be careful. Consider the query below to find dogs based on age:
(cog-execute!
(Meet
(GreaterThan
(ValueOf (Variable "dog_node") (Predicate "age"))
(NumberNode 2)
)
)
)
It returns Fido, just like you probably expected. But not so fast! Literally.
Executing this query involves iterating over every single atom in the Atomspace, checking to see if it has the (Predicate "age") key, and if it does, then performing the comparison. It may have appeared to be quick enough, but that’s because you probably don’t have many atoms in your atomspace.
Consider what would happen if your atomspace contained millions of atoms!
You can still use ValueOf links in queries, but be careful that they are only applied to sets of a tractable size, and not all of the atoms in the Atomspace.
One strategy for accelerating this query is to create a link that tracks whether a given node contains a key. Here is an example:
(cog-set-value! (Concept "Fido the Dog")
(Predicate "age") (FloatValue 3))
(Member
(Concept "Fido the Dog")
(Predicate "age")
)
The MemberLink atom, in this case, is acting as a sentinel that says “Fido the Dog has an age key-value pair.”
Mathematically, it is saying “Fido the Dog is a member of the age set”, where the “age” set is understood (by our convention) to contain all atoms that have an age value.
Now that we have a link we can query, we can compose a query using an AndLink, like this:
(cog-execute!
(Meet
(And
(Member
(Variable "dog_node")
(Predicate "age")
)
(GreaterThan
(ValueOf (Variable "dog_node") (Predicate "age"))
(NumberNode 2)
)
)
)
)
This query will also find Fido and all other dogs older than 2, just like our first version. As you can see, the second branch of the query is identical to the one above. However, this query will have considerably better performance characteristics as the number of atoms in the Atomspace grows.
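The sentinel only helps if it is never forgotten, so one option is to set the value and create the MemberLink in a single step. The helper below is a sketch of that idea. The name set-tracked-value! is made up, and the sentinel convention is the one from this section rather than anything the Atomspace enforces.

```scheme
(use-modules (opencog))

; Hypothetical helper: attach VALUE to ATOM under KEY, and record
; the sentinel MemberLink so that queries can find ATOM by KEY.
(define (set-tracked-value! atom key value)
  (cog-set-value! atom key value)
  (Member atom key)
  atom)

(set-tracked-value!
  (Concept "Fido the Dog")
  (Predicate "age")
  (FloatValue 3))
```

Nothing removes the sentinel if the value is later deleted, so a matching cleanup helper would be needed to keep the two in sync.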
Next Chapter: Programming with Atomese