Python API for AllegroGraph

The Python API for the AllegroGraph system is a client-side API that implements convenient and efficient access to an AllegroGraph server from a Python-based application. This API consciously imitates the structure and functionality of the (Java-based) Sesame Repository API. It offers methods for querying and updating RDF data, and for managing the stored triples. Here we provide examples that illustrate how to program against the Python API. Developers who are already familiar with Sesame will find it especially easy to use the Python API.

Creating a Repository object

A 'repository' (also called a triple store, or an RDF store) is a managed set of RDF statements. Once an application has created a (client-side) Repository object, the remainder of the calls in the Python client interact with the repository through that object. A 'catalog' is a container for a set of repositories. When the AllegroGraph server is first started, it is pointed to one or more directories representing the locations of catalogs on the host machine. Starting up a Python application involves first creating a server object, then choosing a catalog known to the server, and then choosing a repository within the catalog.

The code below illustrates how this is done. The arguments to AllegroGraphServer specify the endpoint (host name and port) of an already-launched AllegroGraph server. The 'listCatalogs' call returns the available catalogs, which we print. The 'openCatalog' call chooses one of them. A Repository object is created with a pointer to a catalog, a repository name, and an indication of what kind of opening behavior is wanted ('renew', 'open', 'access', or 'create'). 'renew' clears the contents of a repository before opening; 'open' opens an existing repository, and throws an exception if the repository is not found; 'access' attaches to an existing repository or creates a new one; and 'create' creates a new repository, and throws an exception if one by that name already exists. Once a repository object is created and initialized, it is ready to process commands.


from franz.openrdf.sail.allegrographserver import AllegroGraphServer
from franz.openrdf.repository.repository import Repository

server = AllegroGraphServer("localhost", port=8080)
print "Available catalogs", server.listCatalogs()
catalog = server.openCatalog('ag')          
print "Available repositories in catalog '%s':  %s" % (catalog.getName(), catalog.listRepositories())    
myRepository = Repository(catalog, "agraph_test4", Repository.RENEW)
myRepository.initialize()
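
The four opening modes can be summarized with a small in-memory model. This is only a sketch of the semantics described above, not AllegroGraph code: the function 'open_repository' and the dict that stands in for a catalog are hypothetical, and we assume that 'renew' creates the repository if it does not yet exist.

```python
def open_repository(catalog, name, mode):
    """Return the triple set for `name`, following the four opening modes.

    `catalog` is a plain dict mapping repository names to sets of triples;
    it stands in for a real catalog purely for illustration.
    """
    exists = name in catalog
    if mode == "renew":
        # Start from an empty repository (assumption: created if missing).
        catalog[name] = set()
    elif mode == "open":
        if not exists:
            raise KeyError("repository %r not found" % name)
    elif mode == "access":
        # Attach to an existing repository, or create it if missing.
        catalog.setdefault(name, set())
    elif mode == "create":
        if exists:
            raise ValueError("repository %r already exists" % name)
        catalog[name] = set()
    else:
        raise ValueError("unknown mode %r" % mode)
    return catalog[name]


catalog = {"agraph_test4": {("alice", "name", "Alice")}}
print(len(open_repository(catalog, "agraph_test4", "access")))  # existing data kept
print(len(open_repository(catalog, "agraph_test4", "renew")))   # contents cleared
```

The two prints show the difference that matters most in practice: 'access' leaves existing triples alone, while 'renew' starts from an empty store.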

Asserting and Retracting Triples

Resource and literal objects are created by an instance of 'ValueFactory', which itself is created by calling the method 'Repository.getValueFactory'. Below, we show how to create resources describing two people, named 'Bob' and 'Alice'. Assertions and retractions to the quad store are executed by 'add' and 'remove' methods belonging to a Connection class. A connection is created by calling the method 'Repository.getConnection()'. A number of classes and properties for the RDF, RDFS, XSD, and OWL ontologies are predefined. 'RDF.TYPE' is one such.

The 'add' and 'remove' methods take an optional 'contexts' argument that specifies one or more contexts that are the target of triple assertions and retractions. When the context is omitted, triples are asserted/retracted to/from the null context. In the example below, facts about Alice and Bob reside in the null context.


f = myRepository.getValueFactory()
## create some resources and literals to make statements out of
alice = f.createURI("http://example.org/people/alice")
bob = f.createURI("http://example.org/people/bob")
name = f.createURI("http://example.org/ontology/name")
person = f.createURI("http://example.org/ontology/Person")
bobsName = f.createLiteral("Bob")
alicesName = f.createLiteral("Alice")

conn = myRepository.getConnection()
## alice is a person
conn.add(alice, RDF.TYPE, person)
## alice's name is "Alice"
conn.add(alice, name, alicesName)
## bob is a person
conn.add(bob, RDF.TYPE, person)
## bob's name is "Bob":
conn.add(bob, name, bobsName)
print "Triple count: ", conn.size()
conn.remove(bob, name, bobsName)
print "Triple count: ", conn.size()
conn.add(bob, name, bobsName)

A SPARQL Query

Our next example illustrates how to evaluate a SPARQL query, in this case, one that retrieves all triples in a store. The method 'Connection.prepareTupleQuery' creates a query object that can be evaluated one or more times. Currently, the only query language supported is SPARQL. The results of evaluating a query are returned in an iterator that yields a sequence of BindingSets. Below we illustrate one (rather heavyweight) method for extracting the values from a binding set, indexed by the name of the corresponding column variable in the SELECT clause.

The Connection class is designed to be created for the duration of a sequence of updates and queries, and then closed. In practice, many AllegroGraph applications keep a connection open indefinitely. However, best practice dictates that the connection be closed when it is no longer needed, as illustrated below. The same hygiene applies to the iterators that generate binding sets.


try:
    queryString = "SELECT ?s ?p ?o WHERE {?s ?p ?o .}"
    tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
    result = tupleQuery.evaluate();
    try:
        for bindingSet in result:
            s = bindingSet.getValue("s")
            p = bindingSet.getValue("p")
            o = bindingSet.getValue("o")          
            print "%s %s %s" % (s, p, o)
    finally:
        result.close();
finally:
    conn.close();
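
The nested try/finally pattern can also be expressed with the standard library's 'contextlib.closing', which calls 'close()' when a 'with' block exits, even if an exception is raised. The sketch below uses a hypothetical 'FakeResource' class rather than a real connection or result iterator; it assumes only that the wrapped object has a 'close()' method.

```python
from contextlib import closing


class FakeResource(object):
    """Hypothetical stand-in for any object that exposes close()."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def close(self):
        self.log.append("closed " + self.name)


log = []
# The inner resource is closed first, then the outer one,
# mirroring the nested try/finally blocks.
with closing(FakeResource("connection", log)) as conn:
    with closing(FakeResource("result", log)) as result:
        log.append("iterating over " + result.name)

print(log)
```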

Statement Matching

The method 'Connection.getStatements' provides a streamlined alternative to SPARQL queries when the patterns to be matched are very simple. In a Sesame implementation where the repository and server are running in the same Java thread, a 'getStatements' call is much more efficient than a SPARQL query. However, AllegroGraph is currently accessible from Python only in a client/server configuration, and in that configuration, the processing times are comparable. Hence, 'getStatements' represents sugar-coating.

Below, we illustrate two kinds of 'getStatements' calls. The first mimics traditional Sesame syntax, and returns a Statement object at each iteration. Most of the time, this is a waste, since applications rarely make use of Statement objects. The second syntax borrows a trick from the JDBC API commonly used to access relational databases. A result set iterator does not materialize objects unless forced to. Here, it materializes only values of the object-position of the returned triples. The 'getValue' call forces materialization of a resource or literal, while the 'getString' call returns a string without creating an object. Developers who care about minimizing garbage will prefer to use the 'getJDBCStatements' call, and they will usually call 'getString' in preference to 'getValue'.


conn = myRepository.getConnection()
alice = myRepository.getValueFactory().createURI("http://example.org/people/alice")
statements = conn.getStatements(alice, None, None, False, [])
for s in statements:
    print s
print "Same thing using JDBC:"
resultSet = conn.getJDBCStatements(alice, None, None, False, [])
while resultSet.next():
    print "   ", resultSet.getValue(3), "   ", resultSet.getString(3)

The last argument to 'getStatements' takes a context or a list of contexts. If that list is non-empty, then only triples belonging to the enumerated contexts are retrieved. A value of 'None' denotes matching against the null context. Above, we pass an empty list, which retrieves from all contexts.

The next example illustrates some variations on what we have seen so far. First, observe that 'ValueFactory.createURI' can be called with one or two arguments; when called with two, the namespace and local name are specified separately, and combined by the system. This is recommended, since it eliminates the need to replicate the same namespace over and over. Next, we show examples of various ways to declare typed literals, and language-specific literals.

In the Sesame API, the 'Connection.add' method is overloaded, enabling it to be called to add triples/quads, to add statement objects, and to load a file. We emulate that overloading in our Python implementation, but in fact static overloading does not usually mesh well with optional arguments, which are preferred in Python. Hence, we recommend calling 'addStatement' in preference to 'add' when adding a statement object, and calling 'addFile' in preference to 'add' when loading an RDF file into the quad store. Below, we show examples of both types of calls side-by-side.

The RDF/SPARQL spec is conservative to a fault when defining matching between various combinations of literal values. The match and query statements below illustrate how some of these combinations perform. Note that we illustrate an alternate syntax for pulling values out of a BindingSet object which takes advantage of the fact that our BindingSet can emulate a Python 'dict'.


conn.clear()
exns = "http://example.org/people/"
alice = f.createURI("http://example.org/people/alice")
age = f.createURI(namespace=exns, localname="age")
weight = f.createURI(namespace=exns, localname="weight")
favoriteColor = f.createURI(namespace=exns, localname="favoriteColor")
birthdate = f.createURI(namespace=exns, localname="birthdate")
ted = f.createURI(namespace=exns, localname="Ted")
red = f.createLiteral('Red')
rouge = f.createLiteral('Rouge', language="fr")
fortyTwo = f.createLiteral('42', datatype=XMLSchema.INT)
fortyTwoInteger = f.createLiteral('42', datatype=XMLSchema.LONG)
fortyTwoUntyped = f.createLiteral('42')
date = f.createLiteral('1984-12-06', datatype=XMLSchema.DATE) 
time = f.createLiteral('1984-12-06', datatype=XMLSchema.DATETIME) 
stmt1 = f.createStatement(alice, age, fortyTwo)
stmt2 = f.createStatement(ted, age, fortyTwoUntyped)
conn.add(stmt1)
conn.addStatement(stmt2)
conn.addTriple(alice, weight, f.createLiteral('20.5'))
conn.addTriple(ted, weight, f.createLiteral('20.5', datatype=XMLSchema.FLOAT))
conn.add(alice, favoriteColor, red)
conn.add(ted, favoriteColor, rouge)
conn.add(alice, birthdate, date)
conn.add(ted, birthdate, time)
for obj in [None, fortyTwo, fortyTwoUntyped, f.createLiteral('20.5', datatype=XMLSchema.FLOAT), f.createLiteral('20.5'),
            red, rouge]:
    print "Retrieve triples matching '%s'." % obj
    statements = conn.getStatements(None, None, obj, False, None)
    for s in statements:
        print s
for obj in ['42', '"42"', '20.5', '"20.5"', '"20.5"^^xsd:float', '"Rouge"@fr', '"1984-12-06"^^xsd:date']:
    print "Query triples matching '%s'." % obj
    queryString = """PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT ?s ?p ?o WHERE {?s ?p ?o . filter (?o = %s)}
    """ % obj
    tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
    result = tupleQuery.evaluate()
    for bindingSet in result:
        s = bindingSet[0]
        p = bindingSet[1]
        o = bindingSet[2]
        print "%s %s %s" % (s, p, o)
fortyTwoInt = f.createLiteral(42)
print fortyTwoInt.toPython()
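
One way to read the match results is in terms of RDF term equality: two literals are the same term only if their label, datatype, and language tag all agree. The sketch below models that rule with a namedtuple. It is an illustration under that assumption, not a description of how AllegroGraph stores or compares literals, and it does not model the value-based comparison that a SPARQL FILTER '=' applies to numeric literals.

```python
from collections import namedtuple

# A literal is modeled as a (label, datatype, language) triple.
Literal = namedtuple("Literal", ["label", "datatype", "language"])

XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(label, datatype=None, language=None):
    """Build a literal; omitted parts default to None."""
    return Literal(label, datatype, language)


forty_two_untyped = literal("42")
forty_two_int = literal("42", datatype=XSD + "int")
forty_two_long = literal("42", datatype=XSD + "long")
rouge = literal("Rouge", language="fr")

print(forty_two_untyped == forty_two_int)        # untyped vs. typed
print(forty_two_int == forty_two_long)           # same label, different datatype
print(rouge == literal("Rouge"))                 # language tag missing
print(rouge == literal("Rouge", language="fr"))  # all three parts agree
```

Only the last comparison is true, which is the sense in which term-level matching is conservative.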

Importing and Exporting Triples

The Python API client can load either RDF/XML files or NTriples files into a quad store. The example below calls 'Connection.add' to load from an NTriples file and 'Connection.addFile' to load from an RDF/XML file (both methods work; the latter is recommended). NTriples and RDF/XML files can only store triples. In the case of the 'add' call, we have omitted the context argument, so by default, the triples are loaded into the null context. The 'addFile' call includes an explicit context setting, so the fourth argument of each vcard triple will be the context named "/tutorial/vc_db_1_rdf". The 'Connection.size' method takes an optional context argument. With no argument, it returns the total number of quads. Below, it returns the number '16' for the 'context' context argument, and the number '28' for the null context (None) argument.


conn.clear()   
path1 = "./vc-db-1.rdf"    
path2 = "./football.nt"            
baseURI = "http://example.org/example/local"
location = "/tutorial/vc_db_1_rdf" 
context = myRepository.getValueFactory().createURI(location)
conn.setNamespace("vcd", "http://www.w3.org/2001/vcard-rdf/3.0#");
## read football triples into the null context:
conn.add(path2, base=baseURI, format=RDFFormat.NTRIPLES)
## read vcards triples into the context 'context':
conn.addFile(path1, baseURI, format=RDFFormat.RDFXML, context=context);
myRepository.indexTriples(all=True, asynchronous=False)
print "After loading, repository contains %s vcard triples and %s football triples." % (conn.size(context), conn.size(None))

Whenever a significant number of updates is made to the RDF store, the method 'Repository.indexTriples' should be called. In the above example, it is called after both files have been loaded. The argument "all=True" tells it to (re)index all triples in the store. The default behavior is to index only the triples added since the last call to 'indexTriples'. In that case, indexing is quicker, but the data structures are not quite as well-organized. Setting 'asynchronous=True' tells the server to fork a thread to index the triples. The call then returns immediately, and indexing concludes at some later time.

The SPARQL query below binds the variable '?c' to the context associated with each triple. It prints out each URI that appears in subject position, together with the name of the corresponding context. The null context returns 'None'.


conn = myRepository.getConnection()
queryString = "SELECT DISTINCT ?s ?c WHERE {graph ?c {?s ?p ?o .} }"
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
result = tupleQuery.evaluate();
for bindingSet in result:
    print bindingSet[0], bindingSet[1]

The next examples show how to write triples out to a file. To write triples in NTriples format, create an 'NTriplesWriter'. To write triples in RDF/XML format, create an 'RDFXMLWriter'. If the output file argument is 'None', the writers write to standard output (try uncommenting the assignments of 'None' to see that work). The method 'Connection.export' writes out all triples in one or more contexts. This provides a convenient means for making local backups of sections of your RDF store. If two or more contexts are specified, then triples from all of those contexts will be written to the same file. Since the triples are "mixed together" in the file, the context information is not recoverable. If the 'context' argument is omitted, all triples in the store are written out.


outputFile = "/tmp/temp.nt"
#outputFile = None
if outputFile == None:
    print "Writing to Standard Out instead of to a file"
ntriplesWriter = NTriplesWriter(outputFile)
conn.export(ntriplesWriter, context);
outputFile2 = "/tmp/temp.rdf"
#outputFile2 = None
if outputFile2 == None:
    print "Writing to Standard Out instead of to a file"
rdfxmlfWriter = RDFXMLWriter(outputFile2)    
conn.export(rdfxmlfWriter, context)
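
To see why context information is lost on export, consider the following sketch. It is not the 'Connection.export' implementation: 'export_ntriples' is a hypothetical helper, quads are plain tuples of pre-formatted strings, and the context names 'cxt1' and 'cxt2' are made up for the illustration.

```python
def export_ntriples(quads, contexts):
    """Serialize the triples found in `contexts` as N-Triples lines.

    Each quad is (subject, predicate, object, context); the context is
    used only to select quads and is not written to the output.
    """
    lines = []
    for s, p, o, c in quads:
        if c in contexts:
            lines.append("%s %s %s ." % (s, p, o))
    return sorted(lines)


quads = [
    ("<http://example.org/people/alice>", "<http://example.org/people/name>", '"Alice"', "cxt1"),
    ("<http://example.org/people/bob>", "<http://example.org/people/name>", '"Bob"', "cxt2"),
]
for line in export_ntriples(quads, ["cxt1", "cxt2"]):
    print(line)
```

Each output line carries a subject, predicate, and object only, so nothing in the file says which context a triple came from.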

Finally, if the objective is to write out a filtered set of triples, 'Connection.exportStatements' can be called. The example below writes out all type declaration triples to standard output.


conn.exportStatements(None, RDF.TYPE, None, False, RDFXMLWriter(None))

Contexts

We have already seen contexts at work when loading and saving files. Here we fill in a few more blanks. Up front in the next example, we create six statements, and add two to each of three contexts: 'context1', 'context2', and the null context. A match over all contexts returns all six statements. The next match explicitly lists 'context1' and 'context2' as the only contexts to participate in the match; it returns four statements. Next, we switch to SPARQL queries. Named contexts may be included in the FROM and FROM NAMED clauses in a SPARQL query. Below, we illustrate the procedural equivalent, which is to create a 'dataset' object, add the contexts to that, and then attach the dataset to the query object. The first query is (again) restricted to only those statements in contexts 1 and 2. Currently, it is not possible to combine the null context with other contexts in a SPARQL query. Below, we illustrate how to evaluate a query against only the null context.


conn.clear()
exns = "http://example.org/people/"
alice = f.createURI(namespace=exns, localname="alice")
bob = f.createURI(namespace=exns, localname="bob")
ted = f.createURI(namespace=exns, localname="ted")
person = f.createURI(namespace=exns, localname="Person")
name = f.createURI(namespace=exns, localname="name")    
alicesName = f.createLiteral("Alice")    
bobsName = f.createLiteral("Bob")
tedsName = f.createLiteral("Ted")    
context1 = f.createURI(namespace=exns, localname="cxt1")      
context2 = f.createURI(namespace=exns, localname="cxt2")          
conn.add(alice, RDF.TYPE, person, context1)
conn.add(alice, name, alicesName, context1)
conn.add(bob, RDF.TYPE, person, context2)
conn.add(bob, name, bobsName, context2)
conn.add(ted, RDF.TYPE, person)
conn.add(ted, name, tedsName)
statements = conn.getStatements(None, None, None, False)
print "All triples in all contexts:"
for s in statements:
    print s
statements = conn.getStatements(None, None, None, False, [context1, context2])
print "Triples in contexts 1 and 2:"
for s in statements:
    print s
queryString = """
SELECT ?s ?p ?o ?c
WHERE { GRAPH ?c {?s ?p ?o . } } 
"""
ds = Dataset()
ds.addNamedGraph(context1)
ds.addNamedGraph(context2)
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
tupleQuery.setDataset(ds)
result = tupleQuery.evaluate();    
print "Query over contexts 1 and 2."
for bindingSet in result:
    print bindingSet.getRow()
queryString = """
SELECT ?s ?p ?o    
WHERE {?s ?p ?o . } 
"""
ds = Dataset()
ds.addDefaultGraph(None)
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
tupleQuery.setDataset(ds)
result = tupleQuery.evaluate();    
print "Query over the null context."
for bindingSet in result:
    print bindingSet.getRow()
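
The context-matching rules used by the 'getStatements' calls above can be modeled in a few lines. The 'QuadStore' class below is a hypothetical in-memory sketch, not the AllegroGraph client: it assumes that an omitted or empty context list means "all contexts", that 'None' means the null context, and that 'None' in the subject, predicate, or object position is a wildcard.

```python
ALL_CONTEXTS = object()  # sentinel meaning "no context restriction"


class QuadStore(object):
    def __init__(self):
        self.quads = set()

    def add(self, s, p, o, context=None):
        # context=None places the triple in the null context.
        self.quads.add((s, p, o, context))

    def match(self, s=None, p=None, o=None, contexts=ALL_CONTEXTS):
        """Return quads matching the pattern within the given contexts."""
        if contexts is ALL_CONTEXTS or contexts == []:
            wanted = None                  # match every context
        elif isinstance(contexts, (list, tuple)):
            wanted = set(contexts)
        else:
            wanted = {contexts}            # a single context, possibly None
        result = []
        for quad in self.quads:
            pattern_ok = all(want is None or want == have
                             for want, have in zip((s, p, o), quad[:3]))
            if pattern_ok and (wanted is None or quad[3] in wanted):
                result.append(quad)
        return result


store = QuadStore()
store.add("alice", "type", "Person", "cxt1")
store.add("alice", "name", "Alice", "cxt1")
store.add("bob", "type", "Person", "cxt2")
store.add("bob", "name", "Bob", "cxt2")
store.add("ted", "type", "Person")
store.add("ted", "name", "Ted")

print(len(store.match()))                           # all contexts
print(len(store.match(contexts=["cxt1", "cxt2"])))  # two named contexts
print(len(store.match(contexts=None)))              # null context only
```

The three counts correspond to the six, four, and two statements discussed in the text.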

Namespaces

A namespace is that portion of a URI that precedes the last '#', '/', or ':' character, inclusive. The remainder of a URI is called the localname. For example, with respect to the URI "http://example.org/people/alice", the namespace is "http://example.org/people/" and the localname is "alice". When writing SPARQL queries, it is convenient to define prefixes or nicknames for the namespaces, so that abbreviated URIs can be specified. For example, if we define "ex" to be a nickname for "http://example.org/people/", then the string "ex:alice" is a recognized abbreviation for "http://example.org/people/alice". This abbreviation is called a qname.
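
The splitting rule and qname expansion can be written directly in Python. The helper names 'split_uri' and 'expand_qname' below are our own, chosen for illustration; they are not part of the AllegroGraph API.

```python
def split_uri(uri):
    """Split a URI just after its last '#', '/' or ':' character."""
    # rfind returns -1 when a character is absent, so cut is 0 in that case
    # and the whole string is treated as the localname.
    cut = max(uri.rfind(ch) for ch in "#/:") + 1
    return uri[:cut], uri[cut:]


def expand_qname(qname, prefixes):
    """Expand 'prefix:localname' using a dict of prefix -> namespace."""
    prefix, localname = qname.split(":", 1)
    return prefixes[prefix] + localname


namespace, localname = split_uri("http://example.org/people/alice")
print(namespace)
print(localname)
print(expand_qname("ex:alice", {"ex": "http://example.org/people/"}))
```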

In the SPARQL query in the example below, we see two qnames, "rdf:type" and "ex:Person". Ordinarily, we would expect to see "PREFIX" declarations in SPARQL that define namespaces for the "rdf" and "ex" nicknames. However, the Connection and Query machinery can do that job for you. The mapping of prefixes to namespaces includes the built-in prefixes RDF, RDFS, XSD, and OWL. Hence, we can write "rdf:type" in a SPARQL query, and the system already knows its meaning. In the case of the 'ex' prefix, we need to instruct it. The method 'Connection.setNamespace' registers a new namespace. In the example below, we first register the 'ex' prefix, and then submit the SPARQL query. It is legal, although not recommended, to redefine the built-in prefixes (RDF, etc.).


conn.clear()
exns = "http://example.org/people/"
alice = f.createURI(namespace=exns, localname="alice")
person = f.createURI(namespace=exns, localname="Person")
conn.add(alice, RDF.TYPE, person)
myRepository.indexTriples(all=True, asynchronous=True)
conn.setNamespace('ex', exns)
queryString = """
SELECT ?s ?p ?o 
WHERE { ?s ?p ?o . FILTER ((?p = rdf:type) && (?o = ex:Person) ) }
"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
result = tupleQuery.evaluate();    
for bindingSet in result:
    print bindingSet[0], bindingSet[1], bindingSet[2]

It is worthwhile to briefly discuss performance here. In the current AllegroGraph system, queries run more efficiently if constants appear inside the "where" portion of a query, rather than in the "filter" portion. For example, the SPARQL query below will evaluate more efficiently than the one in the above example. However, in this case, you have lost the ability to output the constants "http://www.w3.org/1999/02/22-rdf-syntax-ns#type" and "http://example.org/people/Person". Occasionally you may find it useful to output constants in the output of a 'select' clause; in general though, the FILTER-based query in the above example illustrates a syntax that is discouraged.


SELECT ?s  
WHERE { ?s rdf:type ex:Person }

Free Text Search

It is common for users to wish to build RDF applications that combine some form of "keyword search" with their queries. For example, a user might want to retrieve all triples for which the word "Alice" appears as a word within the third (object) argument to the triple. AllegroGraph provides a capability for including free text matching within a SPARQL query. It requires, however, that you register the predicates of triples that should participate in the match. In the example below, we have called the method 'Repository.registerFreeTextPredicate' to register the predicate "http://example.org/people/fullname". When we execute our SPARQL query, it matches the "Alice" within the literal "Alice B. Toklas" because that literal occurs in a triple having the 'fullname' predicate, but it does not match the "Alice" in the literal "Alice in Wonderland" because the predicate for that triple is the 'booktitle' predicate, which is not registered. Furthermore, observe that the 'fti:match' predicate takes as its first argument the subject of the matching triple, not the object. Finally, notice that we did not include a prefix declaration for the 'fti' nickname. That is because 'fti' is included among the built-in namespace/nickname mappings.


conn.clear()
exns = "http://example.org/people/"
conn.setNamespace('ex', exns)
myRepository.registerFreeTextPredicate(namespace=exns, localname='fullname')
alice = f.createURI(namespace=exns, localname="alice1")
persontype = f.createURI(namespace=exns, localname="Person")
fullname = f.createURI(namespace=exns, localname="fullname")    
alicename = f.createLiteral('Alice B. Toklas')
book =  f.createURI(namespace=exns, localname="book1")
booktype = f.createURI(namespace=exns, localname="Book")
booktitle = f.createURI(namespace=exns, localname="title")    
wonderland = f.createLiteral('Alice in Wonderland')
conn.add(alice, RDF.TYPE, persontype)
conn.add(alice, fullname, alicename)
conn.add(book, RDF.TYPE, booktype)    
conn.add(book, booktitle, wonderland) 
conn.setNamespace('ex', exns)
#conn.setNamespace('fti', "http://franz.com/ns/allegrograph/2.2/textindex/")    
queryString = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o . ?s fti:match 'Alice' . }
"""
tupleQuery = conn.prepareTupleQuery(QueryLanguage.SPARQL, queryString)
result = tupleQuery.evaluate(); 
print "Query results"
for bindingSet in result:
    print bindingSet
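
The effect of predicate registration can be illustrated without a server. The function 'free_text_match' below is a hypothetical sketch, not the AllegroGraph text index: the word tokenization and the case-insensitive comparison are assumptions made for the example, and the real index may behave differently.

```python
import re


def free_text_match(triples, registered_predicates, word):
    """Return the subjects of triples whose predicate is registered and
    whose object literal contains `word` as a whole word."""
    subjects = set()
    for s, p, o in triples:
        if p not in registered_predicates:
            continue  # unregistered predicates never participate
        tokens = re.findall(r"\w+", o.lower())
        if word.lower() in tokens:
            subjects.add(s)
    return subjects


triples = [
    ("ex:alice1", "ex:fullname", "Alice B. Toklas"),
    ("ex:book1", "ex:title", "Alice in Wonderland"),
]
print(free_text_match(triples, {"ex:fullname"}, "Alice"))
```

Only 'ex:alice1' is returned: the book title also contains "Alice", but its predicate was never registered. As with 'fti:match', the result identifies the subject of the matching triple rather than the literal itself.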