Installation
Windows
There are 32-bit and 64-bit versions of AllegroGraph available for Windows. Select the appropriate one for the version of Windows you are running. The 32-bit version of AllegroGraph will run on the 64-bit version of Windows, but the 64-bit version of AllegroGraph will not run on the 32-bit version of Windows.
After downloading the installer executable, just run it. It will guide you through the installation. After AllegroGraph is installed, you can run it by selecting the "Start AllegroGraph Free Java Edition Server" item on the Start | Programs | AllegroGraph Free Java Edition menu. This will start the server with default ports, and wait for connections from your Java client.
After installation, we recommend you read the "Documentation for AllegroGraph Free Java Edition", also on the Start menu in the same place as the above link.
Linux
There are two Linux versions of AllegroGraph: 32-bit (x86) and 64-bit (x86-64). Select the appropriate one for the version of Linux you are running. The 32-bit version of AllegroGraph may run on 64-bit Linux, but the 64-bit version of AllegroGraph will not run on the 32-bit version of Linux.
On Linux, AllegroGraph is distributed as an RPM. After installation, the files are located in the /usr/lib/agraph/ directory. The server executable is /usr/lib/agraph/AllegroGraphJavaServer.
You will likely need root permissions to install the RPM package. If you do not have root permissions, you can install into an alternate directory by doing the following:
First, make a directory for your private RPM database and initialize it:
% mkdir $HOME/rpmdb
% rpm --dbpath $HOME/rpmdb --initdb
Then, make a directory for the installation of AllegroGraph and install it into this directory:
% mkdir $HOME/agraph
% rpm -ivh --nodeps --prefix $HOME/agraph --dbpath $HOME/rpmdb agraph-2.2-1.x86_64.rpm
After the final command above is finished, you will have a directory $HOME/agraph/agraph/ containing AllegroGraph.
After installation, we recommend you read the Java tutorial below.
Mac OS X
There is one Mac OS X version of AllegroGraph 2.x:
- 32-bit x86
A 64-bit Intel port is underway. If you require PPC support, please contact Franz for the most up-to-date information.
On Mac OS X, AllegroGraph is distributed as a DMG file. After installation, the files are located in the /Applications/AllegroGraph directory. The server is called AllegroGraphJavaServer.
After installation, we recommend you read the Java tutorial below.
Solaris
There are two Solaris versions of AllegroGraph: 64-bit SPARC and 64-bit AMD64. Select the appropriate one for the version of Solaris you are running.
On Solaris, AllegroGraph is distributed as a compressed tar archive. You may extract the files anywhere you wish. The tar archive contains a directory named agraph-version, where version is the version of AllegroGraph you downloaded. The server executable in this directory is called AllegroGraphJavaServer.
After installation, we recommend you read the Java tutorial below.
FreeBSD
On FreeBSD, AllegroGraph is distributed as a compressed tar archive. You may extract the files anywhere you wish. The tar archive contains a directory named agraph-version, where version is the version of AllegroGraph you downloaded. The server executable in this directory is called AllegroGraphJavaServer.
After installation, we recommend you read the Java tutorial below.
Updater
From time to time we release updates or fixes to the Java Edition. When we do, you can find descriptions of them on this page: http://agraph.franz.com/support/patches/log/.
In the AllegroGraph installation directory there is an application called updater which you can run to download the latest patches. When you do, the output of the program will look something like this:
peep% ./updater
;; Connecting to http://www.franz.com/ftp/pub/patches/.
;; Reading CRC cache...done.
;; Checking for new update.fasl.
;; Retrieving list of available patches.
;; Checking which patches need to be downloaded.
downloading: update/agraph/2.2/DESCRIPTIONS
downloading: update/agraph/2.2/pfo013.001 (compressed)
;; Creating CRC cache (please wait)...done.
***** NOTE: You must restart the AllegroGraph server for the newly
downloaded updates to be installed.
peep%
In the above case, the line of interest is the one that references update/agraph/2.2/pfo013.001.
Remember to restart your server after running the updater. Updates are loaded at server start time only.
Introduction
This document introduces AllegroGraph. It assumes that you are somewhat familiar with RDF (Resource Description Framework), RDFS (RDF Schema), and OWL (Web Ontology Language). If you are not very familiar with RDF, RDFS, and OWL, we suggest that you start with A Semantic Web Primer by Grigoris Antoniou and Frank van Harmelen (2001, Cambridge MA, MIT press; available, e.g. from www.amazon.com). It is a very gentle introduction to these new technologies. For a quick introduction, see these Wikipedia entries: OWL, RDF, and RDFS.
The big picture
AllegroGraph is a pure triple store that you can use for storing RDFS/OWL triples but also as an on-disk graph database.
Triples: By convention we call AllegroGraph a triple store, but it actually stores quints. A triple is a structure with five slots: the first three are the usual subject (s), predicate (p), and object (o); in addition, a triple has a named-graph slot (g) and a unique, AllegroGraph-assigned id (i). If you are not familiar with named graphs or their usage, please see http://www.w3.org/2004/03/trix/ for more information. You may also want to look at the paper "Named Graphs, Provenance and Trust" by Carroll et al. at http://www2005.org/cdrom/docs/p613.pdf (PDF), where they were introduced.
Loading: There are several ways to load data into the triple store. Currently we support the N-Triples and RDF/XML formats, and you can also insert triples programmatically.
Dictionary: Resources, blank-nodes and literals are stored in a dictionary and accessed by a hash we call the Unique Part Identifier (UPI).
Indices: AllegroGraph is indexed in such a way that any combination of s, p, o, and g can always be found with one disk access. We provide a cursor on the index to optimize memory usage.
First-class Triples for reification: AllegroGraph triples have unique ids, and we allow triples to point to other triples. This makes reification (making statements about a triple) very efficient, i.e. less space and time is consumed than with the original RDF model of reification (see the RDF Semantics document for all the details).
And more: AllegroGraph also includes an RDFS++ reasoner, freetext indexing, full SPARQL support, and Prolog integration.
Accessing AllegroGraph from Java
The Java API to the AllegroGraph triple store allows Java applications to access and manipulate triple store databases.
This tutorial introduces some of the Java AllegroGraph API objects and methods in simple examples. The full documentation of the Java API is here.
Preparing the Triple Store
The Java API to the AllegroGraph Triple Store is a client-server implementation where the Java application is the client. In the Java-only edition of AllegroGraph, there are two distinct modes of operation possible:
- The Java application starts an AllegroGraph server when it needs one and discards it when done.
- The AllegroGraph server is started as a separate application, and the Java application connects when necessary.
Starting the AllegroGraph server from a Java application
In this mode of operation the Java application calls the startServer() method in the AllegroGraphConnection class. The only preparation needed for this mode of operation is to know where the AllegroGraph server executable was installed.
The Java application can specify the location of the server executable explicitly with a call to setDefaultCommand() or setCommand().
The Java application may also be started with a property setting for the property com.franz.ag.exec.
The most convenient mode is to set a user or system Java Preferences value with the utility in the main() method of the AllegroGraphConnection class. See the section Setting the location of the AllegroGraph Server application for full details. A Preferences setting persists from one session to the next and needs to be set only once in an installation.
Starting the AllegroGraph server as a separate application
The AllegroGraph server application is started from its installation location. The startup parameters specify the port numbers. The Java application must use these same parameters to connect to the server.
The section The AllegroGraph server application describes the AllegroGraph server application in detail.
Testing the interface
We include a sample program, AGExample.java, in the AllegroGraph distribution. This program may be used to verify the installation and to demonstrate that the connection between Java and AllegroGraph is working. Furthermore, the source code provides examples of using Java AllegroGraph. Please take a look at AGExample.java.
Before you run the client Java program, it must be informed about the location of one important file: com.franz.agraph-2-2-5.jar, which resides in the AllegroGraph installation directory.
The full pathname to this file can be included in the Java classpath, or the file may be copied to a more convenient location. When using Eclipse, it may be specified as a library in the project properties.
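As a minimal sketch of the classpath requirement, the following self-contained Java program builds a classpath string with the platform's own separator (";" on Windows, ":" on Unix). The class and method names are hypothetical, and the directory used in main() is only an example taken from the Linux installation notes; substitute your own installation directory.

```java
import java.io.File;
import java.util.Arrays;

public class ClasspathSketch {
    // Jar name as given in this document.
    static final String AG_JAR = "com.franz.agraph-2-2-5.jar";

    // Joins the current directory and the jar's full path using the
    // platform path separator (";" on Windows, ":" on Unix).
    static String buildClasspath(String installDir) {
        String jarPath = new File(installDir, AG_JAR).getPath();
        return String.join(File.pathSeparator, Arrays.asList(".", jarPath));
    }

    public static void main(String[] args) {
        // Example directory only; pass your own as the first argument.
        String dir = args.length > 0 ? args[0] : "/usr/lib/agraph";
        System.out.println("java -cp " + buildClasspath(dir) + " AGExample");
    }
}
```

On Unix shells the resulting classpath may need quoting, as in the commands shown later in this section.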
Testing the interface on Windows
The first step is to start the AllegroGraph server by selecting the AllegroGraph server item on the AllegroGraph Start Menu entry (or double-click on AllegroGraphJavaServer in the AllegroGraph installation directory).
The second step is to open a command window in the folder where AllegroGraph was installed.
At this point, the following command will start the sample application, but it will terminate immediately with an error message because the program needs the location of the database work area:
java -cp .;com.franz.agraph-2-2-5.jar AGExample
The full command line parameters of the sample program are described in a comment in the program source. The most important argument is "-d", a required argument which specifies an existing directory to hold the database files:
java -cp .;com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst
Other command examples:
Load the Wilbur example OWL ontology:
java -cp .;com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst -r wilburwine.rdf
The above command assumes you are in the AllegroGraph installation directory, as wilburwine.rdf is distributed with AllegroGraph.
Load the ntriples version of the Wilbur OWL ontology:
java -cp .;com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst -t wilburwine.ntriples
NOTE: when a large data file is specified, there may be a delay before the sample program shows any output.
Testing the interface on Linux and Unix
The first step is to open a shell in the AllegroGraph installation directory.
The second step is to start the AllegroGraphJavaServer executable. You may want to put it into the background and redirect the output from the program to a file.
At this point, the following command will start the sample application, but it will terminate immediately with an error message because the program needs the location of the database work area:
java -cp '.:com.franz.agraph-2-2-5.jar' AGExample
The full command line parameters of the sample program are described in a comment in the program source. The most important argument is "-d", a required argument which specifies an existing directory to hold the database files:
java -cp .:com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst
Other command examples:
Load the Wilbur example OWL ontology:
java -cp .:com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst -r wilburwine.rdf
The above command assumes you are in the AllegroGraph installation directory, as wilburwine.rdf is distributed with AllegroGraph.
Load the ntriples version of the Wilbur OWL ontology:
java -cp .:com.franz.agraph-2-2-5.jar AGExample -d /tmp/ag/ -n tst -t wilburwine.ntriples
NOTE: when a large data file is specified, there may be a delay before the sample program shows any output.
More advanced uses of the sample application
The sample application accepts several other command-line arguments that modify the behavior of the application. These arguments are described in comments in the source code.
The application can also start the server if the "-x" argument is added to the command.
Stopping the AllegroGraph Server Application
Once the AllegroGraph server application is running, it can be terminated in several ways:
- a Java application may call one of the stopServer() methods,
- the server lease may expire, or
- an operator may send an operating system kill signal from a console or window.
We supply a small Java application that stops the AllegroGraph server. The application is run with a command such as the following.
On Windows:
java -cp .;com.franz.agraph-2-2-5.jar AGStop [-p port] [-h host]
On Unix:
java -cp '.:com.franz.agraph-2-2-5.jar' AGStop [-p port] [-h host]
Tutorial
Connecting Java to the Triple Store
The first thing you might have noticed reading through the test program, AGExample.java, is that each Java application must connect to the server before any part of the API can be used. Connect to the server by creating a new instance of the class AllegroGraphConnection. The AllegroGraphConnection class implements methods open(), create(), and others that open databases and return instances of the class AllegroGraph. Each open database is represented by a new instance of the class AllegroGraph.
If the Java application disconnects from the server, all AllegroGraph instances become invalid and must be discarded.
Buffered Operations
The communication between the Java application and the AllegroGraph server takes place through a socket. In order to minimize the delays that may be imposed by operating system overheads, it is good practice to operate on many data items in each interaction between the Java client application and the server.
We facilitate this buffering by providing array operations for most of the database accessors. The array operations create or retrieve many database elements in a single interaction and therefore are much more time efficient.
Simple Database Operations
Opening a database
A database is opened by creating an AllegroGraph instance.
AllegroGraphConnection sv = new AllegroGraphConnection();
sv.enable();
AllegroGraph ts = sv.create("test", "/s/ja/temp/");
The database is closed with the closeDatabase() method. Once the database is closed, the AllegroGraph instance should be discarded since it cannot be used for further interactions.
To re-open a database, create a new AllegroGraph instance.
Creating triples
Triples can be created one at a time by naming the components with strings in ntriples syntax.
ts.addStatement("<http://www.franz.com/things#Dog>",
"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
"<http://www.w3.org/2002/07/owl#Class>");
The application can also save the details of the newly created triple by creating a new Triple instance with the newTriple() method.
Triple tr2 = ts.newTriple(
"<http://www.franz.com/things#Dog>",
"<http://www.w3.org/2000/01/rdf-schema#subClassOf>",
"<http://www.franz.com/things#Mammal>");
When many triples are created, it is more efficient to buffer the operation by grouping the triple components into arrays. The following statement creates three triples from corresponding elements of the arrays.
ts.addStatements(
new String[]{
"<http://www.franz.com/things#Cat>",
"<http://www.franz.com/things#Giraffe>",
"<http://www.franz.com/things#Lion>" },
new String[]{
"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>",
"<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>" },
new String[]{
"<http://www.w3.org/2002/07/owl#Class>",
"<http://www.w3.org/2002/07/owl#Class>",
"<http://www.w3.org/2002/07/owl#Class>" }
);
When an array consists of identical elements, it can be shortened to a single element. The following statement creates three triples where the predicate and object components are identical.
ts.addStatements(
new String[]{
"<http://www.franz.com/things#Cat>",
"<http://www.franz.com/things#Giraffe>",
"<http://www.franz.com/things#Lion>" },
new String[]{"<http://www.w3.org/2000/01/rdf-schema#subClassOf>"},
new String[]{"<http://www.franz.com/things#Mammal>"}
);
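To make the pairing rule concrete, here is a small self-contained sketch (plain Java, no AllegroGraph classes) of how three component arrays line up into triples when a one-element array stands for a repeated value. The class and method names are hypothetical, and the code illustrates the rule described above rather than the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class ArrayPairingSketch {
    // Returns element i, or the single element when the array has length 1.
    // Other mismatched lengths are not handled in this sketch.
    static String pick(String[] parts, int i) {
        return parts.length == 1 ? parts[0] : parts[i];
    }

    // Expands subject/predicate/object arrays into one {s, p, o} row per triple.
    static List<String[]> expand(String[] s, String[] p, String[] o) {
        int n = Math.max(s.length, Math.max(p.length, o.length));
        List<String[]> rows = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            rows.add(new String[] { pick(s, i), pick(p, i), pick(o, i) });
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String[]> rows = expand(
            new String[] {
                "<http://www.franz.com/things#Cat>",
                "<http://www.franz.com/things#Giraffe>",
                "<http://www.franz.com/things#Lion>" },
            new String[] { "<http://www.w3.org/2000/01/rdf-schema#subClassOf>" },
            new String[] { "<http://www.franz.com/things#Mammal>" });
        // Prints one line per resulting triple.
        for (String[] r : rows) {
            System.out.println(r[0] + " " + r[1] + " " + r[2] + " .");
        }
    }
}
```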
Querying for triples
Triples are retrieved from the database with a Cursor instance. The Cursor instance can iterate through all the triples in the search result. The following statement will retrieve the four triples about subclasses of the "Mammal" class created earlier.
String wild = null;
Cursor cc = ts.getStatements(
wild,
"<http://www.w3.org/2000/01/rdf-schema#subClassOf>",
"<http://www.franz.com/things#Mammal>" );
When a Cursor instance is created, it is not positioned at a result. The step() method advances the Cursor instance to the first or next result. When a Cursor has been advanced, the returned value is true. When a Cursor is exhausted, the returned value is false.
Triple tr = null;
if ( cc.step() ) tr = cc.getTriple();
When the Cursor is positioned at a result, we can retrieve the component of interest without creating a Triple instance.
Value s = cc.getSubject();
We can also retrieve several results in one operation. The following statement retrieves an array of at most 6 elements:
Triple[] trc = cc.step(6);
int n = trc.length;
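The step() contract described above lends itself to a simple loop. The sketch below uses a hypothetical stand-in class, ListCursor, which mimics only the step() and getSubject() behavior described in this section so that the example runs without a server; it is not the AllegroGraph Cursor class, and its subjects are plain strings rather than Value instances.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CursorLoopSketch {
    // Hypothetical stand-in: not positioned at a result until step() is called.
    static class ListCursor {
        private final Iterator<String> it;
        private String current;

        ListCursor(List<String> items) { this.it = items.iterator(); }

        // Advances to the first or next result; returns false when exhausted.
        boolean step() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }

        String getSubject() { return current; }
    }

    // Visits every result and returns how many were seen.
    static int countResults(ListCursor cc) {
        int n = 0;
        while (cc.step()) {
            System.out.println(cc.getSubject());
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ListCursor cc = new ListCursor(Arrays.asList("Dog", "Cat", "Giraffe", "Lion"));
        System.out.println(countResults(cc) + " results");
    }
}
```

With a real Cursor, the same while loop would apply, with the step(n) array form preferred when many results are expected.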
Optimization notes
Maximum Index Chunk Size parameter
This parameter, settable and gettable by the setChunkSize() and getChunkSize() methods, controls the maximum number of records that are sorted at a time during index merging. (Indexing happens when the indexAll() or indexTriples() methods are called.)
The initial value of this parameter is believed to be good for machines with 1-2GB of RAM. If your computer has significantly more memory than this, you might improve indexing performance by using larger values (e.g., doubling or more the initial value).
Expected Unique Resources parameter
This parameter, settable and gettable by the setDefaultExpectedResources() and getDefaultExpectedResources() methods, controls the default value for the expected number of unique resources in a new triple store. This number is the expected number of distinct URIs and literals in the triple store database. If the number is too small, performance may suffer during database creation. A rough rule of thumb is to specify a number that is one third of the number of triples.
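The rule of thumb above is simple arithmetic; the following sketch spells it out. The class and method names are hypothetical and the triple count is an invented example. The call to setDefaultExpectedResources() is not shown because its exact signature is not given here.

```java
public class ExpectedResourcesSketch {
    // Rule of thumb from the text: about one third of the number of triples.
    static long estimate(long tripleCount) {
        if (tripleCount < 0) {
            throw new IllegalArgumentException("tripleCount must not be negative");
        }
        return Math.max(1L, tripleCount / 3);
    }

    public static void main(String[] args) {
        long triples = 30_000_000L; // hypothetical data set size
        System.out.println("expected unique resources: about " + estimate(triples));
    }
}
```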
The OpenRDF Model
We implement most of the interfaces in the OpenRDF model defined at http://openrdf.org/.
The current implementation does not implement the interface Graph.
More complex queries using Prolog
AllegroGraph includes a Prolog implementation that may be used to search a database. The select() and selectValues() methods allow searches that return triples or database nodes and literals.
ValueObject[][] v =
ts.selectValues
("(?x ?y ?z) " +
" (and (q ?x " +
" !http://www.w3.org/1999/02/22-rdf-syntax-ns#type " +
" ?y) " +
" (q ?y " +
" !http://www.w3.org/2000/01/rdf-schema#subClassOf " +
" ?z))",
new Object[0], "");
The result v will be an array of sub-arrays. Each sub-array represents one successful match of the query. Each sub-array will be of length 3: the first element in the sub-array will be the binding of the variable ?x, the second ?y and the third ?z.
It may also be desirable to substitute values from the Java application into the query string. This can be done by simply concatenating the required strings, but we do allow a more convenient option.
URI typePred = ts.addURI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type");
URI classPred = ts.addURI("http://www.w3.org/2000/01/rdf-schema#subClassOf");
ValueObject[][] w = ts.selectValues
("(?x ?y ?z) (and (q ?x ?a ?y) (q ?y ?b ?z))",
new Object[]{ typePred, classPred },
"?a ?b");
This query returns the same result as the previous example, but we have substituted values from the program into the query.
A query can return a mixture of nodes, literals and triples. The query
URI typePred = ts.addURI("http://www.w3.org/1999/02/22-rdf-syntax-ns#type");
URI classPred = ts.addURI("http://www.w3.org/2000/01/rdf-schema#subClassOf");
ValueObject[][] w = ts.selectValues
("(?x ?y ?z ?t ?u) (and (q ?x ?a ?y ? ?t) (q ?y ?b ?z ? ?u))",
new Object[]{ typePred, classPred },
"?a ?b");
returns an array where each sub-array is of length 5. The fourth and fifth elements in the sub-array are the triples that satisfied the query. The lone question marks in the pattern skip the graph position of each triple to allow unification with the triple ids.
If all the results of interest are triples, a select() method can be used to return a Cursor instance. The Cursor instance is an iterator that returns the triples in order.
Cursor tv = ts.select
("(?t ?u) (and (q ?x " +
" !http://www.w3.org/1999/02/22-rdf-syntax-ns#type " +
" ?y ? ?t) " +
" (q ?y " +
" !http://www.w3.org/2000/01/rdf-schema#subClassOf " +
" ?z ? ?u))",
new Object[0], "");
The cursor in variable tv will return triples t1, u1, t2, u2,... where t1 is the triple matching ?t in the first match of the query, and u1 is the triple matching ?u in the first match of the query.
If variables that are not bound to triples are included in the list of result variables, they are ignored. Thus the query
Cursor tw = ts.select
("(?t ?x ?u) (and (q ?x " +
" !http://www.w3.org/1999/02/22-rdf-syntax-ns#type " +
" ?y ? ?t) " +
" (q ?y " +
" !http://www.w3.org/2000/01/rdf-schema#subClassOf " +
" ?z ? ?u))",
new Object[0], "");
returns exactly the same value as the previous query. Additional select() methods are provided to allow data to be substituted into the query.
More complex queries using SPARQL
AllegroGraph includes a SPARQL implementation that may be used to search a database. The methods twinqlAsk(), twinqlSelect(), twinqlFind(), and twinqlQuery() allow searches that return, respectively, a true/false result, an array of objects, a Cursor instance, or a result serialized into an XML string.
For notes on twinql's conformance to the W3C specification please see this document.
How to use text indexing from Java
If you want to know how this all works it is worthwhile to look at the tutorial after this section. The Javadocs also describe all the main methods.
The main methods:
public Cursor getFreetextStatements(String pattern)
will return a cursor of all the triples that match pattern.
The input pattern for getFreetextStatements is described in the JavaDocs but here is a summary of the syntax for the input patterns.
pattern -> string-pattern | composite-pattern
string-pattern -> string | phrase-string
string -> char*
char -> ? -- denotes a wild card that matches any single character
char -> * -- denotes a wild card that matches any sequence of characters
char -> any -- most other characters denote themselves
phrase-string -> 'this is a phrase' -- no wild cards allowed
composite-pattern -> (and pattern*) | (or pattern*)
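To illustrate what the two wild cards mean, the sketch below translates a string pattern into a java.util.regex expression and tests it against a single candidate word. This is only an illustration of the syntax summary above, not how AllegroGraph performs freetext matching; the class and method names are hypothetical, and matching one whole word at a time is an assumption of the sketch.

```java
import java.util.regex.Pattern;

public class WildcardSketch {
    // Translates a string pattern: '?' matches any single character,
    // '*' matches any sequence of characters, everything else is literal.
    static Pattern toRegex(String stringPattern) {
        StringBuilder sb = new StringBuilder();
        for (char c : stringPattern.toCharArray()) {
            if (c == '?') {
                sb.append('.');
            } else if (c == '*') {
                sb.append(".*");
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString(), Pattern.DOTALL);
    }

    // Assumption of this sketch: the pattern must match the whole word.
    static boolean matches(String stringPattern, String word) {
        return toRegex(stringPattern).matcher(word).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("gir?ffe", "giraffe"));
        System.out.println(matches("li*", "lion"));
        System.out.println(matches("cat", "cart"));
    }
}
```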
public ValueObject[] getFreetextUniqueSubjects(String pattern)
will return an array of ValueObject instances containing all the unique triple subjects that match pattern.
public String[] getFreetextPredicates()
returns a string array of the predicates that you registered for freetext indexing.
public void registerFreetextPredicate(Object predicate)
registers a predicate for indexing. Freetext indexing predicates must be registered before any triples are added to the triple store. We will relax this constraint in future versions.
Reference
The AllegroGraph server application
The AllegroGraph server application is started with the command
AllegroGraphJavaServer -port port -port2 port2
-limit limit -users users -lease lease
-log logfile -init initfile
-exres expected-resources -index index-chunk-size
-quiet -verbose -debug -standalone -stop
-nojava
-http http-port -hinit http-init-file
where all the arguments are optional. The arguments may be supplied to modify behavior as follows:
- port must be an available port number (the default is 4567).
- port2 must be an available port number (the default is 4568). This parameter is not used.
- limit is the total number of connections allowed over time. The default is -1 to specify an unlimited number.
- users is the maximum number of simultaneous connections allowed in this server instance. The default is 3.
- lease is the number of seconds the server will run without any interactions. When the lease expires the server exits. The lease is renewed whenever a call arrives from Java. A negative number specifies an indefinite lease; this is the built-in default.
- logfile is the path for a file where the server will log progress and error messages. The path is relative to the directory where the server is started.
- initfile is the path for a file that is loaded (and evaluated) when the server is started. The default is the logical pathname "sys:agraph;ag010200.cl".
- expected-resources specifies the number of expected unique literal values in a new database. If this value is too small, some internal tables will need to grow repeatedly as the database is built. These growth steps will delay creation and may trigger a memory overflow. When creating very large databases (millions of triples) a good estimate for this parameter will improve performance. A value of -1 denotes the built-in default (currently 100000). A value from 0 to 63 denotes a power of 2. Any other positive value is used verbatim.
- index-chunk-size specifies how a database is broken down into segments during indexing. If this number is too large, the server may page excessively during indexing, or run out of memory. If this number is too small, too many file handles may be needed during indexing. When creating very large databases (millions of triples) a good estimate for this parameter will improve performance. A value of -1 denotes the built-in default. A value from 0 to 63 denotes a power of 2. Any other positive value is used verbatim.
- -quiet, -verbose and -debug -- these flags control the amount of progress information printed by the server. In the case of -debug, on UNIX platforms it also indicates the server should run in the foreground (i.e., not run as a daemon).
- -standalone, -stop -- these two flags are mutually exclusive. If -standalone is specified, the server returns all error signals to Java as Java exceptions (this is the default behavior). If -stop is specified, the server will stop for some serious conditions (such as out-of-memory) and allow some diagnostic operations.
- -nojava -- this flag suppresses the starting of the Java server.
- http-port when present specifies that the Sesame HTTP server should be started at the specified port number.
- http-init-file specifies a file that is loaded after the Sesame HTTP server is started.
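The value rules for expected-resources and index-chunk-size (-1 for the built-in default, 0 to 63 as a power of 2, any other positive value verbatim) can be sketched as follows. This is an illustration of the rules stated in the list above, not the server's code; the class and method names are hypothetical, and rejecting other negative values is an assumption, since the text does not say how they are treated.

```java
import java.math.BigInteger;

public class SizeArgSketch {
    // Resolves a command-line value to the effective number.
    // BigInteger is used because 2^63 does not fit in a signed long.
    static BigInteger resolve(long arg, long builtInDefault) {
        if (arg == -1) {
            return BigInteger.valueOf(builtInDefault);   // built-in default
        }
        if (arg >= 0 && arg <= 63) {
            return BigInteger.ONE.shiftLeft((int) arg);  // power of 2
        }
        if (arg > 63) {
            return BigInteger.valueOf(arg);              // used verbatim
        }
        // Assumption: other negative values are treated as errors here.
        throw new IllegalArgumentException("unsupported value: " + arg);
    }

    public static void main(String[] args) {
        long exresDefault = 100000L; // default for expected-resources per the text
        System.out.println(resolve(-1, exresDefault));     // 100000
        System.out.println(resolve(20, exresDefault));     // 1048576
        System.out.println(resolve(500000, exresDefault)); // 500000
    }
}
```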
On UNIX platforms the default behavior (without the -debug command line argument) is to run in the background, as a daemon. In this case, it is strongly recommended that you use the -log command line argument to specify a log file for the various messages printed by the server.
On Windows, the server runs in a window that is normally minimized.
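As a rough sketch of assembling a server command line with the recommended -log argument, the program below builds the argument list and prints it without launching anything. The class and method names and the log file name are hypothetical; the executable path is the Linux RPM location mentioned in the installation notes and may differ on your system.

```java
import java.util.ArrayList;
import java.util.List;

public class ServerCommandSketch {
    // Builds the argument list for the server using only the -port and
    // -log arguments described above.
    static List<String> command(String executable, int port, String logFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(executable);
        cmd.add("-port");
        cmd.add(Integer.toString(port));
        cmd.add("-log");
        cmd.add(logFile);
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = command(
            "/usr/lib/agraph/AllegroGraphJavaServer", 4567, "agraph.log");
        // Only prints the command; passing the list to ProcessBuilder
        // would be one way to actually start the server.
        System.out.println(String.join(" ", cmd));
    }
}
```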
Setting the location of the AllegroGraph Server application
The main() method of the AllegroGraphConnection class is a utility that sets the Java Preferences value used by the subsequent application.
java -cp '.:com.franz.agraph-2-2-5.jar' com.franz.ag.AllegroGraphConnection [-user uuu] [-system sss]
If the method is run without any arguments, it simply lists the current settings on the console.
The -user argument sets a user preference; the -system argument sets a system preference.
Setting a system preference normally requires administrator permission.
The value of each argument is the absolute pathname of the AllegroGraphJavaServer executable distributed with AllegroGraph.
We have not tested Preferences settings with all possible Java and OS combinations. On Windows XP, both user and system preferences are set reliably with Java 1.4.2 and Java 5. On Linux (Fedora 5), Java 5 sets user preferences but GNU Java 1.4.2 did not.
AllegroGraph Java sources
The Java code for the AllegroGraph Java API is open source under the terms of the Mozilla Public License Version 1.1. The source code is distributed with AllegroGraph and is installed with the other AllegroGraph files. The main source files are in agsrc-2-2-5.jar. The file agsrctbc-2-2-5.jar contains additional classes that are used by TopBraidComposer.