Table of Contents

Introduction

Text-Indexing Features

Indexing words in Japanese

Introduction

AllegroGraph supports multiple free-text indices, each targeted as narrowly as you like on specific fields of specific predicates.

These text indices are based on a locality-optimized Patricia trie, which is traversed intelligently to make wildcard and fuzzy searches fast. The indexing process is fully transactional and can handle billions of documents.
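To make the trie idea concrete, here is a minimal sketch of a plain, uncompressed trie that maps words to document ids and supports prefix lookup. It is only an illustration: it is not a Patricia trie, it is not locality-optimized, and the class and method names (`Trie`, `insert`, `lookup`, `with_prefix`) are hypothetical rather than part of any AllegroGraph API.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.doc_ids = set()  # ids of documents containing the word ending here


class Trie:
    """Illustrative, uncompressed trie (hypothetical; not AllegroGraph code)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, word, doc_id):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.doc_ids.add(doc_id)

    def _walk(self, text):
        # Follow text character by character; None if the path does not exist.
        node = self.root
        for ch in text:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def lookup(self, word):
        node = self._walk(word)
        return set(node.doc_ids) if node is not None else set()

    def with_prefix(self, prefix):
        # Return {word: doc_ids} for every indexed word starting with prefix.
        found = {}
        start = self._walk(prefix)
        if start is None:
            return found
        stack = [(prefix, start)]
        while stack:
            word, node = stack.pop()
            if node.doc_ids:
                found[word] = set(node.doc_ids)
            for ch, child in node.children.items():
                stack.append((word + ch, child))
        return found


index = Trie()
index.insert("graph", 1)
index.insert("graphs", 2)
index.insert("grape", 3)
```

A prefix search such as `index.with_prefix("grap")` only visits the subtree below the shared prefix, which is the property that makes trie-based wildcard search efficient.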

Text-Indexing Features

You may experiment with free-text indices through AGWebView. Indices may be created, profiled, and used through AGWebView and through the Lisp, Python, or Java client APIs. The Lisp function for creating free-text indices is create-freetext-index. The Lisp API is discussed in the Lisp Reference Guide.

Each free-text index has a name, so you can apply it to a query or perform maintenance on it.

Each index works with one or more specific predicates, including an option to index all predicates.

An index can be configured to include:

Stop words (ignored words) may be specified for each index, or the index can use a default list of stop words.

An index can make use of word filters such as stem.english, drop-accents, and soundex.
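As a rough illustration of what stop words and a filter such as drop-accents do to text before it is indexed, the sketch below normalizes a string into index terms. The stop-word list, the punctuation handling, and the function names are assumptions made for this example; AllegroGraph's actual default stop words and filter implementations may differ.

```python
import unicodedata

# Hypothetical stop-word list, used only for this example.
STOP_WORDS = {"a", "an", "and", "of", "the"}


def drop_accents(word):
    # Decompose accented characters, then discard the combining marks.
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def index_terms(text, stop_words=STOP_WORDS):
    # Lowercase, split on whitespace, trim punctuation, filter, drop stop words.
    terms = []
    for raw in text.lower().split():
        word = drop_accents(raw.strip(".,;:!?\"'()"))
        if word and word not in stop_words:
            terms.append(word)
    return terms
```

With a filter like this applied at both indexing and query time, a search for "cafe" would also find text containing "café".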

Text searches may be conducted programmatically using AllegroGraph client APIs (Lisp, Python, Java) or as part of SPARQL and Prolog queries.

Text matches use "?" for single-character wildcards, and "*" for multi-character wildcards.
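The wildcard semantics can be sketched by translating a pattern into a regular expression. This is a minimal illustration, not how AllegroGraph evaluates wildcards internally, and it assumes that "*" matches zero or more characters; the function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    # Translate "?" and "*" into regex syntax; escape everything else.
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")    # exactly one character
        elif ch == "*":
            parts.append(".*")   # assumption: zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches(pattern, word):
    # The whole word must match the pattern, not just a substring.
    return wildcard_to_regex(pattern).fullmatch(word) is not None
```

For example, `matches("gr?ph", "graph")` is true, while `matches("gr?ph", "grph")` is false because "?" must consume exactly one character.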

Text queries may use Boolean operators "and" and "or".
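Conceptually, "and" corresponds to intersecting the sets of documents that match each term, and "or" to taking their union. The sketch below shows this over a toy inverted index; the data and the function names are hypothetical and do not reflect AllegroGraph's query syntax or internals.

```python
from collections import defaultdict


def build_index(docs):
    # Map each lowercase word to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search_and(index, *words):
    # Documents that contain every word.
    sets = [index.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()


def search_or(index, *words):
    # Documents that contain at least one of the words.
    return set().union(*(index.get(w, set()) for w in words))


docs = {1: "graph database", 2: "graph theory", 3: "relational database"}
index = build_index(docs)
```

Here `search_and(index, "graph", "database")` narrows the result to document 1, while `search_or` with the same terms returns all three documents.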

Double-quotes around a piece of text mean that AllegroGraph should search for an exact phrase.

AllegroGraph supports "fuzzy" matching using the Levenshtein distance algorithm. You can adjust the desired "distance" to achieve a harder focus (few matches) or a softer focus (many matches).
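The Levenshtein distance between two words is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. The following is a standard dynamic-programming sketch of that definition, plus a hypothetical `fuzzy_matches` helper showing how a larger maximum distance admits more matches. It scans a word list linearly and is not AllegroGraph's trie-based implementation.

```python
def levenshtein(a, b):
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(query, words, max_distance):
    # Hypothetical helper: keep the words within max_distance edits of query.
    return [w for w in words if levenshtein(query, w) <= max_distance]


words = ["graph", "grape", "grasp", "tree"]
```

With this word list, a maximum distance of 1 keeps "graph" and "grape", and raising it to 2 also admits "grasp".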

Ranking of search results reflects word frequencies, and in the case of fuzzy matches, the closeness of the match.

Indexing words in Japanese

The :tokenizer keyword argument to create-freetext-index specifies the tokenizer to use. :default works for most European languages. :japanese specifies the Japanese language tokenizer, as the following screenshot shows:

Figure: Creating a Japanese freetext index