IDZEBRA  2.2.7
Zebra

Introduction

Zebra is a search engine for structured data, such as XML, MARC and others.

API users should read api.h for all public definitions.

The remaining sections briefly describe each of Zebra's major modules and components.

Base Utilities

The Zebra utilities (util.h) define fundamental types and a few utilities for Zebra.

Resources

The resources system (res.h) is a manager of configuration resources. The resources can be viewed as a simple database. Resources can be read from a configuration file, and an application can read or write them at run time. Resources can also be written back to a file, but that facility is not currently in use.
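
The "simple database" view described above can be illustrated with a small name/value table. This is only a sketch: the demo_* names, the fixed table sizes and the "name: value" line format are assumptions made for the example, and none of it is taken from res.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_RES 32
#define DEMO_MAX_LEN 64

/* Hypothetical in-memory resource table; not the layout used by res.h. */
struct demo_res {
    char name[DEMO_MAX_RES][DEMO_MAX_LEN];
    char value[DEMO_MAX_RES][DEMO_MAX_LEN];
    int count;
};

/* Set or overwrite a resource. Returns 0 on success, -1 if the table
   is full or a string does not fit. */
static int demo_res_set(struct demo_res *r, const char *name,
                        const char *value)
{
    int i;
    if (strlen(name) >= DEMO_MAX_LEN || strlen(value) >= DEMO_MAX_LEN)
        return -1;
    for (i = 0; i < r->count; i++)
        if (strcmp(r->name[i], name) == 0)
            break;
    if (i == r->count) {
        if (r->count == DEMO_MAX_RES)
            return -1;
        strcpy(r->name[i], name);
        r->count++;
    }
    strcpy(r->value[i], value);
    return 0;
}

/* Look up a resource; return def when the name is not set. */
static const char *demo_res_get_def(const struct demo_res *r,
                                    const char *name, const char *def)
{
    int i;
    for (i = 0; i < r->count; i++)
        if (strcmp(r->name[i], name) == 0)
            return r->value[i];
    return def;
}

/* Parse one "name: value" line (assumed format). Returns 0 if a
   resource was stored, -1 for comments, blank or malformed lines. */
static int demo_res_parse_line(struct demo_res *r, const char *line)
{
    char name[DEMO_MAX_LEN], value[DEMO_MAX_LEN];
    if (line[0] == '#')
        return -1;
    if (sscanf(line, " %63[^: ] : %63[^\n]", name, value) != 2)
        return -1;
    return demo_res_set(r, name, value);
}
```

An application would feed each line of its configuration file to demo_res_parse_line, then call demo_res_get_def with a fallback value wherever a setting is needed.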

Bfiles

The Bfiles module (bfile.h) provides a portable interface to the local file system. It also provides a facility for safe updates (shadow updates). All file system access is handled by this module (except for trivial reads of configuration files).

Dictionary

The Zebra dictionary (dict.h) maps a search term (key) to a value. The value is a reference to the list of record identifiers in which the term occurs. Zebra uses an ISAM data structure for the list of term occurrences. The Dictionary uses Bfiles.
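
The term-to-reference mapping can be pictured as follows. This is a minimal sketch that assumes a sorted in-memory array searched with the standard bsearch function; the demo_* names and the isam_ref field are hypothetical, and the real dictionary in dict.h is a disk-based structure built on Bfiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry: a term plus an opaque reference to the
   occurrence list stored elsewhere (in Zebra, an ISAM). */
struct demo_entry {
    const char *term;
    long isam_ref;
};

/* bsearch callback: compare the search key with an entry's term. */
static int demo_cmp(const void *key, const void *elem)
{
    const struct demo_entry *e = (const struct demo_entry *)elem;
    return strcmp((const char *)key, e->term);
}

/* Return the reference stored for term, or -1 if the term is absent.
   The entries array must be sorted by term. */
static long demo_dict_lookup(const struct demo_entry *entries, size_t n,
                             const char *term)
{
    const struct demo_entry *e =
        bsearch(term, entries, n, sizeof *entries, demo_cmp);
    return e ? e->isam_ref : -1;
}
```

The point of the indirection is that the dictionary stays small: it stores one reference per term, while the potentially long list of record identifiers lives in the ISAM.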

ISAM

Zebra maintains an ISAM for each term, where each ISAM is a list of record identifiers corresponding to the records in which the term occurs. Unlike traditional ISAM systems, the Zebra ISAM is compressed. The ISAM system uses Bfiles.
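
To show why a sorted list of record identifiers compresses well, the sketch below stores the gaps between consecutive identifiers as base-128 variable-length integers. This is a common technique for such lists and is used here purely as an assumption; this overview does not say which encoding the Zebra ISAM systems use, and the demo_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode strictly increasing ids as gaps, each gap as a base-128
   varint (7 data bits per byte, high bit set on all but the last).
   Returns the number of bytes written, or 0 if out is too small or
   the ids are not strictly increasing. */
static size_t demo_encode(const uint32_t *ids, size_t n,
                          uint8_t *out, size_t cap)
{
    size_t pos = 0, i;
    uint32_t prev = 0;
    for (i = 0; i < n; i++) {
        uint32_t gap;
        if (i > 0 && ids[i] <= prev)
            return 0;
        gap = ids[i] - prev;
        prev = ids[i];
        while (gap >= 0x80) {
            if (pos == cap)
                return 0;
            out[pos++] = (uint8_t)((gap & 0x7f) | 0x80);
            gap >>= 7;
        }
        if (pos == cap)
            return 0;
        out[pos++] = (uint8_t)gap;
    }
    return pos;
}

/* Decode the byte stream back into ids. Returns the number of ids,
   or 0 on truncated input, an overlong varint, or a full ids array. */
static size_t demo_decode(const uint8_t *in, size_t len,
                          uint32_t *ids, size_t cap)
{
    size_t pos = 0, n = 0;
    uint32_t prev = 0;
    while (pos < len) {
        uint32_t gap = 0;
        int shift = 0;
        uint8_t b;
        do {
            if (pos == len || shift > 28)
                return 0;
            b = in[pos++];
            gap |= (uint32_t)(b & 0x7f) << shift;
            shift += 7;
        } while (b & 0x80);
        if (n == cap)
            return 0;
        prev += gap;
        ids[n++] = prev;
    }
    return n;
}
```

With the identifiers 3, 5, 130 and 1000000 the gaps are 3, 2, 125 and 999870, which take 1 + 1 + 1 + 3 = 6 bytes in this encoding instead of 16 bytes as raw 32-bit integers.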

Zebra has more than one ISAM system. The old and stable ISAM system is named isamc (see isamc.h). Another version, isams, is a write-once ISAM system that is quite compact and therefore suitable for CD-ROMs (see isams.h). The newest ISAM system, isamb, is implemented as a B-Tree (see isamb.h).

Data-1

The data1 module (data1.h) deals with structured documents. The module can read, modify and write documents. The document structure was originally based on GRS-1, a Z39.50 v3 structure that predates DOM. These days the data1 structure may describe XML/SGML as well. The data1 structure, like DOM, is a tree. Each node in the tree can be of type element, text (cdata), preprocessing instruction, or comment. Element nodes can point to attribute nodes.
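
The tree shape described above can be sketched with a node type, child and sibling links, and an attribute list on element nodes. The demo_* types and functions are hypothetical and chosen only for illustration; the actual definitions are in data1.h.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The four node kinds named in the text (hypothetical enum). */
enum demo_node_type {
    DEMO_ELEMENT, DEMO_TEXT, DEMO_PREPROCESS, DEMO_COMMENT
};

struct demo_attr {
    const char *name;
    const char *value;
    struct demo_attr *next;
};

struct demo_node {
    enum demo_node_type type;
    const char *text;          /* tag name for elements, content otherwise */
    struct demo_attr *attrs;   /* attribute list, elements only */
    struct demo_node *child;   /* first child */
    struct demo_node *next;    /* next sibling */
};

/* Return the value of an element's attribute, or NULL if not present. */
static const char *demo_attr_get(const struct demo_node *n, const char *name)
{
    const struct demo_attr *a;
    for (a = n->attrs; a; a = a->next)
        if (strcmp(a->name, name) == 0)
            return a->value;
    return NULL;
}

/* Depth-first walk: append every text node to buf in document order.
   A text node that would overflow buf is skipped. */
static void demo_walk(const struct demo_node *n, char *buf, size_t cap,
                      size_t *len)
{
    for (; n; n = n->next) {
        if (n->type == DEMO_TEXT) {
            size_t l = strlen(n->text);
            if (*len + l < cap) {
                memcpy(buf + *len, n->text, l + 1);
                *len += l;
            }
        }
        demo_walk(n->child, buf, cap, len);
    }
}

/* Collect the text below (and after) node n; returns the length. */
static size_t demo_collect_text(const struct demo_node *n, char *buf,
                                size_t cap)
{
    size_t len = 0;
    if (cap == 0)
        return 0;
    buf[0] = '\0';
    demo_walk(n, buf, cap, &len);
    return len;
}
```

For an element such as title with the attribute lang="en" and the children "Zebra", a comment, and " rocks", the walk skips the comment and yields "Zebra rocks".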

Record Control

The record control module (recctrl.h) is responsible for managing the various record types ("classes" or filters).

Result-Set

The Result-Set module (rset.h) defines an interface that all Zebra search results must implement. Each operation (AND, OR, ...) corresponds to an implementation of that interface.
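
The idea of one interface with one implementation per operation can be sketched with a struct holding a function pointer. The example assumes each result set yields record identifiers in strictly ascending order; the demo_* names and the single read operation are hypothetical and much simpler than the interface declared in rset.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result-set interface: read stores the next record id
   in *id and returns 1, or returns 0 when the set is exhausted. */
struct demo_rset {
    int (*read)(struct demo_rset *self, long *id);
    void *state;
};

/* Leaf implementation: a sorted array of record ids. */
struct demo_list_state {
    const long *ids;
    size_t n, pos;
};

static int demo_list_read(struct demo_rset *self, long *id)
{
    struct demo_list_state *s = self->state;
    if (s->pos == s->n)
        return 0;
    *id = s->ids[s->pos++];
    return 1;
}

/* AND implementation: yields the ids present in both children. */
struct demo_and_state {
    struct demo_rset *left, *right;
};

static int demo_and_read(struct demo_rset *self, long *id)
{
    struct demo_and_state *s = self->state;
    long a, b;
    if (!s->left->read(s->left, &a) || !s->right->read(s->right, &b))
        return 0;
    /* Advance whichever side is behind until both agree. */
    while (a != b) {
        if (a < b) {
            if (!s->left->read(s->left, &a))
                return 0;
        } else {
            if (!s->right->read(s->right, &b))
                return 0;
        }
    }
    *id = a;
    return 1;
}
```

Because the AND node only talks to its children through read, a child can itself be another AND node or any other implementation, so a query tree maps directly onto a tree of result sets.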

DFA

The DFA (Deterministic Finite Automaton) module (dfa.h) is a regular expression engine. The module compiles a regular expression to a DFA. The DFA can then be used in various applications to perform fast matching against the original expression. The dictionary (Dict) uses DFA to perform lookups using regular expressions.
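
Matching with a DFA needs only one state transition per input character, which is what makes it fast. The sketch below hard-codes the automaton for the expression ab*c by hand instead of compiling it, and the demo_* names are hypothetical; it does not reflect the functions declared in dfa.h.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_DEAD = -1, DEMO_ACCEPT = 2 };

/* Hand-built transitions for ab*c:
   state 0 --a--> 1, state 1 --b--> 1, state 1 --c--> 2 (accepting).
   Any other input leads to the dead state. */
static int demo_step(int state, char ch)
{
    switch (state) {
    case 0:
        return ch == 'a' ? 1 : DEMO_DEAD;
    case 1:
        return ch == 'b' ? 1 : ch == 'c' ? DEMO_ACCEPT : DEMO_DEAD;
    default:
        return DEMO_DEAD;
    }
}

/* Return 1 if the whole string matches ab*c, otherwise 0. */
static int demo_match(const char *s)
{
    int state = 0;
    for (; *s; s++) {
        state = demo_step(state, *s);
        if (state == DEMO_DEAD)
            return 0;
    }
    return state == DEMO_ACCEPT;
}

/* Count the terms in a list that match, as a stand-in for a
   regular-expression lookup over dictionary terms. */
static size_t demo_count_matches(const char *const *terms, size_t n)
{
    size_t i, hits = 0;
    for (i = 0; i < n; i++)
        hits += (size_t)demo_match(terms[i]);
    return hits;
}
```

A compiled DFA would replace the switch with a generated transition table, but the matching loop keeps the same shape.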