Chapter 6. Administrating Zebra

Table of Contents

1. Record Types
2. The Zebra Configuration File
3. Locating Records
4. Indexing with no Record IDs (Simple Indexing)
5. Indexing with File Record IDs
6. Indexing with General Record IDs
7. Register Location
8. Safe Updating - Using Shadow Registers
8.1. Description
8.2. How to Use Shadow Register Files
9. Relevance Ranking and Sorting of Result Sets
9.1. Overview
9.2. Static Ranking
9.3. Dynamic Ranking
9.3.1. Dynamically ranking using PQF queries with the 'rank-1' algorithm
9.3.2. Dynamically ranking CQL queries
9.4. Sorting
10. Extended Services: Remote Insert, Update and Delete
10.1. Extended services in the Z39.50 protocol
10.2. Extended services from yaz-client
10.3. Extended services from yaz-php
10.4. Extended services debugging guide

Unlike many simpler retrieval systems, Zebra supports safe, incremental updates to an existing index.

Normally, when Zebra modifies the index, it reads a number of records that you specify. Depending on your specifications and on the contents of each record, one of the following events takes place for each record:

Insert

The record is indexed as if it had never been seen before. Either the Zebra system doesn't know how to identify the record, or Zebra can identify the record but did not find it to be already indexed.

Modify

The record has already been indexed. In this case either the contents of the record or the location (file) of the record indicates that it has been indexed before.

Delete

The record is deleted from the index. As in the modify case, Zebra must be able to identify the record.

Please note that in both the modify and delete cases the Zebra indexer must be able to generate a unique key that identifies the record in question (more on this below).

To administer the Zebra retrieval system, you run the zebraidx program. This program accepts a number of options, which are preceded by a dash, and a few commands, which are not.

Both the Zebra administrative tool (zebraidx) and the Z39.50 server (zebrasrv) share a set of index files and a global configuration file. The name of the configuration file defaults to zebra.cfg. The configuration file specifies how to index various kinds of records and where the other configuration files are located. zebrasrv and zebraidx must be run in the directory where the configuration file lives, unless you indicate the location of the configuration file with the -c option.
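The two ways of pointing zebraidx at its configuration can be sketched as follows. This is an illustration only: the directory /var/lib/zebra and the records directory name are assumptions, not part of any standard installation, and the script prints the command lines rather than running them so that they can be inspected first.

```shell
# Assumed, illustrative paths; adjust them to your own installation.
ZEBRA_HOME=/var/lib/zebra    # hypothetical directory holding zebra.cfg
RECORDS=records              # hypothetical directory of records to index

# Form 1: run from the configuration directory, where zebra.cfg
# is picked up by default.
cmd_in_place="cd $ZEBRA_HOME && zebraidx update $RECORDS"

# Form 2: run from any directory, naming the configuration file with -c.
cmd_anywhere="zebraidx -c $ZEBRA_HOME/zebra.cfg update $RECORDS"

# Print both command lines instead of executing them.
printf '%s\n' "$cmd_in_place" "$cmd_anywhere"
```

In both forms the options (such as -c) come before the command (here update), matching the dash and no-dash distinction described above.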

1. Record Types

Indexing is a per-record process in which one of insert, modify, or delete occurs. Before a record is indexed, search keys are extracted from the original record, whatever its layout (SGML, HTML, text, etc.). The Zebra system currently supports two fundamental types of records: structured and simple text. To specify a particular extraction process, use either the command-line option -t or a recordType setting in the configuration file.
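As a minimal sketch of the two ways to choose an extraction process, the script below writes a one-line configuration fragment into a scratch directory and shows the matching command-line form as a comment. The scratch directory, the records directory name, and the choice of text as the type are assumptions made for illustration; the type names actually available depend on your Zebra installation.

```shell
# Create a scratch directory so that no real configuration is overwritten.
workdir=$(mktemp -d)

# Configuration-file form: a recordType setting in zebra.cfg selects
# the extraction process for the records that are indexed.
cat > "$workdir/zebra.cfg" <<'EOF'
recordType: text
EOF

# Command-line form (shown only, not executed here): the -t option
# selects the type for a single run.
#   zebraidx -c "$workdir/zebra.cfg" -t text update records

# Show the fragment that was written.
cat "$workdir/zebra.cfg"
```

The setting in the configuration file suits a type that applies to every run, while -t is convenient for a one-off indexing job.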