3. References and Zebra based Applications

Zebra has been deployed in numerous applications, in both the academic and commercial worlds, in application domains as diverse as bibliographic catalogues, Geo-spatial information, structured vocabulary browsing, government information locators, civic information systems, environmental observations, museum information and web indexes.

Notable applications include the following:

3.1. Koha free open-source ILS

Koha is a full-featured open-source ILS, initially developed in New Zealand by Katipo Communications Ltd, and first deployed in January of 2000 for Horowhenua Library Trust. It is currently maintained by a team of software providers and library technology staff from around the globe.

LibLime, a company that is marketing and supporting Koha, adds in the new release of Koha 3.0 the Zebra database server to drive its bibliographic database.

In early 2005, the Koha project development team began looking at ways to improve MARC support and overcome scalability limitations in the Koha 2.x series. After extensive evaluations of the best of the Open Source textual database engines - including MySQL full-text searching, PostgreSQL, Lucene and Plucene - the team selected Zebra.

"Zebra completely eliminates scalability limitations, because it can support tens of millions of records." explained Joshua Ferraro, LibLime's Technology President and Koha's Project Release Manager. "Our performance tests showed search results in under a second for databases with over 5 million records on a modest i386 900Mhz test server."

"Zebra also includes support for true boolean search expressions and relevance-ranked free-text queries, both of which the Koha 2.x series lack. Zebra also supports incremental and safe database updates, which allow on-the-fly record management. Finally, since Zebra has at its heart the Z39.50 protocol, it greatly improves Koha's support for that critical library standard."

Although the bibliographic database will be moved to Zebra, Koha 3.0 will continue to use a relational SQL-based database design for the 'factual' database. "Relational database managers have their strengths, in spite of their inability to handle large numbers of bibliographic records efficiently," summed up Ferraro, "We're taking the best from both worlds in our redesigned Koha 3.0.

See also LibLime's newsletter article Koha Earns its Stripes.

3.2. Kete Open Source Digital Library and Archiving software

Kete is a digital object management repository, initially developed in New Zealand. Initial development has been a partnership between the Horowhenua Library Trust and Katipo Communications Ltd. funded as part of the Community Partnership Fund in 2006. Kete is purpose built software to enable communities to build their own digital libraries, archives and repositories.

It is based on Ruby-on-Rails and MySQL, and integrates the Zebra server and the YAZ toolkit for indexing and retrieval of it's content. Zebra is run as separate computer process from the Kete application. See how Kete manages Zebra.

Why does Kete wants to use Zebra?? Speed, Scalability and easy integration with Koha. Read their detailed reasoning here.

3.3. ReIndex.Net web based ILS

Reindex.net is a netbased library service offering all traditional functions on a very high level plus many new services. Reindex.net is a comprehensive and powerful WEB system based on standards such as XML and Z39.50. updates. Reindex supports MARC21, danMARC eller Dublin Core with UTF8-encoding.

Reindex.net runs on GNU/Debian Linux with Zebra and Simpleserver from Index Data for bibliographic data. The relational database system Sybase 9 XML is used for administrative data. Internally MARCXML is used for bibliographical records. Update utilizes Z39.50 extended services.

3.4. DADS - the DTV Article Database Service

DADS is a huge database of more than ten million records, totalling over ten gigabytes of data. The records are metadata about academic journal articles, primarily scientific; about 10% of these metadata records link to the full text of the articles they describe, a body of about a terabyte of information (although the full text is not indexed.)

It allows students and researchers at DTU (Danmarks Tekniske Universitet, the Technical College of Denmark) to find and order articles from multiple databases in a single query. The database contains literature on all engineering subjects. It's available on-line through a web gateway, though currently only to registered users.

More information can be found at http://www.dtic.dtu.dk/ and http://dads.dtv.dk

3.5. ULS (Union List of Serials)

The M25 Systems Team has created a union catalogue for the periodicals of the twenty-one constituent libraries of the University of London and the University of Westminster (http://www.m25lib.ac.uk/ULS/). They have achieved this using an unusual architecture, which they describe as a ``non-distributed virtual union catalogue''.

The member libraries send in data files representing their periodicals, including both brief bibliographic data and summary holdings. Then 21 individual Z39.50 targets are created, each using Zebra, and all mounted on the single hardware server. The live service provides a web gateway allowing Z39.50 searching of all of the targets or a selection of them. Zebra's small footprint allows a relatively modest system to comfortably host the 21 servers.

More information can be found at http://www.m25lib.ac.uk/ULS/

3.6. Various web indexes

Zebra has been used by a variety of institutions to construct indexes of large web sites, typically in the region of tens of millions of pages. In this role, it functions somewhat similarly to the engine of Google or AltaVista, but for a selected intranet or a subset of the whole Web.

For example, Liverpool University's web-search facility (see on the home page at http://www.liv.ac.uk/ and many sub-pages) works by relevance-searching a Zebra database which is populated by the Harvest-NG web-crawling software.

For more information on Liverpool university's intranet search architecture, contact John Gilbertson

Kang-Jin Lee has recently modified the Harvest web indexer to use Zebra as its native repository engine. His comments on the switch over from the old engine are revealing:

The first results after some testing with Zebra are very promising. The tests were done with around 220,000 SOIF files, which occupies 1.6GB of disk space.

Building the index from scratch takes around one hour with Zebra where [old-engine] needs around five hours. While [old-engine] blocks search requests when updating its index, Zebra can still answer search requests. [...] Zebra supports incremental indexing which will speed up indexing even further.

While the search time of [old-engine] varies from some seconds to some minutes depending how expensive the query is, Zebra usually takes around one to three seconds, even for expensive queries. [...] Zebra can search more than 100 times faster than [old-engine] and can process multiple search requests simultaneously

I am very happy to see such nice software available under GPL.