Z39.50 for Dummies Part 2

In the previous blog post, Z39.50 for Dummies, I gave an introduction to using the zoomsh program to access the Z39.50 server of the Library of Congress.

Today I will show you how to run a simple metasearch on the command line. Suppose you want to know which libraries hold the book with the ISBN 0-13-949876-1 (UNIX network programming / W. Richard Stevens). You can run zoomsh in a shell loop.

Put the list of databases (zURLs), one per line, in the text file zurl.txt:

z3950.loc.gov:7090/voyager
melvyl.cdlib.org:210/CDL90
library.ox.ac.uk:210/ADVANCE
z3950.library.wisc.edu:210/madison

and run a little loop in a shell script:

$ for zurl in `cat zurl.txt`
do
    zoomsh "connect $zurl" \
        "search @attr 1=7 0-13-949876-1" "quit"
done
z3950.loc.gov:7090/voyager: 0 hits
melvyl.cdlib.org:210/CDL90: 1 hits
library.ox.ac.uk:210/ADVANCE: 1 hits
z3950.library.wisc.edu:210/madison: 0 hits
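The loop above can also be wrapped in small functions. The following is only a sketch under a few assumptions: the names list_targets and search_all are made up for illustration, the ZOOMSH variable is a hypothetical override for the client program, and treating "#" as a comment marker in zurl.txt is my own convention, not something zoomsh defines. The @attr 1=7 part of the query selects the ISBN use attribute of the Bib-1 attribute set.

```shell
#!/bin/sh
# Sketch: sequential metasearch that tolerates blank lines and comments.
# ZOOMSH is a hypothetical override so the client program can be swapped out.
ZOOMSH="${ZOOMSH:-zoomsh}"

# Print the usable targets from a zURL list file, one per line.
# Assumption: "#" starts a comment in the list file (our own convention).
list_targets() {
    # Drop comments, strip surrounding whitespace, skip empty lines.
    sed -e 's/#.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" |
        grep -v '^$'
}

# Search every target in turn for the given ISBN (Bib-1 use attribute 7).
# Usage: search_all zurl.txt 0-13-949876-1
search_all() {
    list_targets "$1" | while IFS= read -r zurl; do
        # Redirect stdin so the client cannot swallow the rest of the list.
        "$ZOOMSH" "connect $zurl" "search @attr 1=7 $2" "quit" </dev/null
    done
}
```

Reading the list with while read instead of backticks keeps each line intact, and setting ZOOMSH=echo gives a dry run that only prints the commands that would be sent.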

Of course, it takes time to run one search request after another. How about a parallel search? Modern xargs(1) commands on BSD-based operating systems (macOS, FreeBSD) and GNU xargs support running several processes at a time.

This example runs up to two search requests at a time and is roughly twice as fast as the shell script above:

$ xargs -n1 -P2 perl -e 'exec "zoomsh", "connect $ARGV[0]", "search \@attr 1=7 0-13-949876-1", "quit"' < zurl.txt
melvyl.cdlib.org:210/CDL90: 1 hits
library.ox.ac.uk:210/ADVANCE: 1 hits
z3950.loc.gov:7090/voyager: 0 hits
z3950.library.wisc.edu:210/madison: 0 hits

As you can see, the order of the responses is different: the fastest database wins and is displayed first.
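If a stable order matters more than seeing the first answer quickly, the result lines can be sorted once all searches have finished. Here is a minimal sketch, assuming that each result line starts with the target name as in the output shown above. The function name parallel_search and the ZOOMSH variable are hypothetical, and sh -c stands in for the perl wrapper.

```shell
#!/bin/sh
# Sketch: parallel metasearch with sorted output.
# ZOOMSH is a hypothetical override; it is exported so that the
# sh -c children started by xargs can see it.
ZOOMSH="${ZOOMSH:-zoomsh}"
export ZOOMSH

# Usage: parallel_search zurl.txt 0-13-949876-1 [jobs]
parallel_search() {
    file=$1 isbn=$2 jobs=${3:-2}
    # xargs appends one zURL per invocation, so the inner shell sees
    # the ISBN as $1 and the zURL as $2. Up to $jobs children run at once.
    xargs -n1 -P"$jobs" sh -c \
        'exec "$ZOOMSH" "connect $2" "search @attr 1=7 $1" "quit"' \
        sh "$isbn" < "$file" | sort
}
```

The trade-off is that sort cannot print anything until the slowest target has answered, so the "fastest database first" effect is lost.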

I think it is safe to run up to 20 searches in parallel on modern hardware. Note that there is a lot of process overhead here: for each request, two processes are executed. If a connection hangs, you must wait until the timeout is reached.
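One way to avoid waiting on a hanging connection is to put an upper bound on each search. The sketch below assumes that the timeout(1) command from GNU coreutils is installed, which is not the case by default on every system (macOS, for example). The function name search_with_timeout, the ZOOMSH variable, and the 30 second default are assumptions chosen for illustration.

```shell
#!/bin/sh
# Sketch: bound the run time of a single search.
# Assumes timeout(1) from GNU coreutils; ZOOMSH is a hypothetical override.
ZOOMSH="${ZOOMSH:-zoomsh}"

# Usage: search_with_timeout zurl isbn [seconds]
search_with_timeout() {
    zurl=$1 isbn=$2 limit=${3:-30}
    timeout "$limit" "$ZOOMSH" "connect $zurl" "search @attr 1=7 $isbn" "quit"
    status=$?
    # GNU timeout exits with status 124 when it had to kill the command.
    if [ "$status" -eq 124 ]; then
        echo "$zurl: timed out after ${limit}s" >&2
    fi
    return "$status"
}
```

The timeout message goes to standard error, so the "hits" lines on standard output can still be piped or sorted as before.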

This was an example of how easy it is to run your own metasearch on the command line. If you want to set up a real metasearch for your organization, I recommend trying out our metasearch middleware pazpar2, which features merging, relevance ranking, record sorting, and faceted results. In a nutshell, pazpar2 is a web-oriented Z39.50 client. It searches a lot of targets in parallel and provides on-the-fly integration of the results. The interface is entirely webservice-based, and you can use it from any development environment. The pazpar2 home page is http://www.indexdata.com/pazpar2

Read the other articles of the series Z39.50 for Dummies: Part I, Part III
