In the last blog post, Z39.50 for Dummies, I gave an introduction to using the zoomsh program to access the Z39.50 server of the Library of Congress.

Today I will show you how to run a simple metasearch on the command line. Do you want to know which library has the book with ISBN 0-13-949876-1 (UNIX Network Programming by W. Richard Stevens)? You can run zoomsh in a shell loop.

Put the list of databases (zURLs), one per line, in the text file zurl.txt:

z3950.loc.gov:7090/voyager
melvyl.cdlib.org:210/CDL90
library.ox.ac.uk:210/ADVANCE
z3950.library.wisc.edu:210/madison

and run a little loop in a shell script (in the query, @attr 1=7 selects the Bib-1 use attribute 7, i.e. the ISBN index):

$ for zurl in `cat zurl.txt`
do
 zoomsh "connect $zurl" \
 "search @attr 1=7 0-13-949876-1" "quit"
done


z3950.loc.gov:7090/voyager: 0 hits
melvyl.cdlib.org:210/CDL90: 1 hits
library.ox.ac.uk:210/ADVANCE: 1 hits
z3950.library.wisc.edu:210/madison: 0 hits
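To answer the original question (which library has the book?) directly, the loop can be packaged as a small function and its output filtered down to the databases that report at least one hit. This is a sketch under a few assumptions: metasearch_seq and has_hits are hypothetical names, skipping "#" comment lines in zurl.txt is a convention added here (not a zoomsh feature), and the filter relies on zoomsh printing result lines in the "zURL: N hits" form shown above. Standard input of zoomsh is redirected from /dev/null as a precaution, so that it cannot swallow the remaining lines of zurl.txt in case it reads commands from standard input.

```shell
# Sequential metasearch: one zoomsh run per zURL listed in a file.
# metasearch_seq and has_hits are hypothetical helper names.
# Usage: metasearch_seq ISBN [FILE] | has_hits

metasearch_seq() {
    isbn=$1
    file=${2:-zurl.txt}
    # Read line by line; "|| [ -n ... ]" also handles a last line
    # that has no trailing newline.
    while IFS= read -r zurl || [ -n "$zurl" ]; do
        case $zurl in
            ''|'#'*) continue ;;   # skip blank lines and '#' comments
        esac
        # @attr 1=7 is the Bib-1 use attribute for ISBN.
        # </dev/null keeps zoomsh from reading the rest of $file.
        zoomsh "connect $zurl" "search @attr 1=7 $isbn" "quit" </dev/null
    done < "$file"
}

# Print only the zURLs whose result line reports one or more hits.
# Assumes lines of the form "host:port/database: N hits"; any other
# output (error messages, records) is ignored.
has_hits() {
    # Split on ": " (colon plus space) so the colon in host:port
    # stays inside field 1.
    awk -F': ' '$2 ~ /^[0-9]+ hits$/ && $2 + 0 > 0 { print $1 }'
}
```

For the sample run above, metasearch_seq 0-13-949876-1 | has_hits would be expected to print only the melvyl.cdlib.org and library.ox.ac.uk entries.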

Of course, running one search request after another takes time. How about a parallel search? The xargs(1) command on modern BSD-based operating systems (macOS, FreeBSD) and GNU xargs can run several processes at a time with the -P option.

This example runs up to 2 search requests at a time (-P2) and can be roughly twice as fast as the shell script above:

$ xargs -n1 -P2 perl -e 'exec "zoomsh", "connect $ARGV[0]", "search \@attr 1=7 0-13-949876-1", "quit"' < zurl.txt

melvyl.cdlib.org:210/CDL90: 1 hits
library.ox.ac.uk:210/ADVANCE: 1 hits
z3950.loc.gov:7090/voyager: 0 hits
z3950.library.wisc.edu:210/madison: 0 hits

Note that the order of the responses is different: the fastest database wins and is displayed first.
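The perl one-liner is there because xargs appends each zURL as a separate argument, while zoomsh expects "connect" and the zURL together in a single argument; the backslash in \@attr stops perl from interpolating @attr as an array. The same gluing can be done with sh -c and positional parameters, so perl is not strictly required. The sketch below assumes an xargs that supports -P (GNU, macOS, FreeBSD); the function name metasearch_par is hypothetical. It still starts two programs per request (sh, which then execs zoomsh).

```shell
# Parallel metasearch with xargs -P, without perl.
# metasearch_par is a hypothetical helper name used for illustration.
# Usage: metasearch_par ISBN [JOBS] [FILE]
metasearch_par() {
    isbn=$1
    jobs=${2:-2}
    file=${3:-zurl.txt}
    # xargs appends one zURL per invocation (-n1) after the fixed
    # arguments, so inside the sh -c script $1 is the ISBN and $2 is
    # the zURL ($0 is set to "sh" and only appears in error messages).
    # -P runs up to $jobs invocations at the same time.
    xargs -n1 -P"$jobs" sh -c \
        'exec zoomsh "connect $2" "search @attr 1=7 $1" "quit"' \
        sh "$isbn" < "$file"
}
```

For example, metasearch_par 0-13-949876-1 4 would query all four databases from zurl.txt at once.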

I think it is safe to run up to 20 searches in parallel (-P20) on modern hardware. Note that there is a lot of process overhead here: for each request, two programs are started (perl, which then execs zoomsh). If a connection hangs, you must wait until it hits the timeout.
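To avoid waiting for a hanging target, each request can be wrapped in timeout(1) so that it is killed after a fixed number of seconds. This sketch assumes the timeout utility from GNU coreutils (or a compatible one) is installed; it is standard on Linux, while on macOS it usually has to be added separately, for example via Homebrew's coreutils, where it may be named gtimeout. The function name metasearch_timeout, the 30-second default and the default of 20 parallel jobs are illustrative choices, not zoomsh settings.

```shell
# Parallel metasearch where each request is killed after a fixed time.
# Assumes timeout(1) from GNU coreutils (or a compatible implementation).
# metasearch_timeout is a hypothetical helper name used for illustration.
# Usage: metasearch_timeout ISBN [SECONDS] [JOBS] [FILE]
metasearch_timeout() {
    isbn=$1
    secs=${2:-30}
    jobs=${3:-20}
    file=${4:-zurl.txt}
    # Inside sh -c: $1 = seconds, $2 = ISBN, $3 = zURL (appended by xargs).
    # Targets that fail or do not answer in time are reported on stderr.
    xargs -n1 -P"$jobs" sh -c '
        timeout "$1" zoomsh "connect $3" "search @attr 1=7 $2" "quit" ||
            echo "$3: no result (timed out after $1 s or failed)" >&2
    ' sh "$secs" "$isbn" < "$file"
}
```

For example, metasearch_timeout 0-13-949876-1 10 would give each target at most 10 seconds and list the stragglers on standard error.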

This example shows how easy it is to run your own metasearch on the command line. If you want to set up a real metasearch for your organization, I recommend trying out our metasearch middleware pazpar2, which features merging, relevance ranking, record sorting, and faceted results. In a nutshell, pazpar2 is a web-oriented Z39.50 client. It searches many targets in parallel and integrates the results on the fly. The interface is entirely web service based, so you can use it from any development environment. The pazpar2 home page is http://www.indexdata.com/pazpar2


Read the other articles in the series Z39.50 for Dummies: Part I, Part III


Follow-up discussion

These posts sparked an interesting discussion on the Disruptive Library Technology Jester: http://dltj.org/article/z3950-for-dummies/