One of the things Index Data is known for is the YAZ toolkit - an open source programmers’ toolkit supporting the development of Z39.50/SRW/SRU clients and servers. The first release was in 1995 and I've been using it for my own metasearch engine ZACK Gateway since 1998, long before I joined Index Data.
Z39.50 is a client-server protocol for searching and retrieving information from remote computer databases. It is a mature, low-level protocol like HTTP and FTP. You don't implement Z39.50 yourself; you use the YAZ utilities and the libraries and frameworks available for other languages (C++, PHP, Perl, etc.).
Many people think that Z39.50 is a dead standard and hard to understand. That is not true. Z39.50 is stable, very fast, and still growing in use. It is the only widely available protocol for metasearch.
Using Z39.50 is no harder than using FTP. I think an experienced programmer can learn it in less than half a day. Any problems you run into later are not related to the Z39.50 protocol itself but to the underlying system behind the Z39.50 server. Keep in mind that Z39.50 is an API for accessing (bibliographic) databases. It does not define how the data is structured and indexed in the database.
Z39.50 for Dummies Series - Part 1
I will now start a Z39.50 for Dummies series and show some examples of how to access a remote database.
Let's start with a simple question: does the Library of Congress have the book "library mashups"? (I strongly recommend you buy this book - I wrote chapter 19):
$ zoomsh "connect z3950.loc.gov:7090/voyager" 'search "library mashups"' quit
z3950.loc.gov:7090/voyager: 2 hits
That's all! Only one line on the command line. An SRU or SOAP request would not be shorter.
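If you want to use the hit count in a script, something like the following might do. It is only a sketch: the function name `hits_from_zoomsh` is made up for this example, and it assumes zoomsh writes its status line to standard output in the form `<target>: <N> hits`, as in the run above.

```shell
#!/bin/sh
# hits_from_zoomsh (hypothetical helper): read zoomsh output on stdin
# and print the hit count of the first search.
# Assumes a status line of the form "<target>: <N> hits".
hits_from_zoomsh() {
    # Keep only the number from lines that end in "<N> hits".
    sed -n 's/^.*: \([0-9][0-9]*\) hits *$/\1/p' | head -n 1
}

# Usage (needs YAZ installed and network access):
#   zoomsh "connect z3950.loc.gov:7090/voyager" \
#          'search "library mashups"' quit | hits_from_zoomsh
```

The helper only looks at the text, so it should work the same way for any target as long as the status line keeps this shape.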
Now, retrieve the record:
$ zoomsh "connect z3950.loc.gov:7090/voyager" 'search "library mashups"' "show 0 1" "quit"
z3950.loc.gov:7090/voyager: 2 hits
0 database=VOYAGER syntax=USmarc schema=unknown
02438cam 22003018a 4500
001 15804854
005 20090710141909.0
008 090706s2009 nju b 001 0 eng
906 $a 7 $b cbc $c orignew $d 1 $e ecip $f 20 $g y-gencatlg
925 0 $a acquire $b 2 shelf copies $x policy default
955 $b rg11 2009-07-06 $i rg11 2009-07-06 $a rg11 2009-07-08 to Policy (CLED/SHED) $a td04 2009-07-09 to Dewey $w rd14 2009-07-10
010 $a 2009025999
020 $a 9781573873727
040 $a DLC $c DLC
050 00 $a Z674.75.W67 $b L52 2009
082 00 $a 020.285/4678 $2 22
245 00 $a Library mashups : $b exploring new ways to deliver library data / $c edited by Nicole C. Engard.
260 $a Medford, N.J. : $b Information Today, Inc., $c c2009.
263 $a 0908
300 $a p. cm.
504 $a Includes bibliographical references and index.
505 0 $a What is a mashup? / Darlene Fichter -- Behind the scenes : some technical details on mashups / Bonaria Biancu -- Making your data available to be mashed up / Ross Singer -- Mashing up with librarian knowledge / Thomas Brevik -- Information in context / Brian Herzog -- Mashing up the library website / Lichen Rancourt -- Piping out library data / Nicole C. Engard -- Mashups @ Libraries interact / Corey Wallis -- Library catalog mashup : using Blacklight to expose collections / Bess Sadler, Joseph Gilbert, and Matt Mitchell -- Breaking into the OPAC / Tim Spalding -- Mashing up open data with biblios.net Web services / Joshua Ferraro -- SOPAC 2.0 : the thrashable, mashable catalog / John Blyberg -- Mashups with the WorldCat Affiliate Services / Karen A. Coombs -- Flickr and digital image collections / Mark Dahl and Jeremy McWilliams -- Blip.tv and digital video collections in the library / Jason A. Clark -- Where's the nearest computer lab? : mapping up campus / Derik A. Badman -- The repository mashup map / Stuart Lewis -- The LibraryThing API and libraries / Robin Hastings -- ZACK bookmaps / Wolfram Schneider -- Federated database search mashup / Stephen Hedges, Laura Solomon, and Karl Jendretzky -- Electronic dissertation mashups using SRU / Michael C. Witt.
650 0 $a Mashups (World Wide Web) $x Library applications.
650 0 $a Libraries and the Internet.
650 0 $a Library Web sites $x Design.
650 0 $a Web site development.
700 1 $a Engard, Nicole C., $d 1979-
963 $a Amy Reeve; phone: 609-654-6266; email: areeve @ infotoday.com; bc: nellor @ infotoday.com
The default exchange format for bibliographic records in Z39.50 is MARC21. This is probably not something you want to parse yourself.
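For a quick look at a single field, though, the line display that zoomsh prints is easy enough to filter. The sketch below assumes one field per line, starting with the three-digit tag and using `$a`, `$b`, ... as subfield markers, as zoomsh shows it in a terminal. The function name `marc_field` is made up for this example, and it only handles data fields with subfields, not control fields such as 001.

```shell
#!/bin/sh
# marc_field (hypothetical helper): print the text of every field with the
# given tag from zoomsh's line-mode MARC display, read from stdin.
# Assumes lines such as: 245 00 $a Title : $b subtitle / $c responsibility.
marc_field() {
    awk -v tag="$1" '($1 "") == (tag "") {
        sub(/^[^$]*/, "")          # drop the tag and indicators before the first "$"
        gsub(/\$[a-z0-9] /, "")    # drop subfield markers such as "$a "
        print
    }'
}

# Usage (needs YAZ installed and network access):
#   zoomsh "connect z3950.loc.gov:7090/voyager" \
#          'search "library mashups"' "show 0 1" quit | marc_field 245
```

This is fine for eyeballing a title or an ISBN. For anything more serious, a real MARC library or the XML output is the safer route, because the line display is meant for humans.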
Ok, now let's download the record in XML format:
$ zoomsh "connect z3950.loc.gov:7090/voyager" 'search "library mashups"' "show 0 1 xml" "quit"
z3950.loc.gov:7090/voyager: 2 hits
0 database=VOYAGER syntax=USmarc schema=unknown
02438cam a22003018a 4500 15804854 20090710141909.0 090706s2009 nju b 001 0 eng [large XML output...] 7 cbc orignew 1 ecip 20 y-gencatlg
You can parse the XML output with your favorite tools, usually an XSLT style sheet.
Next time I will show you how to run a metasearch in one line.
UPDATE: The latest release of YAZ, inspired by this blog post, supports client-side mapping of MARC to MARCXML, so you can dump XML records even from targets that do not support XML.