Most of the science of Information Retrieval centers around being able to find and rank the right set of documents in response to a given query. We spend much time arguing about technical details like ranking algorithms and the benefits of indexing versus broadcast searching. Every Information Professional I know both deifies and fears Google because they get it right most of the time – enough so that many people tend to assume that whatever pops to the top of a Google search MUST be right, because it’s right there, in the result screen.
Index Data has posted a statement of interest to the DPLA (Digital Public Library of America) beta sprint. We are submitting this together with two academic partners.
We are often asked where we stand on the discussion of central indexing versus broadcast metasearching. Our standard answer, “You probably need some of both,” always calls for further explanation. Some time ago, I wrote this up for a potential business partner. If it sounds a little like a marketing spiel… guilty as charged. I hope the content will still seem interesting to some folks thinking about these issues. While our specific approach and technology may be ours alone, the technical issues described here are pretty universal.
Code4lib 2011 in Bloomington, IN – Part 2
Good things come to those who wait! Here’s the Code4Lib 2011 Report Part 2. I toyed with the idea of postponing it indefinitely and having you impatiently check the Index Data blog’s RSS feed, but Higher Powers persuaded me otherwise :). Anyway, since there’s still some time until the next edition of our favorite conference, you can use this report to refresh your memory or to get a taste of things to come…
Code4lib 2011 in Bloomington, IN
So it took me a while to process all that happened during the conference and come up with a short summary. I am not aiming to be anywhere near comprehensive: Code4lib grows fast and there’s quite a lot of stuff going on each year! This year’s talks covered a vast selection of subjects, ranging from back-end software topics (databases and search engines with the ubiquitous Solr, tuning, ranking and merging results from different sources) to front-end user
I spent most of last week up in Edinburgh, for the Open Edge conference on open-source software in libraries, attended mostly by academic librarians and their technical people. It was an interesting time, and I met a lot of interesting people. At the risk of overusing the word “interesting”, it was also of interest to see how widespread the deployment of “next-generation OPACs” like VuFind and Blacklight has become.
We’ve been investigating ways we might add result clustering to our metasearch tools. Here’s a short introduction to the topic and to an open source platform for experimenting in this area.
We have always held that the schism between broadcast metasearching and local indexing is rather goofy – that in practice, you do whatever it takes to get the results in front of your user when and where he needs them, and the best solutions will allow for whatever approach is needed in the moment.
We recently ran into a mysterious problem in one of our glue-layer programs. As soon as we had the “Aha!” moment and realised what was going wrong, we concluded that the mistake was more cultural than technical, and that it was potentially of wide interest – hence this blog post.
Our metasearch middleware, Pazpar2, spends a lot of time doing XML transformations. When we use Pazpar2 with traditional library data sources that return MARC21, we internally convert the received records into MARCXML (if they’re not already represented as such) and then transform them into the internal Pazpar2 XML format using XSLT (more on this process here).
Much is being written in these fast-paced times about the future of libraries and librarianship. Opinions and prescriptions come from every corner of the information sector: academia, technology companies, consortia, standards bodies, professional associations and ad-hoc interest groups, and many independent observers add their own perspectives to the conversations. Sometimes, though, stories tell us more than reports, theories, and conference proceedings.
Inspired by Jakub’s posting yesterday, I wondered how easy it would be to build an HTTP-to-Z39.50 gateway similar to his in Ruby, my language of the moment. Different languages offer different tools and different ways of doing things, and it’s always instructive to compare.
Yaz4j is a wrapper library over the client-specific parts of YAZ, a C-based Z39.50 toolkit, and allows you to use the ZOOM API directly from Java. The initial version of Yaz4j was written by Rob Styles from Talis, and the project is now developed and maintained at Index Data. ZOOM is a relatively straightforward
Recently, my son asked me a series of questions about the Cold War, and the political/military paradigm of mutually assured destruction (MAD for short). It’s always seemed like an odd premise to me, and somehow, discussing it with a 13-year-old doesn’t make it look any more sensible. However, we came to agree that landing on the moon was a pretty cool thing. Would the lunar landing have happened, realistically, without the Cold War?