We’re in the business of making access to information easier for people and, most of all, for SOFTWARE that in turn makes that information available to people. A lot of our software is based around a kind of switchboard or functional ‘hub’ model which means that when we extend a capability in ONE area, new possibilities open up in other areas that we don’t necessarily even think about ourselves.
It’s always fun to see someone do something really neat with your software. This elegantly designed search interface for Asia studies makes excellent use of Pazpar2. I particularly like the clever use of a bar chart for the date facet. Nice work!
It has been quiet on the Pazpar2 front lately, but I will make amends with this blog post.
Lately I have been working on a Harvester that would harvest into a Local Unified Index (LUI). The LUI has been implemented with Solr.
This means we can implement Integrated Search, which is our name for searching both remote targets (meta-searching) and a Local Unified Index (LUI), also known as a Central Index.
Code4lib 2012, Seattle
I was the lucky winner of the Index Data lottery (no actual lottery took place) to go to Code4lib 2012. I was a (Code4lib) Newbie, so I didn’t really know what to expect, but reading Jakub’s blog about his experiences, it sounded like great fun.
It was also my first time in Seattle, so I did take some extra days on both ends to do some exploring. Arriving on Saturday to sunny and warm weather (15 degrees Celsius warmer than Copenhagen, nice!), Seattle did its best to welcome me.
Most of the science of Information Retrieval centers around being able to find and rank the right set of documents in response to a given query. We spend much time arguing about technical details like ranking algorithms and the benefits of indexing versus broadcast searching. Every Information Professional I know both deifies and fears Google because they get it right most of the time – enough so that many people tend to assume that whatever pops to the top of a Google search MUST be right, because it’s right there, in the result screen.
Index Data has posted a statement of interest to the DPLA (Digital Public Library of America) beta sprint. We are submitting this together with two academic partners.
We are often asked where we stand on the discussion of central indexing versus broadcast metasearching. Our standard answer, “You probably need some of both,” always calls for further explanation. Some time ago, I wrote this up for a potential business partner. If it sounds a little like a marketing spiel… guilty as charged. I hope the content will still seem interesting to some folks thinking about these issues. While our specific approach and technology may be ours alone, the technical issues described here are pretty universal.
Code4lib 2011 in Bloomington, IN – Part 2
Good things come to those who wait! Here’s the Code4Lib 2011 Report Part 2. I toyed with the idea of postponing it indefinitely and having you impatiently check the Index Data blog’s RSS feed, but Higher Powers persuaded me otherwise :). Anyway, since there’s still some time until the next edition of our favorite conference, you can use this report to refresh your memory or to get a taste of things to come…
Code4lib 2011 in Bloomington, IN
So it took me a while to process all that happened during the conference and come up with a short summary. I am not aiming to be anywhere near comprehensive: Code4lib grows fast and there’s quite a lot of stuff going on each year! This year’s talks covered a vast selection of subjects, ranging from back-end software topics (databases and search engines with the ubiquitous Solr, tuning, ranking and merging results from different sources) to front-end user
I spent most of last week up in Edinburgh, for the Open Edge conference on open-source software in libraries, attended mostly by academic librarians and their technical people. It was an interesting time, and I met a lot of interesting people. At the risk of overusing the word “interesting”, it was also of interest to see how widespread the deployment of “next-generation OPACs” like VuFind and Blacklight has become.
We’ve been investigating ways we might add result clustering to our metasearch tools. Here’s a short introduction to the topic and to an open source platform for experimenting in this area.
We have always held that the schism between broadcast metasearching and local indexing is rather goofy – that in practice, you do whatever it takes to get the results in front of your user when and where he needs them, and the best solutions will allow for whatever approach is needed in the moment.
We recently ran into a mysterious problem in one of our glue-layer programs. As soon as we had the “Aha!” moment and realised what was going wrong, we concluded that the mistake was more cultural than technical, and that it was potentially of wide interest – hence this blog post.
Our metasearch middleware, Pazpar2, spends a lot of time doing XML transformations. When we use Pazpar2 with traditional library data sources that return MARC21, we internally convert the received records into MARCXML (if they’re not already represented as such) and then transform into the internal pazpar2 XML format using XSLT (more on this process here).
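To make the second step of that pipeline a little more concrete, here is a minimal sketch of mapping a MARCXML record onto a Pazpar2-style internal record. Pazpar2 itself does this with XSLT stylesheets; the sketch swaps in Python's standard `xml.etree.ElementTree` purely for illustration. The `FIELD_MAP` table and the `marcxml_to_pz` function are hypothetical names, the three-field mapping is far smaller than any real stylesheet, and the `pz:record` / `pz:metadata` output shape is an assumption about the internal format rather than a definitive description of it. The earlier MARC21-to-MARCXML conversion is not covered.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"      # MARCXML ("MARC21 slim") namespace
PZ_NS = "http://www.indexdata.com/pazpar2/1.0"  # assumed Pazpar2 record namespace

# Hypothetical, deliberately tiny mapping:
# (MARC tag, subfield code) -> metadata type in the internal record.
FIELD_MAP = {
    ("100", "a"): "author",
    ("245", "a"): "title",
    ("260", "c"): "date",
}


def marcxml_to_pz(marcxml: str) -> ET.Element:
    """Build a pz:record element from one MARCXML record string."""
    ET.register_namespace("pz", PZ_NS)
    source = ET.fromstring(marcxml)
    pz_record = ET.Element(f"{{{PZ_NS}}}record")

    # Walk every datafield/subfield pair and copy the ones we have a mapping for.
    for datafield in source.iter(f"{{{MARC_NS}}}datafield"):
        tag = datafield.get("tag")
        for subfield in datafield.findall(f"{{{MARC_NS}}}subfield"):
            md_type = FIELD_MAP.get((tag, subfield.get("code")))
            if md_type and subfield.text:
                metadata = ET.SubElement(
                    pz_record, f"{{{PZ_NS}}}metadata", {"type": md_type}
                )
                metadata.text = subfield.text.strip()
    return pz_record


SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" "><subfield code="a">Doe, Jane</subfield></datafield>
  <datafield tag="245" ind1="1" ind2="0"><subfield code="a">An example title</subfield></datafield>
</record>"""

if __name__ == "__main__":
    print(ET.tostring(marcxml_to_pz(SAMPLE), encoding="unicode"))
```

A lookup table like this only handles one-to-one field copies; the conditional logic and string cleanup that real records need is one reason a full transformation language such as XSLT is a natural fit for this step.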