Code4lib 2011 report -- part I

Code4lib 2011 in Bloomington, IN

So it took me a while to process everything that happened during the conference and come up with a short summary. I am not aiming to be anywhere near comprehensive: Code4lib grows fast, and there's quite a lot going on each year! This year's talks covered a vast selection of subjects, ranging from back-end software topics (databases and search engines, with the ubiquitous Solr; tuning, ranking and merging results from different sources) to front-end user interfaces (usability testing, data visualization and mash-ups).

Code4lib 2011 took place in Bloomington, Indiana, on the Indiana University campus (in the Memorial Union building, to be exact). It was quite a successful choice of venue, in my opinion. Having the hotel, the lecture rooms, and the food hall (with a Starbucks to get your daily coffee fix :) all in one place encouraged attendees and speakers alike to share ideas and discuss things long after the scheduled talks had ended.

One thing that, I felt, worked really well was the rather short duration of the talks. Twenty to thirty minutes per talk seems perfectly balanced, giving the speaker enough time to present the material without having the audience yawning :). As usual, each day's session concluded with the Lightning Talks: five-minute-long ramblings on random subjects. I must say I enjoyed them quite a bit, with some being even more educational and entertaining than the scheduled lectures.

Pre-conference day

The conference proper started on Tuesday the 8th, but most people showed up on Monday to attend the pre-conference sessions. There was quite a lot going on at once, with at least four or five parallel tracks in both the morning and afternoon slots. I chose to attend the "What's new in Solr" talk by Erik Hatcher and the "Pre-conference Un-conference"… which was not so much a talk as a free-form discussion and brainstorming session among the attendees.

I think Erik Hatcher is a frequent Code4lib speaker (at least I remember him giving a talk at the 2009 Code4lib), always covering the latest advances in Lucene and Solr. He works for Lucid Imagination, a company that provides commercial support for Solr and has a few paid developers on the project, so he's really a first-hand source of information on cutting-edge Solr development. This and other talks only solidified my belief that Solr is a huge thing in library-land at the moment, with pretty much every next-generation catalog using it (Blacklight, VuFind) and lots of other smaller or bigger archive projects building on top of it (e.g. the Smithsonian/NASA Astrophysics Data System, covered in one of the talks).

Among the more notable new Solr features is the so-called field collapsing, or in other words, record grouping. This is a pretty powerful feature that allows you to group results sharing a field value into a single entry (or more), so that a whole group appears as a single document. There are many ways to use it: one example is document de-duplication. It's easy to think of this feature as a faceted search with the top documents for each facet (or group query) returned right away. There have also been a bunch of improvements to faceting in general: pivot aka "hierarchical" facets, a "term" type for filter/facet queries and many more. Solr development is thriving, and so many new things and improvements were mentioned (ICU filters, spatial queries, edismax, spell checking, auto-suggest, UIMA) that it's impossible to cover them all in one paragraph. For more information check Erik's slides, available on-line.
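To give a flavor of the grouping feature, here's a minimal sketch of a field-collapsing query against a local Solr instance. The core name ("catalog") and the fields ("title", "isbn") are my assumptions for illustration, not taken from the talk:

```python
import json
import urllib.parse
import urllib.request

# Group search results by ISBN so that duplicate records of the same
# title collapse into a single entry -- a simple de-duplication use case.
params = urllib.parse.urlencode({
    "q": "title:solr",
    "group": "true",         # enable field collapsing
    "group.field": "isbn",   # collapse records sharing the same ISBN
    "group.limit": 1,        # return the top document per group
    "wt": "json",
})

url = "http://localhost:8983/solr/catalog/select?" + params
with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

# Each group carries the shared field value plus its top-ranked docs.
for group in data["grouped"]["isbn"]["groups"]:
    top = group["doclist"]["docs"][0]
    print(group["groupValue"], "-", top.get("title"))
```

Raising group.limit gives you something like "top N documents per facet value" in a single query, which is exactly the facets-with-results analogy above.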

The "Pre-Conference Un-conference" was led by Julie Meloni. Well, at least she tried to put some law and order into the otherwise completely unstructured thing. The idea of an un-conference is best explained by Wikipedia: "..a facilitated, participant-driven conference centered around a theme or purpose." We started off by putting some discussion topics on the whiteboard and voting on them. The winning topics were then split between two groups of participants, and from there it was up to the attendees to take it over. The topics ranged quite a bit: from user interface usability (UX) to Solr indexing and federated searching.

In the UX testing discussion there were some very useful insights from people doing that sort of thing. It appears that getting representative test subjects is quite an achievement, and encouraging them to take part in the test is an art in itself (sweets and other "bribes" were mentioned :). Also, some of the usability statistics mentioned were quite a shock to me: only about 2-5% of users ever use facets! Surprising, especially considering how much work goes into supporting them in search engines (e.g. Solr) and the effort that's put into building more advanced search interfaces. It's definitely an over-simplification, but it does seem that all people want is a Google-like single search box and good ranking.

Ah yes, ranking. We touched on it during the search engine/indexing discussion in the context of federated searching, the general conclusion being that it's impossible to get it right. Still, people try to at least do it "good enough" (don't we too, at Index Data?). The guys from the State Library of Denmark actually managed to persuade Summon to make the term weights available with the search results, and they use them to merge the catalog results in a smoother fashion.
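The discussion didn't produce a recipe, but the core trick in that Summon example is putting scores from different sources on a comparable scale before merging. Here's a purely illustrative sketch; min-max normalization is my naive stand-in, not what the Danish folks actually do with the Summon term weights:

```python
# Hypothetical sketch of merging ranked result lists from two sources.
# Each source scores documents on its own scale, so we normalize scores
# per source into [0, 1] before interleaving them into one ranked list.

def normalize(results):
    """Map raw scores to [0, 1] within a single source."""
    scores = [score for _, score in results]
    lo, hi = min(scores), max(scores)
    span = hi - lo or 1.0  # avoid division by zero on uniform scores
    return [(doc, (score - lo) / span) for doc, score in results]

def merge(*sources):
    """Combine per-source (doc, score) lists into one ranked list."""
    combined = []
    for results in sources:
        combined.extend(normalize(results))
    return sorted(combined, key=lambda pair: pair[1], reverse=True)

# Made-up document IDs and scores, just to show the shape of the data.
summon = [("doc-a", 12.4), ("doc-b", 7.1), ("doc-c", 3.3)]
catalog = [("doc-x", 0.92), ("doc-y", 0.40)]
for doc, score in merge(summon, catalog):
    print(f"{score:.2f}  {doc}")
```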

Conference Day 1

The conference proper started on Tuesday with two welcoming talks from our IU hosts: Brenda Johnson, the Dean of Libraries, and Brad Wheeler, the Vice President for Information Technology. After those short introductions we listened to a keynote speech from Diane I. Hillmann, a metadata expert and the Director of Metadata Initiatives at the Information Institute of Syracuse. Diane talked about the (tough) relations between cataloguers and systems librarians or programmers (not an easy talk to give, considering the room was full of the latter :) and touched on the history of cataloging and metadata management (with some cute pictures!), which I'm sure made everyone feel nostalgic.

But let's get to the meat, shall we? :). Karen Coombs from OCLC started off with a presentation on "Visualizing Library Data". As always, her talk was full of interesting ideas and creative ways of showing otherwise boring data (if you're like me, spending most of your time in the terminal, you tend to neglect this stuff – don't!). She showed some nice uses of the Google Maps API to geo-locate libraries close to the user (the coordinates are part of the WorldCat records), timelines to chronologically organize the bibliography of a given author, charts, graphs to show relations, and whatnot. The cool thing is that all the demos are available on-line along with the source code.
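As a taste of the geo-location idea, here's a small sketch: given library coordinates of the kind embedded in WorldCat records, find the branches nearest to a user. The names and coordinates below are made up for illustration, not from Karen's demos:

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical (lat, lon) pairs standing in for coordinates
# pulled from WorldCat records.
LIBRARIES = {
    "Wells Library": (39.1718, -86.5172),
    "Monroe County Public Library": (39.1653, -86.5323),
    "Lilly Library": (39.1678, -86.5230),
}

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))

def nearest(user_lat, user_lon, n=3):
    """Return the n libraries closest to the user's position."""
    return sorted(
        LIBRARIES,
        key=lambda name: haversine(user_lat, user_lon, *LIBRARIES[name]),
    )[:n]

print(nearest(39.17, -86.52))
```

Feed the result into the Google Maps API as markers and you have the "libraries near you" view she demonstrated.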

Next on the stand was Thomas Barker from the University of Pennsylvania. He talked about MetriDoc, an open-source tool developed at UPenn thanks to funding from the Institute of Museum and Library Services. MetriDoc is meant to be a buzzword-free (no SOA!) answer to the data integration problems (think flat files, DBs, Web Services) within libraries. It uses a Domain Specific Language for expressing workflows and will eventually include a dashboard to assist monitoring and management. If data integration is your bread and butter you should check out the project – but be careful, it will soon change its name, as "MetriDoc" has been used before. For now you can find it here.

Just before lunch we heard two more presentations. Brad Skiles from the Kuali project talked about OLE: Oh-Libraries-in-the-Enterprise (officially, the Open Library Environment), which aims to deliver an all-in-one, enterprise-ready software package for libraries, but which has so far only gotten as far as defining its requirements and architecture. Coding is (was?) supposed to start at the beginning of this year. Finally, Cary Gordon from the Drupal Association gave an overview of the exciting new features and changes coming in Drupal 7.

With a full stomach I was delighted to listen to a talk by Josh Bishof, a local developer at the IU library. He shared his experiences making the library website accessible to mobile devices. Given the proliferation of mobile devices on the market, they went for a pure Web (HTML/CSS/JS) solution to support as wide a spectrum of them as possible, and it seems to be working quite well. But he didn't only talk about making websites look nice on small screens; he also gave examples of how to utilize the capabilities of mobile devices (e.g. GPS, to help you find your way to the library) and how to make the library website a gateway to seemingly unrelated things, like information on campus bus stops. All of it intended to make the website a bit more interesting for students, who in most cases seem to visit it mostly to find the library's opening hours.

Next on the schedule was a talk of a different sort: a report from a sociological experiment. I quote: "In summer 2010, the Center for History and New Media at George Mason University, supported by an NEH Summer Institute grant, gathered 12 'digital humanists' for an intense week of collaboration they dubbed 'One Week | One Tool: a digital humanities barn raising'." So they spent the week together brainstorming, designing, coding and finally releasing to the public Anthologize, a WordPress plugin that transforms this popular CMS into a platform for publishing electronic texts in various formats, including ePub, PDF and TEI. How is that for a project management approach? Check out the slides here.

Demian Katz, core VuFind developer, talked a bit about what's been done in VuFind to make it more MARC-agnostic. I guess VuFind doesn't need an introduction, as it's currently one of the most popular next-gen, open-source OPACs available (along with Blacklight), and it's definitely nice to hear that it is becoming less and less MARC-dependent. Why do I have this weird feeling that catalogers love MARC and programmers hate its guts?

The afternoon session concluded with Jay Luker and Benoit Thiell talking about migrating the search infrastructure of the Smithsonian/NASA Astrophysics Data System to Solr. Besides the standard integration problems that any migration of this size brings (9 million metadata records), and users expecting you to at least re-implement the existing UI functionality, they were constrained by the fact that the records were stored and maintained in Invenio. Invenio is a good institutional repository and digital library, but a poor search server, having trouble indexing and quickly searching data sets of this size. Their idea was to offload searching to Solr and keep the two in sync. If you're curious whether it worked, read the slides. (Okay, it did :)
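The slides have the real details; as a rough illustration of the offload-and-sync pattern, the sync half can be as simple as periodically pushing changed records from the repository into Solr's JSON update handler. Everything below (core name, record shape, the fetch stub) is assumed for the sketch, not taken from the ADS setup:

```python
import json
import urllib.request

SOLR_UPDATE = "http://localhost:8983/solr/ads/update?commit=true"  # assumed core name

def fetch_changed_records():
    """Placeholder for pulling recently modified records out of the
    repository (Invenio, in the ADS case), mapped to Solr fields."""
    return [
        {"id": "example-record-1", "title": "An Example Paper", "author": ["Doe, J."]},
    ]

def push_to_solr(records):
    """Send records to Solr's JSON update handler; documents with an
    existing id are replaced, so re-pushing keeps the index in sync."""
    req = urllib.request.Request(
        SOLR_UPDATE,
        data=json.dumps(records).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

if __name__ == "__main__":
    print(push_to_solr(fetch_changed_records()))
```

Run that on a schedule (or hook it to the repository's update events) and Solr serves the queries while Invenio remains the system of record.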

Before heading off to socialize at the reception (with Index Data being one of the sponsors, yay!) we chatted a bit during the breakout sessions. I attended the Drupal one and had a useful chat with Cary Gordon about porting Drupal 6 modules to Drupal 7. Oh, and not to forget a Code4lib tradition: the Lightning Talks! A lot of condensed information, way too much to present here; what etched itself into my memory were some pretty unexpected usage statistics for library websites (corrected afterwards on the mailing list :) – well, it happens, we all know there are lies, damned lies and statistics anyway.

That's it for now, folks – stay tuned for the next part, covering the last two days of Code4lib 2011. And don't forget to check out the video archives from the conference.

1 Comment

Nice report, Jakub!

Don't forget about the last two days. :)

Re usage statistics, you wrote: "Also, some of the usability statistics mentioned were quite a shock to me: only about 2-5% of users ever use facets!"

Did the presenter make it clear that the 2-5% refers to users, as in actual people? Or might it refer to usages, as in search sessions, or even to actual searches? The implications are very different depending on what is actually being measured.

And did the presenter say what the range represents? Margin of error based on the number of samples--or something else?

It is hard for me to interpret what the 2-5% might mean without getting more background information.

Anyway, thanks for the report.