Machine learning in libraries: profiling research projects rather than people

Machine learning in libraries, as in many other contexts, will often rely on data about people and their activities. Data in a library system can be made available for use with machine learning algorithms to develop predictive models, which have the potential to help patrons in their research. Of course, the data might also be used for the benefit of others without the patrons’ permission, either in the present or at some future time. In particular, the data can serve as a basis for creating profiles of individuals, which may be used for undue advantage. If “knowledge is power,” then it is worth considering whether that power is authorized and what its limits are.

If libraries were to avoid keeping a record of patron activities, then it would be much more difficult to build profiles of patrons. However, libraries need to track some basic information, such as circulation transactions, and can offer better services to patrons if they track and analyze how resources are being used. In addition, they may be obligated to record information about access to electronic resources. There are also the benefits of machine learning, and while “opt-in” or “opt-out” models can limit profiling, many machine learning algorithms will only work well if most people participate.

Suppose, however, that libraries were to request that every patron specify one or more “research projects” corresponding to the patron’s interests, broadly defined, and to select one of these projects when logging in to the library system. Within the library’s database, most patron activities could then be associated with a project rather than with a patron.

For example, Jenny might create a project called “Information theory” for her graduate study, a second project called “Cooking” to pursue her passion for learning new cuisines, and a third called “Reading” to represent reading for pleasure. Now somewhere in the library system there is stored an association between Jenny and her three projects. This association would not be authorized to be shared outside of the library or with machine learning algorithms, except with specific consent in cases where absolutely necessary. Elsewhere in the database, Jenny’s data would be associated not directly with her but with the project she has selected to work on.
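The separation described above can be sketched as a small data model. This is only an illustration under stated assumptions: the class names `ProjectRegistry` and `ActivityLog`, the random project IDs, and the sample items are hypothetical and do not describe any existing library system.

```python
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class ProjectRegistry:
    """Private patron-to-project links; kept inside the library, never exported."""
    owner_of: dict = field(default_factory=dict)   # project_id -> patron_id
    name_of: dict = field(default_factory=dict)    # project_id -> project name

    def create_project(self, patron_id: str, name: str) -> str:
        # A random ID reveals nothing about the patron who owns the project.
        project_id = uuid4().hex
        self.owner_of[project_id] = patron_id
        self.name_of[project_id] = name
        return project_id


@dataclass
class ActivityLog:
    """Activity records keyed by project ID only."""
    rows: list = field(default_factory=list)       # (project_id, item) pairs

    def record(self, project_id: str, item: str) -> None:
        self.rows.append((project_id, item))

    def export_for_analysis(self) -> list:
        # Rows carry no patron identifier, so a patron's activity is
        # fragmented across her projects in anything that leaves the library.
        return list(self.rows)


registry = ProjectRegistry()
log = ActivityLog()
theory = registry.create_project("jenny", "Information theory")
cooking = registry.create_project("jenny", "Cooking")
log.record(theory, "A Mathematical Theory of Communication")
log.record(cooking, "Regional recipes")
```

In this sketch only `ProjectRegistry` knows that both projects belong to Jenny; a recommender or an outside analyst would receive just the output of `export_for_analysis()`.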

When Jenny is logged into the library system, she might see a dashboard for the project that she is currently working on, and can easily switch to another project. This may make a certain sense to Jenny, as it would allow her to view and manage related information together. She probably would not care to see articles about information theory suggested by a recommender system while browsing books of recipes.

With some exceptions, there is no need for machine learning to profile people in order to help with their research interests. Jenny’s selection of a project could even assist the algorithms to be more accurate, by indicating what she is working on. At any rate, the decision would be up to Jenny in how she chooses to organize her projects and to set boundaries for how data are used, based on which of her interests she thinks would benefit. She could begin to use machine learning selectively as a tool, rather than being pressured into an all-or-nothing choice.

Another advantage of this approach could be seen in cases where data are anonymized and exported from the library system to be analyzed by someone outside the library. Anonymized data can sometimes be “re-identified” because the data may reveal a combination of specific activities or interests that can be linked to a person. If the library were to track projects rather than patrons, and assuming the patron-project groupings were not disclosed with the anonymized data, then patron data would be fragmented by project, potentially making re-identification more difficult.

Nassib Nassar joined Index Data in 2015 as the product manager for FOLIO and currently leads a research project on data sharing for open science.


The State of FOLIO: Numbers and Muses


As the calendar turned to a new year, we took the opportunity to reflect on the state of FOLIO today.

The Index Data team members are excited to have jumpstarted this open source effort, and the numbers tell a story of a growing and highly engaged community.

FOLIO Community


FOLIO community adoption


To get beyond the numbers, we asked some of the project leaders to share their thoughts on where FOLIO is today and where it’s going.

The 5-minute version

You can watch the entire interview here.

2017 was a year of remarkable progress for FOLIO, and we have good reasons to believe that 2018 will be even more exciting! Index Data is eager to engage with the community and be at the forefront of welcoming new participants to the FOLIO project.

Bibliotech Education Offers Node-based ZOOM Client Based on Index Data’s YAZ Client

Daniel Engelke, chief technology officer and co-founder of Bibliotech Education Ltd, notified us about their release of an open source Z39.50 toolkit for Node.js that uses Index Data’s YAZ toolkit. The source code is available on GitHub. Daniel said, “Having the YAZ toolkit available, specifically the libyaz5-dev package made developing a zoom client in Node.js extremely easy for us!”

npm package

Bibliotech is an online platform that provides students with access to their textbooks and libraries with affordable textbook packages.

RA21 project aims to ease remote access to licensed content

In the two decades since electronic journals started replacing print journals as the primary access to article content, the quandary of how to ensure proper access to electronic articles that are licensed and paid for by the library has been with us (see Note 1). To solve what is termed the “off campus problem”, libraries have employed numerous techniques and technologies to enable access for authorized users when they are not at their institutions. Access from on campus is easy: the publisher’s system recognizes the network address of the computer requesting access and allows the access to happen. Requests from network addresses that are not recognized are met with “access denied” messages and/or requirements to pay for one-off access to articles. To get around this problem, libraries have deployed web proxy servers, virtual private network (VPN) gateways, and federated access control mechanisms (like Shibboleth and Athens) to enable users “off campus” to access content. These techniques and technologies are not perfect, though (what happens when you get to a journal article from a search engine, for instance?), and this is all well known.

Stepping into this space is the STM Association — a trade association for academic and professional publishers — with a project they are calling RA21: Resource Access in the 21st Century. The website describes the effort as:

Resource Access for the 21st Century (RA21) is an STM initiative aimed at optimizing protocols across key stakeholder groups, with a goal of facilitating a seamless user experience for consumers of scientific communication. In addition, this comprehensive initiative is working to solve long standing, complex, and broadly distributed challenges in the areas of security and user privacy. Community conversations and consensus building to engage all stakeholders is currently underway in order to explore potential alternatives to IP-authentication, and to build momentum toward testing alternatives among researcher, customer, vendor, and publisher partners.

Last week and earlier this week there were two in-person meetings where representatives from publishers, libraries, and service providers came together to discuss the initiative. Two points were put forward as the grounding principles of the effort:

  1. In part, the ease of resource access within IP ranges makes off campus access so difficult
  2. In part, the difficulty of resource access outside IP ranges encourages legitimate users to resort to illegitimate means of resource access

What struck me was the importance of the first one, and its corollary: to make off-campus access much easier we might have to make on-campus access a little harder. That is, if we ask all users to authenticate themselves with their institution’s accounts no matter where they are, then the mode of access becomes seamless whether you are “on-campus” or “off-campus”.

The key, of course, is to lower that common barrier of personal authentication so far that no one thinks of it as a burden. And that is the focus of the RA21 effort. Take a look at the slides [PowerPoint] from the outreach meeting for the full story. The parts that I’m most excited about are:

  • Research into addressing the “Where Are You From” (WAYF) problem — how to make the leap from the publisher’s site to the institution’s sign-on portal as seamless as possible. If the user is from a recognized campus network address range, the publisher can link directly to the portal. Can clues such as geo-location also be used to reduce the number of institutions the user has to pick from? Can the user’s affiliated institution(s) be saved in the browser, so the publisher knows where to send the user without prompting them?
  • User experience design and usability testing for authentication screens. Can publishers agree on common page layout, wording, graphics to provide the necessary clues to the user to access the content?
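One way to picture the WAYF questions above is as a small decision function on the publisher's side. The following is a minimal sketch, not part of RA21: the function name `resolve_portal`, the order of the checks, the address ranges (reserved documentation ranges), and the portal URLs are all assumptions made for illustration.

```python
from ipaddress import ip_address, ip_network
from typing import Optional

# Hypothetical registry: campus network ranges -> institutional sign-on portal.
# The ranges are reserved documentation addresses; the URLs are placeholders.
CAMPUS_RANGES = {
    ip_network("192.0.2.0/24"): "https://login.example.edu/sso",
    ip_network("198.51.100.0/24"): "https://idp.example.org/sso",
}


def resolve_portal(client_ip: str, remembered_portal: Optional[str] = None) -> Optional[str]:
    """Return the sign-on portal to send the user to, or None if they must choose."""
    # 1. An institution saved in the browser on an earlier visit is used first.
    if remembered_portal:
        return remembered_portal
    # 2. Otherwise, a recognized campus address identifies the institution.
    address = ip_address(client_ip)
    for network, portal in CAMPUS_RANGES.items():
        if address in network:
            return portal
    # 3. Unknown user: fall back to the "Where Are You From" picker.
    return None
```

A return value of `None` corresponds to the case the usability work is meant to improve: the user has to be shown a list of institutions, ideally a short one.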

The RA21 group is leveraging two technologies, SAML and Shibboleth (see Note 2), to accomplish the project’s goals. There are some nice side effects to this choice, notably:

  • privacy aware: the publisher trusts the institution’s identity system to properly authorize users while providing hooks for the publisher to offer personalized service if the user elects to do so.
  • enhanced reporting: the institution can send general tags (user type, department/project affiliation, etc.) to the publisher that can be turned into reporting categories in reports back to the institution.

Beginning next year organizations will work on pilot projects towards the RA21 goals. One pilot that is known now is a group of pharmaceutical companies working with a subset of publishers on the WAYF experience issue. The group is looking for others as well, and they have teamed up with NISO to help facilitate the conversations and dissemination of the findings. If you are interested, check out the how to participate page for more details.

Within Index Data, we’re looking at RA21’s impact on the FOLIO project. FOLIO is starting up a special interest group that is charged with exploring these areas of authentication and privacy. I talk more about the intersection of RA21 and FOLIO on the FOLIO Discuss site.

Note 1: I am going to set aside, for the sake of this discussion, the argument that open access publishing is a better model in the digital age. That is probably true, and any resources expended towards a goal of appropriately limiting access to subscribed users would be better spent towards turning the information dissemination process into fully open access. The resource access project described here does exist, though, and is worthy of further discussion and exploration.

Note 2: SAML (Security Assertion Markup Language) is a standard for exchanging authentication and authorization information, while Shibboleth is an implementation of SAML popular in higher education.

Index Data Staff Offer a Primer on the FOLIO Code

Coinciding with the public release of the FOLIO code repositories, Index Data staff offered developers a primer on how the FOLIO platform provides integration points for module development through the Okapi layer, and what that means for developing modules in FOLIO. In the 90-minute presentation, Peter Murray (open source community advocate) and Jakub Skoczen (architect and developer) walk through how the pieces come together at a high level, and Kurt Nordstrom (software engineer) demonstrates how to build a back-end module for Okapi using NodeJS.


The video is available for download from the Open Library Environment website.

Index Data Turns 20

Twenty years ago today, Adam Dickmeiss and I founded Index Data together in Copenhagen. There was a bottle of champagne, and our parents shared the moment with us along with our wives, because, honestly, we were little more than big kids at the time. We were a little scared, but we were also in the fortunate position of being young, still without kids or debt. Oh, and our wives had steady jobs. Let the adventure begin!

We met just a few years earlier, as interns at the State Library Service (Statens Bibliotekstjeneste), looking to make some extra money for college. We were hired during a tumultuous period, both organizationally and technologically. In Denmark, libraries benefit from substantial support from national and municipal authorities. At that time, there was an effort afoot to consolidate the services offered to the public and research libraries, respectively, into a single organization, the Danish Library Center (DBC) with a single software platform, including a centralized, national union catalog and interlibrary loan platform. Eventually, Adam and I along with a team of young programmers were given the task of creating an indexing and search engine for the new system. The task took about a year, and I still look back on it as one of the most exciting projects I have worked on. Somewhere in there, Adam managed to graduate and I managed to forget all about my studies, but we both felt like we knew everything there was to know about library technology (remember, we were just big kids!).

The new system went into production on schedule and was a tremendous success. Today, it forms the basis for a unique, patron-centered ‘national OPAC’ that gives any citizen access to the collection of every library in the nation. But Adam and I had developed a taste for big, ambitious projects. In a sense, our shared experience had made us into entrepreneurs, and we felt hungry for more.

For our first day of work at Index Data, we each brought a chair, our PC from home, and a thousand dollars which formed the entirety of the operating capital of the company. The goal of the business, we decided, would first and foremost be to provide a good place of work for us and any colleagues who might one day join us. The purpose was to have fun doing work that we loved. The business model was to create cutting-edge, client/server-oriented software (buzzword of the era) for libraries, and to finance the development by offering our services as consultants in whatever areas we could. We felt that the best place to start would be to build a complete Integrated Library System (I did mention we were just big kids).

Our workplace was a small room in a disused factory building which had been turned into rental offices. But not in the fashionable, expensive way it’s being done today. This place was rough. Our neighbors were bohemian artists and tiny film production companies hoping to make it big. Break-ins were a continuing concern, so one of our first purchases was a large steel grid that we padlocked to our door at night. At one point during a rainstorm, water started coming down through the ceiling, so we draped plastic sheets to keep our computers dry. After that, strange mushrooms would sometimes grow out of our walls.

The original artwork for our first Christmas card, by Adam's brother Otto Dickmeiss

In between consulting gigs, we worked steadily on our own software, building components that we thought we’d need for our big library system. At one point, we started releasing our software under Open Source licenses. We reasoned that someone might see the software and decide to ask us to help them work with it. Fresh out of an academic environment where Open Source projects were enormously influential (Linux was still new, then, but getting lots of attention), it felt natural to us, but it was still a relatively unknown phenomenon in the larger industry and we suffered a good deal of friendly ribbing from our friends. We also endured some more pointed questions from our wives, who were still carrying the brunt of the household expenses.

But something cool happened; people did find our software, and the consulting work increasingly involved integration and enhancements to our growing family of software components. Along the way, the building blocks we’d been creating took on a life of their own and became a focal point of our work; we never did build that library system, but our software components have been integrated into the vast majority of library systems out there in various roles, and we have enjoyed two absolutely remarkable decades of working relationships with exciting organizations and brilliant people all over the globe. We moved out of the mushroom-infested office and were joined by coworkers. We had kids.

The Europagate project team in 1995. Adam and Sebastian in the back row

Ten years ago this summer, my family and I moved to the US. Our business gradually shifted away from Denmark and Europe, but we struggled to maintain our old, informal and very personal company culture with me way over in New England and the rest of the team in Copenhagen. In 2007, Adam and I made a decision that in some ways was as dramatic as quitting our jobs and founding the company. We hired Lynn Bailey to be our CEO and re-configured the company mentally and structurally to be a US-based company which just happened to have its core development team in Copenhagen. Soon, they were joined by colleagues in many locations as we made a policy of hiring the most talented people with a strong interest in search and library technology, no matter where they lived. Today, we are a virtual company with colleagues in six different countries: Denmark, Sweden, Germany, the UK, Canada, and the US, where we are spread across four states. After writing the book on operating a commercial business around Open Source Software, we had to learn how to be a tiny multinational company, and how to work well together as a team while scattered across the globe.

The company that existed ten years ago, with a jolly group of Danes hanging out in the middle of downtown Copenhagen, has been transformed almost beyond recognition. But what has arisen in its stead is in many ways more vital and exciting. Our team is passionate about their work: We swim in a ridiculously specialized area of the sea of information technology, but we do so with tremendous pride and passion.

Index Data, at a recent team meeting in New England

It has been an amazing 20-year journey. We were successful in creating a fun and supportive work environment for ourselves and our colleagues. I couldn’t be more proud and grateful, both for my great coworkers and for the remarkable people I have had the good fortune to do business with.

Let the adventure continue!

Adding Discovery and More to Koha with Smart Widgets

Previous post: Using Smart Widgets to Integrate Information Access

This is the second in a series of posts about our Smart Widget platform. You can also read the first post or find background material about the technology.

In our introduction to Smart Widgets, I said that part of our purpose in developing the technology was to move away from the search box as the primary paradigm for accessing information: to give librarians more tools to organize and present information for their consumers/patrons. But the widgets can also be used to IMPROVE the capabilities of the search boxes that we already have — to offer new functions beyond what your existing software is capable of. In this post, we will show a couple of different examples of how Smart Widgets can be used to add functionality to Koha, but the same principles apply to any system that allows you to customize the HTML structure of a search results page.

Previously, I showed an example of a search result widget which simply executed what you might call a ‘canned’ search, and displayed the results of that search whenever the page was loaded. The HTML code for such a widget might look something like this:

<div class='mkwsRecords' autosearch='american political history'></div>

This widget will show a list of matching records for the query ‘american political history’ using the set of databases that has been configured into the MasterKey back-end for the library (literally anything searchable on the web can be accessed in this way). But what if we were to put such a widget on, say, the search results page of an OPAC, and have it search for whatever query the user has input? The Smart Widgets allow us to try this. The syntax would look like this:

<div class='mkwsRecords' autosearch='param!query!'></div>

Where ‘query’ is whatever HTTP parameter name the particular search interface uses to carry the search term.
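To make the substitution concrete, here is a rough sketch of the logic the ‘param!query!’ form implies. It is an illustration only: the real widgets run as JavaScript in the browser, and the function name `resolve_autosearch` and the example URL are assumptions, not part of the widget code.

```python
from urllib.parse import parse_qs, urlsplit


def resolve_autosearch(autosearch: str, page_url: str) -> str:
    """Resolve an autosearch value against the URL of the page hosting the widget."""
    prefix = "param!"
    if autosearch.startswith(prefix) and autosearch.endswith("!"):
        # 'param!NAME!' means: use the value of the HTTP parameter NAME.
        name = autosearch[len(prefix):-1]
        values = parse_qs(urlsplit(page_url).query).get(name, [""])
        return values[0]
    # Anything else is treated as a literal, 'canned' query.
    return autosearch
```

With a hypothetical results page at `https://opac.example.org/search?query=nuclear+power`, the value ‘param!query!’ would resolve to the user's own search terms, while ‘american political history’ would be passed through unchanged.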

In Koha, there is a function on the staff page that allows the administrator to add extra markup to the end of the facet column on the left-hand side of the display. That is an ideal place for us to slip in a little extra functionality. The screen looks like this:

There’s a little more to this than the simple examples I showed in the last post. That is because the initial integration was so easy that we decided to see if we could add an entire Discovery function to Koha (spoiler: We could!). Embedding this markup in the facet bar gives us the following display:

If you look at the facet column to the left, you will see, below the normal facets, a list of different databases and their hit-count for the given query. The list is updated as results come in so the normal Koha response time is not affected in any way whatsoever.

We also added a separate tab to the Koha OPAC with the Discovery function, so that if you click on the widget above, you will get to this page:

Pretty cool, right?

We’ll go through the steps needed to add this functionality in a later, more technical blog post, but before we get to that, I want to show you another application of the Smart Widgets which isn’t about Discovery/metasearching.

We have been thinking that there might be useful functions that an OPAC could perform beyond merely providing a peek into the physical holdings of a library (or physical/electronic in the case of Discovery platforms). What if the OPAC could evolve into a kind of information center, a front door to the library as a facilitator of learning or research?

We thought that one way to explore this idea further was to surface reference content right into the OPAC itself, to supplement the usual bibliographic results. Wikipedia is a good subject for this experiment, since it is free and quite often provides relevant information to a query. So we went back to the Koha administration console and replaced the Discovery widget with a special Wikipedia widget, and this is what we got for a search for ‘nuclear power’.

Is this useful? You’ll have to be the judge, but I think it often could be: As a way to provide another angle on the user’s query (another ‘facet’), and possibly inspiration/guidance for further research. Now obviously, Wikipedia is far from being the only possible source: Commercial reference sources or even locally maintained knowledge bases might be more obvious candidates in some settings. The widget approach would work with just about any source or combination of sources imaginable.

So, in this post, we have shown how you can add significant functionality to Koha without having to install local software and without complex programming. We happen to think that these Smart Widgets are a natural outgrowth of the move towards cloud-based services and I believe you’ll be seeing a lot of them show up in the years to come from all kinds of data and service providers. But if you’re interested in learning more about our take on them, or possibly trying them out in your OPAC, please get in touch.

Using Smart Widgets to Integrate Information Access

Next post: Adding Discovery and more to Koha with Smart Widgets.

This is the first of a series of blog posts in which we will talk about a concept that we have been developing over the past few years. We call it ‘smart widgets’ to distinguish our approach to widgets from the almost ubiquitous notion of ‘widgets’ meaning little search boxes that you insert into your page, but which ultimately send your users to some remote site.

How do they work? Well, for a really simple example, consider this search box:

You can put in a search and you will search across a collection of different resources, in real time. You might like to look at the HTML source code of this blog post to see how it was implemented, but to save you the trouble, the bit that does the searching looks like this:

<link rel="stylesheet" type="text/css" href="//" />
<script type="text/javascript" src="//"></script>
<div class="mkwsSearch"></div>
<div class="mkwsResults"></div>

That’s all. You can take that bit of HTML and put it in your home page, and it should do the same thing. The code uses Ajax to communicate with our SaaS MasterKey back-end, and it will search virtually any combination of resources that you can imagine if you have an account.

In a nutshell, our Smart Widgets are intended to make access to information a fluid thing — something that can be easily manipulated and surfaced just about anywhere you can imagine, from a blog post to a library home page. In that sense, our widgets are two things:

  • A technology platform that uses dynamic HTML together with our SaaS back-end to make it incredibly easy to embed access to almost any combination of resources into almost any page.
  • A whole new way of thinking about how information sources are used in the service of library patrons, and how the library can project its services into the surrounding community (whether that is a town, a school, or a business).

In a way that second point arose from the realization that “Searching” (i.e. providing mechanisms by which patrons could access pre-indexed collections of materials by entering search terms) has become a commodity. Librarians have spent decades (if not centuries) thinking of mechanisms to make things findable. Today, the Internet and Google in particular have made that function utterly mainstream: it is part of the fabric of the Internet, and so much a part of people’s everyday experience that it has become very hard for the library community to convince anyone that we have a better solution. This is pleasing in a way; it has been cool to watch something so esoteric as searching become just an everyday part of our culture. But it also presents new challenges. Ironically, while easier access to massive piles of information has led some decision makers to question the continued value of libraries and librarianship, at the same time people are struggling with information overload: how to filter and select the best sources to answer a given question. Information is not the same as knowledge, and access to too much information may in fact impede the acquisition of knowledge.

We in the library community are partly to blame for this. We have pursued the vision of the ‘universal search box’ for so long and with such ardor that it is only now, as we’re finally reaching that goal, that some people are asking if this really was such a good idea after all. I think certainly for some tasks, single search boxes are a great solution (the success of Google makes this clear), but I don’t think it’s the right answer for every problem. We believe that libraries have more important roles to play than merely managing search boxes, and we would like to use our widget platform to support those roles, by organizing and enabling access to information, and ultimately by facilitating the creation of knowledge from that information.

Below is a widget that surfaces the current results from the Digital Public Library of America for ‘american political history’. There is no search box: The widget retrieves the most current information based on a search that has been prepared by the page author (me, in this case).

DPLA results will appear here

The HTML source code for this widget looks like this:

<div class='mkwsRecords mkwsTeam_dpla'
autosearch='american political history'></div>

Why is such a widget useful? Well, the widget can be used to surface results from virtually any combination of resources (up to over 100 databases per widget), ordered in any way desirable, for any given search. The sources for the widget can include subscription databases and open access sources. Different widgets can be combined together on a page to illuminate a current event, a certain genre of literature, or a subject of research. The widgets can be a powerful tool to build information sources of all kinds for the users of a library.

Over the coming days and weeks, we will be discussing various applications and uses of the widgets. We hope you’ll agree they present some pretty exciting possibilities.

As always, feel free to contact us if you have any questions or if you are interested in using the widgets in your own applications or site. You can also find more information at this site.

Add Metasearching to Your Application with Three Lines of HTML

One of the problems we’ve had over and over at Index Data is that we build all these cool back-end tools — things like the metasearching middleware Pazpar2 — but then don’t have a good way to show them off. We’ve never really focussed much on building UIs, so we have to do demos that go like this:

… And then you just type this 200-character URL into your web browser, and you can see this XML response that comes back, and then you just take this identifier from this bit of the XML structure and use it to build this other 200-character URL, which …

Yesterday, we launched a new toolkit that changes that — the MasterKey Widget Set, or MKWS for short. The idea is that you can add widgets to your existing web-site — ILS, content management system, blog, or whatever. The widgets provide broadcast searching quickly and painlessly, customised to fit the way you do things. One widget for a search box, one for result records, one for facets, one for switching between UI languages, and so on. Mix ’em and match ’em. The individual widgets are HTML <div>s with well-known ids beginning with mkws: for example, <div id="mkwsSearch"> provides the search box and button.

So for example, the following three lines of HTML constitute a complete, functional (though ugly) metasearcher:

<script type="text/javascript" src=""></script>
<div id="mkwsSearch"></div>
<div id="mkwsResults"></div>

The search-related elements (search boxes, results, facets, paging controls, sorting controls) are all styled with CSS using MKWS-specific classes. You can easily override those classes with your own CSS, to match the widgets to your own web-site’s look and feel.

Once you move past this very simplest kind of MKWS application, you can have a lot of control over behaviour. We’ll look at some of the other options in subsequent posts, but if you want a sneak preview, take a look at the MKWS manual, Embedded metasearching with the MasterKey Widget Set. There are plenty of examples linked from the home page, too.

We’re really excited about MKWS because it opens up metasearching application design to people who otherwise wouldn’t be able to go near it. You don’t need to be a JavaScript wizard, or know about XML or JSON. We hope we’ll see designers using MKWS to make things we’ve not even imagined yet.

Switchboard Leverage

We’re in the business of making access to information easier for people and, most of all, for SOFTWARE that in turn makes that information available to people. A lot of our software is based around a kind of switchboard or functional ‘hub’ model which means that when we extend a capability in ONE area, new possibilities open up in other areas that we don’t necessarily even think about ourselves. Ironically, it means that we don’t always KNOW — and certainly we don’t always ADVERTISE — what we are capable of. In this post, I want to think about some of the ways in which we switch between functions, protocols, and data.

Our YAZ toolkit, launched in ’95, is fast approaching the end of its teenage years. I had more hair over a larger area of my head, then. It started its life as one very useful kind of switchboard: it allowed clients and servers to implement both Z39.50 and the Open Systems Interconnection (OSI) based flavors of information retrieval protocols, which were pursued in the US and the rest of the world, respectively. This created huge value at the time, and allowed both groups of implementors to focus on functionality and content, without being limited in whom they could interoperate with. ISO eventually came to its senses and adopted Z39.50. We helped create a client-side API (ZOOM), which enabled developers in a huge number of different programming languages to develop search clients using YAZ and other toolkits. When SRU, SRW, and later SRU 2 came along, we used the abstract nature of the ZOOM API to hide the differences between all these different protocols, and YAZ again became a kind of switchboard for competing ways of doing the same thing. A while back, we also added support for Solr’s webservice API, which allows you to break down the distinction between what is indexed by you and what is remotely accessed from elsewhere.

Developers who use YAZ, then, find it easy to create polyglot applications — systems which will deal effortlessly with numerous data sources, and which may EXPOSE data through many different mechanisms. Every time we add a new mechanism — because the world seems to feel that anything worth doing is worth doing in many ways — many new possibilities are opened.

Our Metaproxy takes this a step further by essentially acting as a switchboard between anything that YAZ supports (and then some!). Some folks use the SRU server function of Metaproxy to access Z39.50 resources without having to use an API like YAZ’s own. In fact, though, Metaproxy can talk to virtually ANYTHING you can imagine: SRU, Z39.50, Solr-based indexes. It can even use our Connector technology to access proprietary APIs and screen-scraped resources. Our list of supported search targets is climbing quickly towards four thousand, and lately we have even made it a SaaS platform, so people can access just about ANYTHING without having to install a bunch of software locally. But people are also using Metaproxy as a convenient way to implement open standards like Z39.50 and SRU on top of their OWN content — it can talk to a local index in Solr and expose the contents to the world in a well-defined way. Just lately, thanks to SRU 2.0, you can even share facet functionality in this way.

Our MasterKey platform takes the capabilities of this ‘access layer’ and adds some crucial elements: One is a cross-database search function that allows for extremely efficient metasearching across Solr-based indexes and remote sources through almost any reasonable mechanism, with relevance ranking, merging, and facets generated on the fly. Another element is a model for administering subscriptions and search targets: Once you have thousands of sources to keep track of, making it easy for people to sift through them to choose just the right sources for their end-users becomes a big challenge.

On top of this we have yet another switchboard: We have a simple webservice API which exposes this unified view of a huge information landscape through a faceted search metaphor, but layered on top of that we have a substantial array of different tools to enable people to leverage all this functionality: Today, we have plugins for the Drupal and Typo3 CMSes, for JavaServer Faces, for JavaScript programmers, and, soon to come: A widget set which will enable non-programmers to drop very advanced search functionality into any website just by adding a couple of HTML nodes to their page layout.

Each time we add a function to one of the corners of our platform, new capabilities emerge in the strangest, unthought-of corners of the larger framework. Maybe it is not so strange that out of all the information-related tasks we perform every day, the one we struggle with the hardest is answering the simple question: “So, what do you guys DO, exactly?”