OAI harvesting from within Keystone

In addition, Keystone allows the system administrator to define OAI harvesting tasks. To do so, one must install the libtkl-perl and the tkl-oai-harvester Debian packages.

The OAI harvester daemon, tkl-oai-harvester, is started, stopped and restarted with its init script:

     /etc/init.d/tkl-oai-harvester start
     /etc/init.d/tkl-oai-harvester stop
     /etc/init.d/tkl-oai-harvester restart
      

When the .deb packages are installed with apt-get on a Debian system, the start/stop script is set up so that the harvesting service starts automatically at boot time. The script is fairly simple, so it should not be difficult to adapt it to operating systems other than Debian GNU/Linux.

Harvesting tasks are created in the admin interface. The bibliotheca example portal contains a task directory, bibliotheca/tasks, with two subdirectories, oaibizigate and oaitklite, and two Keystone files, directory.tkl and index.tkl, which are tuned to display the resulting oai*.tkl files containing the harvested OAI metadata records.

In the admin interface, navigate to the bibliotheca/tasks directory and add a new oai task file. Fill in the start URL (remember the trailing slash when addressing a Keystone OAI server!). Enter the target directory relative to the portal root, for example "/tasks/oaitklite/", and set the status to pending. After saving, the resulting task file should look like this:

     <?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
     <task creator="admin" created="2003-07-10, 13:59:50" modifier="admin" modified="2003-07-10, 13:59:50">
       <tasktype>oai</tasktype>
       <url>http://tkl-cvs.indexdata.dk/bibliotheca/</url>
       <target>/tasks/oaitklite/</target>
       <description>OAI harvesting job at our Keystone server</description>
       <status>pending</status>
       <xslt>oai2link.xsl</xslt>
       <handler></handler>
     </task>
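If you want to prepare task files outside the admin interface, the following Python sketch produces XML with the same structure as the example above. The function name make_oai_task and its parameters are hypothetical. It only builds the XML text; it does not create the spool file that saving through the admin interface places in /var/spool/tkl, so the admin interface remains the documented way to submit a task. The trailing slash is appended only when keystone_server is true, since the requirement applies to Keystone OAI servers.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

def make_oai_task(url, target, description, *, keystone_server=True,
                  xslt="oai2link.xsl", handler="", user="admin", now=None):
    """Return the XML text of an oai task file in the layout shown above."""
    # Keystone OAI servers must be addressed with a trailing slash.
    if keystone_server and not url.endswith("/"):
        url += "/"
    # The target directory is relative to the portal root, e.g. /tasks/oaitklite/.
    if not target.startswith("/"):
        raise ValueError("target must start with '/', e.g. /tasks/oaitklite/")
    stamp = (now or datetime.now()).strftime("%Y-%m-%d, %H:%M:%S")
    task = ET.Element("task", creator=user, created=stamp,
                      modifier=user, modified=stamp)
    fields = [("tasktype", "oai"), ("url", url), ("target", target),
              ("description", description), ("status", "pending"),
              ("xslt", xslt), ("handler", handler)]
    for tag, text in fields:
        ET.SubElement(task, tag).text = text
    # Keep <handler></handler> rather than <handler />, as in the example.
    body = ET.tostring(task, encoding="unicode", short_empty_elements=False)
    return ('<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>\n'
            + body)
```

The elements come out on a single line rather than indented, which does not change their meaning. When writing the result to disk, open the file with encoding="iso-8859-1" so the bytes match the encoding declared in the XML header.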
      

The content of the <handler> tag names an optional script that the harvester calls when the job has finished. There are no restrictions on the script, except that it is run by the web server user (typically www-data) and is subject to that user's permissions. Task handler scripts should be placed in the reserved directory tasks/handlers.

Each task handler script is called with a single argument: the path to the directory where the harvested records are placed.
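A handler can be written in any language the web server user can run. The Python sketch below shows the calling convention only: it takes the one directory argument and reports how many files it contains. The script name you give it (for example a hypothetical list_records.handler) and the reporting are illustrative; real indexing is not shown.

```python
import os
import sys

def harvested_files(directory):
    """Return the sorted names of the regular files in the harvest directory."""
    return sorted(name for name in os.listdir(directory)
                  if os.path.isfile(os.path.join(directory, name)))

def main(argv):
    """Entry point: argv[1] is the directory passed by the harvester."""
    if len(argv) != 2:
        print(f"usage: {argv[0]} DIRECTORY", file=sys.stderr)
        return 2
    directory = argv[1]
    files = harvested_files(directory)
    # Runs as the web server user, so it only sees what that user can read.
    print(f"{len(files)} harvested files in {directory}")
    # A handler other than do_nothing.handler must also index the records
    # itself; that step depends on your setup and is not shown here.
    return 0
```

To turn this into an actual handler, add a `#!/usr/bin/env python3` line at the top and end the file with `sys.exit(main(sys.argv))`. It then presumably needs to be executable by www-data and placed in tasks/handlers.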

Note that if you associate a task handler script other than the trivial do_nothing.handler with your OAI harvesting task, that handler becomes responsible for indexing the harvested records. Otherwise, the OAI harvester daemon tkl-oai-harvester performs the indexing.
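The rule for who indexes the harvested records fits in a few lines of Python. This is a sketch of the rule as stated here, not the harvester's actual code. It assumes that an empty or missing <handler> tag behaves like do_nothing.handler, and that the tag may contain a path, so only the file name part is compared. The function name indexer_for_task is hypothetical.

```python
import os
import xml.etree.ElementTree as ET

TRIVIAL_HANDLER = "do_nothing.handler"

def indexer_for_task(task_xml):
    """Return who indexes the harvested records of an oai task (sketch)."""
    handler = (ET.fromstring(task_xml).findtext("handler") or "").strip()
    # Assumption: an empty or missing <handler> counts as the trivial
    # handler, and only the file name part is compared.
    if handler == "" or os.path.basename(handler) == TRIVIAL_HANDLER:
        return "tkl-oai-harvester"
    # Any other handler script takes over indexing.
    return handler
```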

The <xslt> tag contains the name of an XSLT stylesheet that is applied to each harvested OAI record before it is stored. These stylesheets should be placed in the reserved directory /authorities/oai.

If the optional <prefix> tag is specified, its value is used as the filename prefix when the harvested OAI records are stored as files on disk. If no prefix is specified, link- is used by default.
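The prefix rule can be expressed as a small Python sketch that reads <prefix> from a task file and lists the matching files in a target directory. The helper names are hypothetical. Treating an empty <prefix></prefix> like a missing one is an assumption, because only the case where nothing is specified is described here.

```python
import os
import xml.etree.ElementTree as ET

DEFAULT_PREFIX = "link-"  # default when no <prefix> is specified

def record_prefix(task_xml):
    """Return the filename prefix an oai task uses for harvested records."""
    text = (ET.fromstring(task_xml).findtext("prefix") or "").strip()
    # Assumption: an empty <prefix></prefix> also falls back to the default.
    return text or DEFAULT_PREFIX

def harvested_record_files(directory, prefix=DEFAULT_PREFIX):
    """List the files in a target directory whose names start with the prefix."""
    return sorted(n for n in os.listdir(directory) if n.startswith(prefix))
```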

The OAI protocol does not require a repository to support set-based record filtering[3]. When harvesting from a set-enabled OAI repository, you can optionally restrict the harvest to a single set with the <set> tag. An empty set value is interpreted as no set at all, so the entire repository is harvested.
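For reference, this is how a standard OAI-PMH 2.0 ListRecords request with an optional set argument is formed. The sketch does not claim to reproduce the exact requests tkl-oai-harvester sends. The metadata prefix oai_dc (which every OAI-PMH repository must support) and the function name list_records_url are assumptions.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix="oai_dc", oai_set=""):
    """Build an OAI-PMH 2.0 ListRecords request URL (sketch)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    oai_set = (oai_set or "").strip()
    # An empty set value means "no set": the argument is left out,
    # and the repository returns records from all sets.
    if oai_set:
        params["set"] = oai_set
    return base_url + "?" + urlencode(params)
```

With the base URL from the example task, `list_records_url("http://tkl-cvs.indexdata.dk/bibliotheca/")` yields `http://tkl-cvs.indexdata.dk/bibliotheca/?verb=ListRecords&metadataPrefix=oai_dc`.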

When an oai task file is saved, a spool file is automatically placed in the /var/spool/tkl directory, and the OAI harvester picks up and performs the job within a couple of minutes. While the job runs, the status tag changes from "pending" to "running" and finally to "finished"; once the job has finished, the spool file is removed.
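To script around a harvest, you can poll the task file until its status tag reads "finished". The Python sketch below does that. The function names, the polling interval and the timeout are illustrative choices. It assumes the task file is readable from wherever the script runs, and it reads only the <status> tag without looking at the spool directory.

```python
import time
import xml.etree.ElementTree as ET

def task_status(path):
    """Return the text of the <status> tag in a task file."""
    return (ET.parse(path).getroot().findtext("status") or "").strip()

def wait_for_task(path, poll_seconds=30, timeout_seconds=1800):
    """Poll a task file until its status is "finished" (sketch)."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = task_status(path)
        if status == "finished":
            return status
        # Only the pending -> running -> finished sequence is expected.
        if status not in ("pending", "running"):
            raise ValueError(f"unexpected status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {status!r} after {timeout_seconds} s")
        time.sleep(poll_seconds)
```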

The harvested OAI metadata records can be inspected by pointing the regular user web interface at bibliotheca/tasks/oaitklite, where all records are listed in the order in which they were fetched. Clicking the first link of a record displays more details about it.

Although the OAI records are indexed on system boot, or when running

     /etc/init.d/tkl index
      

they are initially marked "hidden" and therefore do not appear in search result sets.



[3] The Keystone OAI repository has supported set-based record harvesting since version 1.4.5.