NAME

cf_webservice — Connector Framework Web service

DESCRIPTION

cf_webservice is a Metaproxy filter which offers a Web service for the Connector Framework.

The module may also provide a Z39.50 server for dealing with search-based connectors. For a description of the Z39.50 server functionality, refer to the cf-zserver(8) manual page. The remainder of this man page focuses on the Web service.

The Web service uses JSON content for responses. A future version may also support XML.

HTTP clients must use Content-Type application/json to post JSON content. The Web service uses the same Content-Type for its JSON responses.

HTTP clients must use Content-Type text/xml to post XML.

The cf_webservice module filters only HTTP requests with a certain prefix. The default prefix is "connector" and is used in the description that follows.

The following requests are offered by the webservice:

POST /connector[?arguments]

Creates a Connector Framework session. The content is a Connector Framework file (XML).

Session data passed as a JSON string in the X-CF-Args header will be decoded and available to the connector in the $.session object.

If successful, the response includes a JSON object with a single member "id" whose value is an integer session ID. This ID must be used in subsequent requests to refer to this connector.

If the content is empty, no connector is loaded into the engine. In this case only the engine session is established. A connector may be loaded later with the load_cf operation (see below).
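The session creation described above can be sketched in shell as follows. This is a minimal sketch, not a definitive client: the function names extract_session_id and create_session are hypothetical, the ID is pulled out with sed rather than a real JSON parser, and the use of Content-Type text/xml for the connector file is an assumption based on the rule for posting XML.

```shell
#!/bin/sh
# Extract the integer session ID from a response such as {"id":17}.
# Reads the JSON response on standard input.
extract_session_id() {
    sed -n 's/.*"id"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Create a session at URL $1 and print its ID.
# $2 (optional) is a connector file to load; if omitted, the content is empty.
# $3 (optional) is JSON session data sent in the X-CF-Args header.
create_session() {
    url=$1 cf_file=${2:-} session_json=${3:-}
    if [ -n "$cf_file" ]; then
        set -- --header "Content-Type: text/xml" --data-binary "@$cf_file"
    else
        set -- --data-binary ""
    fi
    if [ -n "$session_json" ]; then
        set -- "$@" --header "X-CF-Args: $session_json"
    fi
    curl --silent "$@" "$url" | extract_session_id
}
```

With a server listening on a placeholder address, a session without a connector could then be created with ID=$(create_session http://localhost:9000/connector).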

One or more arguments may be given for the POST in the form of name=value pairs, separated by &.

proxy=IP

Specifies the HTTP proxy for the session.

thread=0|1

Enables threaded mode (value 1), or forked mode (value 0). If thread is not given, forked mode is used.

loglevel=level

Specifies the log level for the engine session. The following names are recognized: DEBUG, INFO, WARN, ERROR.

logmodules=modules

Enables logging only for a subset of modules, to be retrieved by the log webservice command. The modules list is a comma-separated list of named modules. The available modules are: runtime (JavaScript runtime logger), engine (Engine encapsulating browser), timing (timing for tasks), stdout (unstructured text printed to standard output in various places).

By default logging is enabled for all modules.

timeout=seconds

Sets the timeout for each task in the session. Any task that takes longer than this is aborted and the session is terminated.

By default, the timeout is 120 seconds (2 minutes).
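As an illustration of how the arguments above combine into a request URL, the following sketch joins name=value pairs with &. The helper name build_session_url and the host and port are placeholders, and no URL encoding is performed, so values containing characters such as & or spaces would need to be encoded first.

```shell
#!/bin/sh
# Print base URL $1 followed by the remaining arguments (name=value pairs)
# joined with "&" as a query string. No URL encoding is applied.
build_session_url() {
    base=$1
    shift
    query=
    for pair in "$@"; do
        if [ -z "$query" ]; then
            query=$pair
        else
            query="$query&$pair"
        fi
    done
    if [ -n "$query" ]; then
        printf '%s?%s\n' "$base" "$query"
    else
        printf '%s\n' "$base"
    fi
}

# Threaded mode, debug logging for two modules, 60 second task timeout.
build_session_url http://localhost:9000/connector \
    thread=1 loglevel=DEBUG logmodules=runtime,engine timeout=60
```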

POST /connector/id/op/opargs

Performs an operation op with arguments opargs on the connector identified by id.

The following operations, op, are supported: run_task, run_task_opt, run_tests, screen_shot, load_cf, log and dom_string.
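The /connector/id/op/opargs layout can be sketched with a small helper that assembles the request URL. The name cf_op_url, the host and port, and the session ID 5 are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Print the URL for operation $3 on session $2 under base URL $1.
# $4 (optional) is opargs, for example a task name.
cf_op_url() {
    base=$1 id=$2 op=$3 opargs=${4:-}
    if [ -n "$opargs" ]; then
        printf '%s/%s/%s/%s\n' "$base" "$id" "$op" "$opargs"
    else
        printf '%s/%s/%s\n' "$base" "$id" "$op"
    fi
}

cf_op_url http://localhost:9000/connector 5 run_task search
cf_op_url http://localhost:9000/connector 5 screen_shot
```

The printed URLs can be passed to an HTTP client such as curl together with the POSTed content for the operation.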

For operations run_task and run_task_opt, opargs is the name of the task to run and the POSTed content is the task parameters. The POSTed content must be JSON.
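Because the task parameters must be valid JSON, values taken from user input need escaping before they are POSTed. The following is a minimal sketch that builds a one-member JSON object; the helper names json_escape and task_params are hypothetical, and only backslashes and double quotes are escaped, so control characters and newlines are not handled.

```shell
#!/bin/sh
# Escape backslashes and double quotes so $1 can be placed in a JSON string.
json_escape() {
    printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print a JSON object with a single string member: name $1, value $2.
task_params() {
    printf '{"%s":"%s"}\n' "$(json_escape "$1")" "$(json_escape "$2")"
}

task_params keyword water
```

The output could be passed to curl with --data-binary and Content-Type application/json when running a task such as search.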

For operation run_tests, opargs is the list of test tasks to run, separated by commas (for example search,parse,next,parse).

Operation screen_shot returns a window dump of the current browser in PNG format. The Content-Type of the HTTP response is image/png.

Operation log returns the log for the session as it is produced by the Engine as well as the shared Java runtime. It may be limited to certain modules by the logmodules argument given when POSTing a connector. The log operation may optionally be followed by ?clear=1, which clears the log upon completion, so that a subsequent log operation returns only material logged since the most recent log operation.

log is a special operation, where the POSTed content and Content-Type are ignored.

dom_string returns the current DOM for the session rendered as a string. The POSTed content and Content-Type are ignored.

Operation load_cf loads the POSTed connector (XML). The Content-Type is currently ignored, but it should be text/xml.

If an operation completes successfully (HTTP status 200), the HTTP response contains the result. For run_task and run_task_opt the response is a JSON document. For run_tests the response is simply a JSON object with a single member "result" whose boolean value is true for success and false for failure.
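A script can act on the run_tests response by checking the boolean "result" member. The sketch below uses a hypothetical helper, tests_passed, which matches the member with grep instead of parsing the JSON properly; that is adequate only for a response of the simple form described above.

```shell
#!/bin/sh
# Succeed (exit status 0) if the JSON on standard input has "result": true.
tests_passed() {
    grep -q '"result"[[:space:]]*:[[:space:]]*true'
}

if echo '{"result":true}' | tests_passed; then
    echo "tests passed"
else
    echo "tests failed"
fi
```

In practice the input would be the body returned by a run_tests request, and the exit status of tests_passed could be used as the exit status of a test script.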

For operation log the response is text and content-type is set to text/plain.

DELETE /connector/id

Deletes the connector identified by id.

CONFIGURATION

The Web service is implemented as a shared object for the Metaproxy server. The module ID is simply "cf".

The following elements may be given as part of the module configuration:

env

Specifies various settings for the environment in which the module is run. These are the values that were previously controlled by environment variables for cf-zserver. The env element takes several attributes. These are:

tmp_dir

Same as CF_TMP_DIR.

app_path

Same as CF_APP_PATH.

module_path

Same as CF_MODULE_PATH.

display_lock

Same as CF_DISPLAY_LOCK.

display_cmd

Same as CF_DISPLAY_CMD.

base_path

Same as CF_BASE_PATH.

connector_path

Same as CF_CONNECTOR_PATH.

repo_auth_url

Same as CF_REPO_AUTH_URL.

repo_fetch_url

Same as CF_REPO_FETCH_URL.

url_prefix

Specifies the HTTP path prefix for the Web service. By default it is connector. If an HTTP request does not use the given prefix, the cf module passes the request to the next module in the chain of modules defined by the Metaproxy configuration.

z39.50

Controls the Z39.50 server interface of the module. This element takes one attribute, enable, which is "false" to disable the Z39.50 server (default) or "true" to enable it.

EXAMPLES

Below is a small Metaproxy configuration file which loads the CF Web service module:

<?xml version="1.0"?>
<metaproxy xmlns="http://indexdata.com/metaproxy" version="1.0">
  <dlpath>.</dlpath>
  <start route="start"/>
  <filters>
    <filter id="frontend" type="frontend_net">
      <port>@:9000</port>
      <threads>50</threads>
    </filter>
  </filters>
  <routes>
    <route id="start">
      <filter refid="frontend"/>
      <filter type="log"><category user-access="true" apdu="true" /></filter>
      <filter type="cf">
        <env
           app_path="/var/cache/cf"
           module_path="/usr/share/cf/modules"
           display=":1.0"
           tmp_dir="/tmp/cfengine"
        />
        <url_prefix>connector</url_prefix>
      </filter>
      <filter type="bounce"/>
    </route>
  </routes>
</metaproxy>
    

The dlpath must be set to the directory containing the Metaproxy modules, in particular the CF module metaproxy_filter_cf.so.

TESTING WITH CURL

#!/bin/sh
C=/usr/share/cf/connectors/inactive/doaj.cf
if test "$1"; then
	C=$1
fi
H=http://localhost:9070/connector
# Create session (empty content)
curl --output ws.log --data-binary "" $H
# Parse it
ID=$(cut -d":" -f 2 ws.log | cut -d"}" -f 1)

# Load connector file
curl --data-binary @$C $H/$ID/load_cf

# Run a set of tests
curl --header "Content-Type: application/json; charset=UTF-8" --data-binary "{}" \
	$H/$ID/run_tests/search,parse,next,parse
# Run task search
curl --header "Content-Type: application/json" \
	--data-binary "{\"keyword\":\"water\"}" \
	$H/$ID/run_task/search
# Run task parse
curl --header "Content-Type: application/json" --data-binary "{}" \
	$H/$ID/run_task/parse
# Take screen shot (requires pnmtopng, xwdtopnm)
if test -x /usr/bin/pnmtopng; then
	curl --output screen.png --data-binary "{}" \
		$H/$ID/screen_shot
fi
# Run opt task init
curl --output init.log --header "Content-Type: application/json" --data-binary "{}" \
	$H/$ID/run_task_opt/init
# Get log
curl --header "Content-Type: application/json" --data-binary "{}" \
	$H/$ID/log
# Get dom
curl --header "Content-Type: text/html" --data-binary "{}" \
	$H/$ID/dom_string
# Delete the connector
curl --request DELETE $H/$ID

FILES

/usr/lib/cf/metaproxy_filter_cf.so

/usr/share/cf/metaproxy/cf.xml

SEE ALSO

cfrun(1), metaproxy(1)