Fullquery
Some web sites support a more complex query language instead of, or in addition to, filling values in input fields. For such cases we have the fullquery step.
When making a search task with fullquery, no regular search arguments (keyword, author, title, etc.) should be used. The fullquery argument contains all there is to know about the search. Unlike other arguments that are simple strings, the fullquery is a complex structure that reflects the query possibilities in Z39.50. The gory details are explained below, but here is a simple example (formatted for readability):
{ "op" : "and",
  "s1" : { "term" : "king lear",
           "structure": "phrase",
           "field": "title" } ,
  "s2" : { "term" : "shakesp",
           "truncation": "right",
           "field": "author" }
}
Configuring the step
The step configuration consists of six tabs.
General tab
Here you specify which parameter to transform from - almost always ‘fullquery’, and where to put the resulting query string, most often a temporary variable you will be using later.
Here is also a pull-down where you have to choose from a predefined set of starting points. These set up decent defaults for all the other tabs, so hopefully you don’t have to change too much later. Once set, the starting point cannot be changed. If you got it wrong, delete the step and start with a new one. The starting points are:
- CCL
- Library of Congress
- ALEPH
The checkbox ‘Fail if no query to begin with’ should usually be checked for simple fullquery connectors. In more advanced connectors that may invoke the fullquery step several times, it may be better to leave it unchecked.
Fields tab
Here you specify what search indexes are supported by the website, and how they are expressed. For example, a title search could be expressed as something like ti=hamlet. So you specify title for the field, and ti= for the string. There are buttons to delete unsupported fields, and one to add new ones.
If you need more complex structure than a simple prefix, you can use the magic XX marker to indicate the term, as in (ti=XX).
The last element in the list is always the (unspecified), which is the default string to use when there is no field specified in the query.
There are also two checkboxes. The first one causes the step to fail if it meets a field name not listed in this tab. In new connectors, this should (almost?) always be checked. The other checkbox causes the step to fail if the query contains a term that has no field defined. Such a query is perfectly valid, so this box should normally be left unchecked. But it can be useful to catch some errors in advanced connectors.
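The field mapping described above can be pictured as a small lookup table. The following Python sketch is only an illustration of the idea, not the step's actual code; the names FIELDS and apply_field are hypothetical, and it assumes that a string without the XX marker is treated as a plain prefix and that the first checkbox is ticked.

```python
# Hypothetical field table mirroring the Fields tab: field name -> string.
FIELDS = {
    "title": "ti=",        # simple prefix
    "author": "(au=XX)",   # XX marks where the term goes
    None: "",              # the (unspecified) entry
}

def apply_field(term, field=None):
    """Combine a term with the string configured for its field."""
    if field not in FIELDS:
        # Behaves as if 'fail on unlisted field' were checked.
        raise ValueError(f"unsupported field: {field}")
    pattern = FIELDS[field]
    if "XX" in pattern:
        return pattern.replace("XX", term)
    # Assumption: without XX, the string is a prefix to the term.
    return pattern + term

print(apply_field("hamlet", "title"))
print(apply_field("shakespeare", "author"))
```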
Operators tab
Here you specify the words used to indicate the various operators and, or, not, and the two proximity operators (ordered and unordered). Each of them has a checkbox where you tell if the operator is at all supported.
There are two ways to specify the strings. The simple one is just to write the word used for the operator, for example and. In some cases you need to put brackets around the whole thing, or do other stuff. Then you can write things like ( XX and YY ). The XX and YY will be replaced by the left and right operands, respectively.
For the proximity operators, the magic string %DIST% gets replaced by the distance limit in the incoming query.
Pay attention to white space! Some systems require a space before or after the magic word.
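The two ways of writing operator strings can be sketched as follows. This is a hedged illustration in Python, not the step's implementation; apply_operator is a hypothetical name, the near/%DIST% syntax is made up for the example, and the sketch assumes that a plain operator word is inserted exactly as written, so any required spaces must be part of the configured string.

```python
import re

def apply_operator(pattern, left, right, distance=None):
    """Join two already-translated operands with an operator string."""
    if distance is not None:
        pattern = pattern.replace("%DIST%", str(distance))
    if "XX" in pattern and "YY" in pattern:
        # Template form: substitute both operands in a single pass.
        values = {"XX": left, "YY": right}
        return re.sub("XX|YY", lambda m: values[m.group()], pattern)
    # Simple form (assumption): the word goes between the operands as is.
    return left + pattern + right

print(apply_operator(" and ", "ti=hamlet", "au=shakespeare"))
print(apply_operator("( XX and YY )", "ti=hamlet", "au=shakespeare"))
print(apply_operator(" near/%DIST% ", "dylan", "zimmerman", distance=3))
```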
Terms tab
The left half of this tab is about quotes. Most websites want some terms quoted, and some not. Here you can specify what kind of quotes to use for different terms:
- word term: If the query has specified that this is a single word. Often you don’t need any quotes.
- phrase term: If the query has specified that this is a phrase. Double quotes are a common choice.
- default term: if the query has not specified anything (as is often the case). No quotes is common.
For each of these you have a pull-down where you can choose between
- none
- custom. Enter the beginning and ending quotes in the input fields after the pulldown
- double quotes
- single quotes
- parentheses
- square brackets
- (Not supported) - causes the step to fail if the query contains this kind of structure. Usually only used to indicate that phrase searches are not supported.
Note that the custom style allows for more than one character in the quoting, should you ever need it.
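As a rough illustration of the quoting settings, here is a Python sketch. It is not the step's code; QUOTES and quote_term are hypothetical names, and it assumes that any structure other than word or phrase falls back to the default setting.

```python
# Hypothetical settings mirroring the left half of the Terms tab.
# Each entry is (opening quote, closing quote); None means "(Not supported)".
QUOTES = {
    "word": ("", ""),
    "phrase": ('"', '"'),
    None: ("", ""),  # default term
}

def quote_term(term, structure=None, quotes=QUOTES):
    """Wrap a term in the quotes configured for its structure."""
    # Assumption: structures other than word/phrase use the default entry.
    key = structure if structure in ("word", "phrase") else None
    pair = quotes[key]
    if pair is None:
        raise ValueError(f"{structure} terms are not supported")
    opening, closing = pair
    return f"{opening}{term}{closing}"

print(quote_term("king lear", "phrase"))
print(quote_term("foo"))
```

Because each entry is a pair of strings, a custom style with more than one character, such as ("[[", "]]"), fits the same table.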
The right half of the tab is all about truncation. It specifies what kind of truncation is supported, and how it is expressed in the query string.
Ranges tab
There are cases when we need to search by (numerical) ranges, most often years. The fullquery can handle those too (as of version 2.10). The Ranges tab specifies what to do with them.
There is an input to specify which fields are supposed to support ranges. Most likely that will be ‘year’, but one might imagine that we could meet other uses. If more than one field is needed, separate them by spaces.
Then come three inputs to define how the ranges are to be specified, if both ends are there, or only one end is available. Here you can again use the XX and YY strings to represent the values.
The fullquery step can only handle queries with one year range in them. It fails with an error if there are more than two endpoints, or the ends are in wrong order, etc.
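The three range inputs can be thought of as three templates, one per case. The Python sketch below is an assumption-laden illustration, not the step's code: the names RANGE_TEMPLATES and format_range and the template strings are hypothetical, and it assumes XX stands for the start of the range and YY for the end.

```python
import re

# Hypothetical templates mirroring the three inputs on the Ranges tab.
RANGE_TEMPLATES = {
    "both": "year=XX-YY",
    "start_only": "year>=XX",
    "end_only": "year<=YY",
}

def format_range(start=None, end=None, templates=RANGE_TEMPLATES):
    """Render a numeric range using the template that fits its endpoints."""
    if start is None and end is None:
        raise ValueError("a range needs at least one endpoint")
    if start is not None and end is not None:
        if int(start) > int(end):
            raise ValueError("range ends are in the wrong order")
        template = templates["both"]
    elif start is not None:
        template = templates["start_only"]
    else:
        template = templates["end_only"]
    # Assumption: XX is the start value and YY the end value.
    values = {"XX": str(start), "YY": str(end)}
    return re.sub("XX|YY", lambda m: values[m.group()], template)

print(format_range("2000", "2009"))
print(format_range(start="2000"))
```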
Examples tab
In this tab you can test how your fullquery step will behave when it sees some predefined test queries. Just click on any of the descriptive links at the bottom of the tab, and a corresponding JSON string appears in the input on top. This is then translated according to the settings you have specified, and displayed underneath.
Using fullquery
The obvious way to use the resulting query string is to put it into an input field, probably on some sort of advanced search page.
Another, more effective way is to use the transform step to put the query string directly into the URL and go straight into the results page. That way we don’t need to spend time fetching the search page, filling values, and submitting it.
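Whatever mechanism builds the URL, a query string with spaces, quotes, or equals signs generally has to be URL-encoded first. The Python sketch below only illustrates that point; it does not show how the transform step works, and the base URL and the q parameter name are made up.

```python
from urllib.parse import quote_plus

def results_url(query, base="https://catalog.example.org/search"):
    """Build a results-page URL with the query string URL-encoded."""
    # The base URL and the 'q' parameter name are hypothetical.
    return f"{base}?q={quote_plus(query)}"

print(results_url('ti="king lear" and au=shakesp*'))
```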
For more advanced use of the fullquery step, note that it does not need to start with the original fullquery parameter. You can first apply a fullquerylimit step to remove some parts of the query, then a listquery step to convert the query to a simple list, then a listqueryelement to extract all terms that refer to (say) titles, and then use the fullquery step to transform those terms into a nice search string. In such a case, you may not need to specify fields at all, since you already know that everything is about titles. But you may want to specify operators, truncation, and quoting, as usual.
Structure of the fullquery parameter
The fullquery parameter consists of a tree, built of two kinds of elements.
Operator node
An operator node consists of:
- op: The operator, one of and, or, or not (which means ‘and not’), or prox for proximity searches (see below)
- s1: The left operand. This can be another op node, or a term node
- s2: The right operand
Proximity searches have additional parameters. Many of them are not supported by the fullquery step (yet?), but are included so that we can later support the full Z39.50 standard.
- distance: Maximum distance between the two terms
- ordered: True or false
- exclusion: if present, must always be false
- relation: if present, must always be ‘le’ (less than or equal)
- unit: if present, must always be word
Term node
A term node is more complex. It contains some selection of:
- term: The search term. Must always be there.
- field: The search index. Can be anything, most often author, title, keyword, etc.
- relation: One of lt, le, eq, gt, ge, ne, phonetic, stem, relevance, alwaysmatches.
- position: One of firstinfield, firstinsubfield, any
- structure: One of a longer list of alternatives, most often word, phrase, year, string
- truncation: One of right, left, both (or some other fancy value)
- completeness: One of incompletesubfield, completesubfield, completefield
Note that the fullquery parameter can reflect anything that can come in a Z39.50 query. The fullquery step can not (yet?) handle all possibilities. Position and completeness are not used by the ‘fullquery’ step at all, and structure only affects the kind of quotes to put around the term. Relation (other than ‘eq’, which is the default) is only recognized when dealing with ranges. Since the Z39.50 query system does not support ranges on its own, they are represented as something like (year >= 2000 AND year <= 2009). The fullquery step tries to be clever in finding those ranges, but there may be queries where that is not always possible.
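To make the range representation concrete, here is a minimal Python sketch of spotting the simplest case: an and node whose two operands are a ‘ge’ term and a ‘le’ term on the same field. It is an illustration under that assumption only; find_range is a hypothetical name, and the real step may recognize more shapes than this.

```python
def find_range(node):
    """Return (field, start, end) for a simple ge/le pair, else None."""
    if node.get("op") != "and":
        return None
    s1, s2 = node.get("s1", {}), node.get("s2", {})
    if "term" not in s1 or "term" not in s2:
        return None  # at least one operand is itself an operator node
    if s1.get("field") != s2.get("field"):
        return None
    by_relation = {s1.get("relation"): s1["term"],
                   s2.get("relation"): s2["term"]}
    if set(by_relation) != {"ge", "le"}:
        return None
    return s1.get("field"), by_relation["ge"], by_relation["le"]

years = {"op": "and",
         "s1": {"term": "2000", "field": "year", "relation": "ge"},
         "s2": {"term": "2009", "field": "year", "relation": "le"}}
print(find_range(years))
```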
Listquery node
The listquery step will produce a structure that is quite like a fullquery. The differences are:
- It is a flat list, or technically, an array of term nodes. There are no subtrees, so no s1 or s2 elements.
- Each node is a term node, with the addition of an ‘op’ element. In the very first node this is empty; in the subsequent ones, it specifies the operator to apply between this node and the preceding ones.
The fullquery step can accept a listquery-type parameter too, and do the right thing with it. In that case, you obviously can not use the (XX and YY) form on the operator tab, as that implies a nested tree.
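The relation between the tree and the flat list can be sketched in Python as below. This is a guess at the shape of the result based on the description above, not the listquery step's actual algorithm; flatten is a hypothetical name, the empty op is assumed to be an empty string, and proximity parameters such as distance are simply dropped.

```python
def flatten(node, op=""):
    """Turn a fullquery tree into a flat list of term nodes with 'op'."""
    if "term" in node:
        # Copy the term node and record the operator that precedes it.
        return [dict(node, op=op)]
    # The left subtree inherits the incoming operator; the first term of
    # the right subtree gets this node's operator.
    return flatten(node["s1"], op) + flatten(node["s2"], node["op"])

tree = {"op": "and",
        "s1": {"term": "hamlet", "field": "title"},
        "s2": {"term": "shakespeare", "field": "author"}}
for item in flatten(tree):
    print(item)
```

Flattening necessarily loses the grouping of nested subtrees, which is why the nested (XX and YY) operator form does not apply to such lists.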
Examples
- { "term" : "foo" }
- { "term" : "foo", "field": "title" }
- { "op" : "and" , "s1" : { "term" : "hamlet", "field": "title" } , "s2" : { "term" : "shakespeare", "field": "author" } }
- { "term" : "shakesp", "truncation": "right", "field": "author" }
- { "term" : "king lear", "structure": "phrase", "field": "title" }
- { "op":"prox", "distance":3, "ordered":false, "s1":{"term":"dylan"}, "s2":{"term":"zimmerman"} }
- { "op":"prox", "exclusion":false, "distance":3, "ordered":true, "relation":"le", "unit":"word", "s1":{"term":"dylan"}, "s2":{"term":"zimmerman"} }
- { "op":"and", "s1": { "term":"shakespeare", "field":"author" }, "s2": { "op":"and", "s1": { "term":"2000", "field":"year", "relation":"ge" }, "s2": { "term":"2009", "field":"year", "relation":"le" } } }
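To show how such a tree turns into a query string, here is a small recursive walk in Python. It is a sketch under stated assumptions rather than the fullquery step itself: translate and PREFIXES are hypothetical names, the ti= and au= prefixes and the * truncation character are invented for the example, every operator node is wrapped in parentheses, and proximity and range handling are left out.

```python
import json

# Hypothetical field prefixes; None stands for "no field given".
PREFIXES = {"title": "ti=", "author": "au=", None: ""}

def translate(node):
    """Translate a fullquery tree into a CCL-like string (boolean ops only)."""
    if "term" in node:
        term = node["term"]
        if node.get("structure") == "phrase":
            term = f'"{term}"'
        if node.get("truncation") == "right":
            term += "*"  # assumption: * marks right truncation
        return PREFIXES[node.get("field")] + term
    left, right = translate(node["s1"]), translate(node["s2"])
    return f"({left} {node['op']} {right})"

query = json.loads('''
{ "op" : "and",
  "s1" : { "term" : "hamlet", "field": "title" },
  "s2" : { "term" : "shakespeare", "field": "author" } }
''')
print(translate(query))
```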
