Although the Connector Platform is intended to work with websites, it can also be used with XML-based web services. With a few caveats, those work the same way as a regular website.
Usually it makes sense to build the complete URL to the web service, instead of going through an input page and setting form values. Simple keyword searches can be done the usual way, but this is where the fullquery step really shines, especially if the site supports a more complex query language.
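As a rough illustration of what "building the complete URL" means, the sketch below assembles a search URL in plain Python. It is not Connector Platform code: the endpoint `https://example.org/search` and the parameter names `query` and `maximumRecords` are assumptions made up for the example, and a real service will have its own.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and parameter names, for illustration only.
BASE_URL = "https://example.org/search"


def build_search_url(query, max_records=10):
    """Return a complete search URL with the query safely percent-encoded."""
    params = {"query": query, "maximumRecords": max_records}
    return BASE_URL + "?" + urlencode(params)


# A query in a more complex (made-up) query language.
url = build_search_url('title = "water" and author = smith')
print(url)
```

The point of the sketch is the encoding step: spaces, quotes and equals signs in the query must be escaped before they go into the URL.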
On the goto-URL step, check the box “Load raw XML”. This makes the step load the XML into an internal buffer rather than straight into the browser window, so that the browser does not apply its own formatting. The step then displays the XML in the browser window, in a format we can work with.
Build the parse task as usual. You can use extract_Xpattern, and even create a pattern by clicking on the elements you want. Remember to set the hit area to something that exists in the result document, possibly the root node, /searchresult, or whatever it is called.
For the Next task, you may need to modify the previously used URL, possibly adding a start-record-number argument. In some cases the resulting XML contains a link to the next page, which makes life easier.
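The URL change for the Next task can be sketched in plain Python as follows. This is only an illustration of the idea, not how the platform does it: the parameter name `startRecord`, the page size of 10, and the example URL are all assumptions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def next_page_url(url, start_param="startRecord", page_size=10):
    """Return the URL with the start-record argument advanced by one page.

    start_param and page_size are hypothetical; use whatever the service
    actually expects. Repeated query parameters are collapsed to one value.
    """
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    # If the argument is missing, assume the first page started at record 1.
    current = int(params.get(start_param, 1))
    params[start_param] = str(current + page_size)
    return urlunsplit(parts._replace(query=urlencode(params)))


first = "https://example.org/search?query=water&startRecord=1"
second = next_page_url(first)
print(second)
```

If the previous URL had no start-record argument at all, the function adds one, which matches the "possibly adding" case mentioned above.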
When you need to define the hit area for XPattern, or use the Extract-Value step, you need to give an XPath that points to the node. Normally this can be done by pointing and clicking, but if that does not work, remember that:
- XPath is case sensitive when working with XML documents
- XPath may need some namespace declarations
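Both points can be seen in a small experiment outside the platform. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath and is not the engine the Connector Platform uses; the sample document, the `sr` prefix and the namespace URI are invented for the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample response; the namespace URI is hypothetical.
XML = """<sr:searchResult xmlns:sr="http://example.org/ns/search">
  <sr:record><sr:title>Water</sr:title></sr:record>
  <sr:record><sr:title>Fire</sr:title></sr:record>
</sr:searchResult>"""

root = ET.fromstring(XML)
ns = {"sr": "http://example.org/ns/search"}  # namespace declaration for the paths

# Without the namespace, the path matches nothing.
no_namespace = root.findall("record")

# With the namespace but the wrong case, it still matches nothing.
wrong_case = root.findall("sr:Record", ns)

# Correct case and a declared namespace: both records are found.
records = root.findall("sr:record", ns)
titles = [r.findtext("sr:title", namespaces=ns) for r in records]

print(len(no_namespace), len(wrong_case), titles)
```

An empty result from a path that looks right is the typical symptom of either problem.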
Some details about XPatterns:
- Unlike the XPath, the XPattern is not case sensitive.
- The XPattern does not care about namespaces at all. It just uses the tag names and ignores all namespace prefixes.
It is theoretically possible that these restrictions make some XML formats harder to parse. In practice it is extremely rare to meet XML whose tags are identical apart from their namespaces or upper/lowercase letters. If you run into such a document, you may need to extend your XPattern to match some nearby nodes as well.
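The name matching described above, and the rare collision it can cause, can be imitated in a few lines of Python. This is only an analogy built on `xml.etree.ElementTree`, not the XPattern implementation; the helper `local_name` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop ElementTree's '{namespace-uri}' prefix and lowercase the name."""
    return tag.rsplit("}", 1)[-1].lower()


# Made-up document with two tags that differ only by namespace and case.
XML = """<root xmlns:a="http://example.org/a" xmlns:b="http://example.org/b">
  <a:Title>first</a:Title>
  <b:title>second</b:title>
</root>"""

root = ET.fromstring(XML)

# Matching on the bare, lowercased name cannot tell the two elements apart.
matches = [el.text for el in root.iter() if local_name(el.tag) == "title"]
print(matches)
```

Both elements match the single name `title`, which is why a pattern for such a document would have to include neighbouring nodes to distinguish them.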