Normalize URL
In a parsing task, the “Normalize URL” step automatically prepends the website’s address to the relative URL string extracted from a result set’s HTML, turning it into a complete link.
For example, running search and parse tasks in the NIH Clinical Trials database returns raw HTML data from the “attributes” node of the title section:
href="/ct2/show/NCT00972257?term=betimol&rank=1"
(This value is visible in the results pane at the lower left of the screen after you run “Parse by XPattern” in the builder.)
By adding a “Normalize URL” step after the “Parse by XPattern” step, the Connector Builder supplements the HTML data with the site’s URL to create a meaningful link:
http://clinicaltrials.gov/ct2/show/NCT00972257?term=betimol&rank=1
Complete URLs are required so that connector users can click through to electronic resources retrieved by queries.
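
The effect of the step can be illustrated outside the builder. The following is only a minimal sketch in Python, not the Connector Builder's actual implementation: the function name normalize_url and the BASE_URL constant are hypothetical, and the base address is assumed from the example above. It relies on the standard library's urljoin to resolve a relative href against the site's address.

```python
from urllib.parse import urljoin

# Assumed base address of the searched site (taken from the example above).
BASE_URL = "http://clinicaltrials.gov/"


def normalize_url(href: str, base_url: str = BASE_URL) -> str:
    """Resolve a possibly relative href against the site's base address.

    Hypothetical helper for illustration only; an href that is already
    a complete URL is returned unchanged by urljoin.
    """
    return urljoin(base_url, href.strip())


if __name__ == "__main__":
    relative = "/ct2/show/NCT00972257?term=betimol&rank=1"
    print(normalize_url(relative))
```

Resolving against a base address, rather than simply concatenating strings, avoids doubled slashes and leaves links that are already complete untouched.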
