
We’re all familiar with the difficulty of finding relevant information inside huge sets of search results. The sheer scale of many information resources forces us to iteratively refine and adapt our search queries until either we find the information we need or we abandon our search.
Using taxonomies, thesauri or ontologies to tag our information resources allows us to help users find information more quickly. This in turn leads to increased usage, driving renewals and additional sales of information at the point of discovery. Abandoned searches are clearly a failure in this context!
Here are the four most important techniques for improving search by leveraging taxonomies, thesauri and ontologies.
Drill down
Alongside a set of search results, a search engine can provide a series of drill-down categories that let the user refine their query and cut down the result set until they find the information they need. If properly structured faceted taxonomies have been used to tag the documents, then the terms from these taxonomies can supply the drill-down categories for the search engine.
A well-designed interface will let the user select and remove multiple terms from each of the facets incrementally, providing visual feedback at each step so that the user sees the effects of their actions in real time. This helps the user avoid refinements that return zero matching results.
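As a rough illustration of the drill-down idea, here is a minimal Python sketch. The documents, facet names and the `refine` and `facet_counts` helpers are all hypothetical, and a real search engine would compute this inside its index rather than by looping over documents.

```python
from collections import Counter

# Hypothetical tagged documents: each facet name maps to a set of taxonomy terms.
DOCS = [
    {"id": 1, "facets": {"topic": {"mammals"}, "format": {"article"}}},
    {"id": 2, "facets": {"topic": {"mammals", "ecology"}, "format": {"report"}}},
    {"id": 3, "facets": {"topic": {"birds"}, "format": {"article"}}},
]

def refine(docs, selections):
    """Keep documents that match every facet with a selection.

    Terms selected within one facet are ORed; different facets are ANDed.
    A facet with no selected terms places no restriction on the results.
    """
    return [
        d for d in docs
        if all(
            not terms or d["facets"].get(facet, set()) & terms
            for facet, terms in selections.items()
        )
    ]

def facet_counts(docs, facet):
    """Count how many of the remaining documents carry each term of a facet."""
    return Counter(term for d in docs for term in d["facets"].get(facet, ()))

results = refine(DOCS, {"topic": {"mammals"}})
print([d["id"] for d in results])       # [1, 2]
print(facet_counts(results, "format"))  # counts to show next to each remaining term
```

Displaying these counts beside each term is one way to give the visual feedback described above: a term that would lead to an empty result set simply never appears.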
Search suggestions
If a user enters a misspelled term in their search query, it is particularly useful to suggest an alternative term. This is commonly done by analysing the full text of all documents indexed by a search engine to provide a best-guess spelling suggestion based on statistical word frequencies. However, if a thesaurus is available, then the search engine can also present suggestions based on preferred terms from the thesaurus, even when the user's term is spelled correctly. For example, suppose a user searches for weasels in a life sciences database that was compiled using a controlled vocabulary in which this word is not used and Mustela is used instead. The user might then find zero results. By using the thesaurus, the search engine can suggest the correct term (i.e. Mustela) to the user.
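A minimal sketch of thesaurus-based suggestions might look like the following. The `USE_FOR` mapping and the `suggest` function are hypothetical, and the fuzzy fallback uses Python's standard `difflib` module rather than the word-frequency statistics a production search engine would rely on.

```python
import difflib

# Hypothetical thesaurus fragment: non-preferred entry term -> preferred term.
USE_FOR = {
    "weasels": "mustela",
    "weasel": "mustela",
}

def suggest(query_term):
    """Return a preferred term to suggest, or None if there is nothing to suggest."""
    term = query_term.strip().lower()
    preferred = set(USE_FOR.values())
    if term in preferred:
        return None  # already the controlled-vocabulary term
    if term in USE_FOR:
        return USE_FOR[term]  # known non-preferred term
    # Otherwise look for the closest known term to catch misspellings.
    vocabulary = list(USE_FOR) + sorted(preferred)
    close = difflib.get_close_matches(term, vocabulary, n=1, cutoff=0.8)
    if not close:
        return None
    return USE_FOR.get(close[0], close[0])

print(suggest("weasels"))  # mustela
print(suggest("mustela"))  # None
```

The same lookup handles both cases in the paragraph above: a correctly spelled word that the vocabulary does not use, and a misspelling that is close to a term the thesaurus knows.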
Term expansion
By using the relationships within the thesaurus, a search engine can automatically expand the terms a user enters to include other highly relevant terms. Following the example above, this could be used to add mink, polecats and ferrets to the search query automatically, but with less weight given to these terms so that any direct matches on Mustela would be ranked first within the results.
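The weighting idea can be sketched in a few lines of Python. The `RELATED` mapping, the weight of 0.5 and the additive scoring are assumptions chosen for illustration; real engines usually express this as per-term boosts in the query.

```python
# Hypothetical thesaurus fragment: term -> related terms worth adding to a query.
RELATED = {"mustela": ["mink", "polecats", "ferrets"]}

def expand(terms, related_weight=0.5):
    """Return {term: weight}; terms the user typed get 1.0, expansions get less."""
    weighted = {}
    for term in terms:
        weighted[term] = 1.0
        for related in RELATED.get(term, []):
            # setdefault never downgrades a term the user typed explicitly.
            weighted.setdefault(related, related_weight)
    return weighted

def score(doc_terms, weighted):
    """Sum the weights of the query terms that appear in a document."""
    return sum(w for term, w in weighted.items() if term in doc_terms)

query = expand(["mustela"])
docs = {"doc_a": {"ferrets", "diet"}, "doc_b": {"mustela", "habitat"}}
ranked = sorted(docs, key=lambda name: score(docs[name], query), reverse=True)
print(ranked)  # ['doc_b', 'doc_a']
```

The document that matches the user's own term ranks first, while the document that matches only an expanded term still appears further down the list.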
Semantic results
Many search queries express a desire for a particular piece of factual information. This information is often hidden within the full text of a document, which the user must first find and then read to discover the relevant facts. But by tagging documents using carefully controlled ontologies, it’s quite possible to present the facts on their own within a larger set of search results. This approach is already being used by Google and Bing, whilst Wolfram Alpha does away with documents altogether in search results and just presents factual data straight back to the user as a single result page.
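To make this concrete, here is a small Python sketch in which facts captured during ontology tagging are stored as simple records and returned directly, alongside the document they came from. The `FACTS` list, its layout and the `answer` function are hypothetical; this is not a description of how Google, Bing or Wolfram Alpha work.

```python
# Hypothetical facts captured when documents were tagged against an ontology:
# (subject, property, value, id of the source document)
FACTS = [
    ("mustela", "rank", "genus", 1),
    ("mustela", "family", "Mustelidae", 2),
]

def answer(subject, prop):
    """Return (value, source document id) pairs that directly answer the question."""
    subject, prop = subject.lower(), prop.lower()
    return [(value, doc) for s, p, value, doc in FACTS if s == subject and p == prop]

print(answer("Mustela", "family"))  # [('Mustelidae', 2)]
```

A results page could show the returned value at the top and still list the source document below it, so that the user can check the fact in context.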
It’s about business, not technology …
Used together, these four techniques deliver the relevant information that your end users need. At the point of delivery, a well-designed e-commerce and access management system enables them to pay for it. If users cannot find what they need, then renewals and pay-per-view sales will clearly suffer. Integrating taxonomies with search is therefore at heart a business issue, because it allows you to deliver relevant content to customers at exactly the right time – when they are looking for it.

An alternative approach for Search Suggestions & Corrections is to mine search queries to see actual searchers’ mistakes & corrections! Google does it.
Yes, building dynamic learning from users’ queries is a powerful technique. In fact, Google recently found that Microsoft were logging queries from IE users searching on Google. They were using this data to build search suggestions in Bing!