Improving Retrieval by a Similarity Thesaurus based on Hyperlink Structure

D. Kukulenz, K.-P. Herget, and J. Pauli (Germany)


Hyperlink-based information retrieval, Web structure mining


One strategy to enhance the retrieval effectiveness of search engines is to apply automatic query expansion. For this purpose, a similarity thesaurus may be used to find new search terms. The similarity thesaurus may be constructed using a model for term comparison. Common methods for defining term distances are based on the occurrence frequencies of terms in documents. In this article we develop a new measure for term distances that is based on the hyperlink structure connecting documents. Hyperlinks frequently point to documents that concern similar topics. Based on this assumption, the presented system decreases the distances between terms that occur in linked documents. We apply a search engine based on automatic query expansion to evaluate this approach. In the experiments, simulated hyperlink graphs are used to show the effect of different hyperlink topologies on retrieval quality.
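
The idea summarized above can be illustrated with a minimal sketch. It is not the measure developed in the article, which the abstract does not specify. It assumes a plain cosine similarity over term occurrence frequencies as the baseline and a fixed additive bonus (the hypothetical parameter `link_bonus`) for term pairs that occur in hyperlinked documents; raising the similarity corresponds to decreasing the term distance. All function names and the toy data are illustrative assumptions.

```python
import math
from collections import defaultdict


def term_vectors(docs):
    """Map each term to a {doc_id: occurrence frequency} vector."""
    vectors = defaultdict(dict)
    for doc_id, terms in docs.items():
        for term in terms:
            vectors[term][doc_id] = vectors[term].get(doc_id, 0) + 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v[d] for d, w in u.items() if d in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_thesaurus(docs, links, link_bonus=0.1):
    """Return {(s, t): similarity} with s < t, boosted along hyperlinks.

    link_bonus is an assumed, illustrative parameter, not a value
    taken from the article.
    """
    vectors = term_vectors(docs)
    terms = sorted(vectors)
    sim = {}
    for i, s in enumerate(terms):
        for t in terms[i + 1:]:
            sim[(s, t)] = cosine(vectors[s], vectors[t])
    for src, dst in links:
        # Unordered term pairs spanning the two linked documents.
        pairs = {
            tuple(sorted((s, t)))
            for s in set(docs[src])
            for t in set(docs[dst])
            if s != t
        }
        for pair in pairs:
            sim[pair] = min(1.0, sim[pair] + link_bonus)
    return sim


def expand_query(query_terms, sim, k=2):
    """Append up to k terms most similar to the query terms."""
    query = set(query_terms)
    scores = defaultdict(float)
    for (s, t), value in sim.items():
        if s in query and t not in query:
            scores[t] += value
        elif t in query and s not in query:
            scores[s] += value
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return list(query_terms) + [t for t in ranked[:k] if scores[t] > 0]


# Toy collection: d1 links to d2, d3 is isolated.
docs = {
    "d1": ["jaguar", "car"],
    "d2": ["engine", "speed"],
    "d3": ["jaguar", "cat"],
}
links = [("d1", "d2")]

print(expand_query(["car"], build_thesaurus(docs, [])))
print(expand_query(["car"], build_thesaurus(docs, links)))
```

In this toy collection, "car" and "engine" never co-occur, so a purely frequency-based thesaurus cannot relate them; the hyperlink from `d1` to `d2` is what makes "engine" a candidate expansion term.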

