Calculation of Document Similarity using Cellular Structured Space Template

P. Kanongchaiyos (Thailand)


Document similarity, Cellular structured space, Document retrieval


Calculating the similarity between documents has become a major task in information retrieval from textual databases (e.g., electronic books or electronic dictionaries). Documents can be compared by constructing associated feature vectors or sets of terms and computing the distance between the corresponding vectors or sets. While Boolean distance seems impractical and set similarity cannot handle the case in which some terms are more effective in retrieval than others, term statistics are recognized as a good basis for computing document relevance. However, the efficiency of this calculation depends only on the size of the statistical data, while the document's discourse and any additional meaning conveyed by the structure of the text are not considered. In this research, cellular structured space templates are used for building input documents. The cellular structured space template, which specifies the basic layout and semantics of a document, is a reasonable compromise between time-consuming manual retyping of documents and fully automated document recognition, which is not yet available. Semantics-based similarity between documents is computed from cellular structured vectors, which are n-dimensional context vectors of the documents. The experimental results show improved similarity between relevant documents compared with conventional retrieval methods.
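To make the contrast between set similarity and term statistics concrete, the following is a minimal sketch of the two conventional baselines the abstract mentions. It is not the cellular structured space method. The function names (`jaccard`, `tfidf_vectors`, `cosine`), the smoothed inverse document frequency formula, and the sample documents are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def jaccard(a_terms, b_terms):
    """Set similarity: every shared term counts equally."""
    a, b = set(a_terms), set(b_terms)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def tfidf_vectors(docs):
    """Build sparse term-weight vectors (dicts) from tokenized documents.

    Assumption: smoothed IDF, log((1 + N) / (1 + df)) + 1, so that terms
    occurring in every document still receive a small positive weight.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: tf[t] * (math.log((1 + n) / (1 + df[t])) + 1) for t in tf}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Hypothetical sample documents, already tokenized.
    docs = [
        "template specifies layout of the document".split(),
        "template specifies semantics of the document".split(),
        "statistics of terms in the database".split(),
    ]
    vecs = tfidf_vectors(docs)
    print("Jaccard(0, 1):", jaccard(docs[0], docs[1]))
    print("Cosine(0, 1): ", cosine(vecs[0], vecs[1]))
    print("Cosine(0, 2): ", cosine(vecs[0], vecs[2]))
```

In this sketch the Jaccard score treats all terms alike, whereas the weighted cosine score lets rarer terms contribute more. Neither uses the layout or discourse structure of the text, which is the gap the abstract says the template-based context vectors are meant to address.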
