Document Classifications based on Word Semantic Hierarchies

X. Peng and B. Choi (USA)


Keywords: Classification, WordNet, Semantic Web, Document Representation, Information Retrieval


In this paper we propose to automatically classify documents based on the meanings of words and the relationships between groups of meanings, or concepts. Our classification algorithm builds on the word structures provided by WordNet, which not only arranges words into groups of synonyms, called Synsets, but also arranges the Synsets into hierarchies that represent the relationships between concepts. Most existing methods classify text documents based on the number of word occurrences, and some use Synsets. Our approach goes one step further by using not only word occurrences and Synsets but also the relationships between Synsets. We also propose a sense-based document representation built on the semantic hierarchies provided by WordNet. To classify a document, our approach extracts the words that occur in the document and uses them to increase the weights of the corresponding Synsets. Words with the same meaning increase the weight of the same Synset, so in effect we count occurrences of senses rather than of words. We also propagate the weight of each Synset upward to its related Synsets in the hierarchies, thereby capturing the relationships between concepts. Compared with previous research, our approach increases classification accuracy by 14%.
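
The weighting step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny word-to-Synset table, the hypernym links, the Synset names, and the decay factor of 0.5 are all hypothetical stand-ins chosen for illustration, since the abstract does not specify how much weight is passed upward.

```python
from collections import defaultdict

# Toy stand-in for WordNet (hypothetical data, not real WordNet identifiers).
# Synonyms such as "car" and "automobile" map to the same Synset.
WORD_TO_SYNSETS = {
    "car": ["car.n"],
    "automobile": ["car.n"],
    "truck": ["truck.n"],
    "dog": ["dog.n"],
}

# Each Synset points to its parent (hypernym) in the toy hierarchy.
HYPERNYM = {
    "car.n": "motor_vehicle.n",
    "truck.n": "motor_vehicle.n",
    "motor_vehicle.n": "vehicle.n",
    "dog.n": "animal.n",
}

# Assumed fraction of weight passed to each ancestor level.
DECAY = 0.5


def synset_weights(words, decay=DECAY):
    """Count sense occurrences and propagate each count up the hierarchy."""
    weights = defaultdict(float)
    for word in words:
        for synset in WORD_TO_SYNSETS.get(word, []):
            contribution = 1.0
            node = synset
            # Walk from the Synset to the root, adding a decayed share
            # of the occurrence to every ancestor.
            while node is not None:
                weights[node] += contribution
                node = HYPERNYM.get(node)
                contribution *= decay
    return dict(weights)


doc = "car automobile truck dog".split()
weights = synset_weights(doc)
for synset, weight in sorted(weights.items()):
    print(synset, weight)
```

In this sketch "car" and "automobile" both raise the weight of the same Synset, and the shared ancestor of the car and truck Synsets accumulates weight from all three words, which is how related concepts reinforce one another in the resulting document representation.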
