Document Clustering based on Similarity of Subjects using Integrated Subject Graph

M. Nakada and Y. Osana (Japan)


Document Clustering, k-means Method, Subject Graph, Integrated Subject Graph


In this research, we propose an integrated subject graph that expresses the subjects of a document. The proposed integrated subject graph is based on a graph-based text representation model called the “subject graph”. In the subject graph, a node represents a term in the text, and an edge denotes a relation between the linked terms. Graph-based text representation models such as the subject graph and KeyGraph have been proposed previously, and most of them assume that a document has only one subject. However, a document often has several subjects. In this research, we assume that each unit of a document, such as a paragraph, has one subject, and each unit is converted into a subject graph. These subject graphs are then merged into an integrated subject graph. We apply the proposed integrated subject graph to document clustering, which allows documents to be clustered based on the similarity of their subjects. We carried out a series of computer experiments and confirmed the effectiveness of the proposed integrated subject graph.
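The abstract does not say how edges are defined, how the per-paragraph graphs are merged, or how graph similarity is measured, so the following Python sketch fills those gaps with stated assumptions rather than the authors' method. It assumes that paragraphs are separated by blank lines, that an edge joins two terms co-occurring in the same sentence (weight = co-occurrence count), and that integration sums edge weights across paragraphs. Each integrated graph is then turned into an L2-normalized edge-weight vector and grouped with a plain k-means. All function names (`subject_graph`, `integrated_subject_graph`, `to_vectors`, `kmeans`, `cluster_documents`) and the tiny stopword list are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical stopword list; a real system would use a proper one.
STOPWORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "on", "for", "with"}

def subject_graph(paragraph):
    """Assumed subject graph for one paragraph: nodes are terms, and an edge
    joins two terms that co-occur in the same sentence (weight = count)."""
    edges = Counter()
    for sentence in re.split(r"[.!?]", paragraph.lower()):
        terms = sorted({w for w in re.findall(r"[a-z]+", sentence) if w not in STOPWORDS})
        for u, v in combinations(terms, 2):
            edges[(u, v)] += 1
    return edges

def integrated_subject_graph(document):
    """Assumption: one subject per blank-line-separated paragraph; the
    paragraph graphs are merged by summing the weights of shared edges."""
    merged = Counter()
    for paragraph in document.split("\n\n"):
        if paragraph.strip():
            merged.update(subject_graph(paragraph))  # Counter.update adds weights
    return merged

def to_vectors(graphs):
    """Represent each graph as an L2-normalized vector over all observed edges."""
    keys = sorted(set().union(*graphs))
    vectors = []
    for g in graphs:
        v = [float(g.get(k, 0)) for k in keys]
        norm = sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def kmeans(vectors, k, iters=20):
    """Plain k-means with Euclidean distance; initial centroids are the first k vectors."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
                  for v in vectors]
        # Recompute each centroid as the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def cluster_documents(documents, k):
    graphs = [integrated_subject_graph(d) for d in documents]
    return kmeans(to_vectors(graphs), k)

docs = [
    "The team won the football match. The striker scored a goal.\n\nFans cheered the football team.",
    "Bake the bread in a hot oven. Knead the dough for bread.\n\nThe oven must be hot.",
    "The football team trained hard. The goal keeper saved the match.",
    "Knead the dough and bake bread. The hot oven bakes bread.",
]
labels = cluster_documents(docs, k=2)
print(labels)  # → [0, 1, 0, 1]
```

Because the vectors are normalized, squared Euclidean distance here equals 2 minus twice the cosine similarity, so documents whose integrated graphs share more weighted edges end up in the same cluster. A closer reproduction would substitute the paper's own edge definition and subject-similarity measure for these stand-ins.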
