Documents Clustering based on Frequent Term Sets

W.-L. Liu and X.-S. Zheng (PRC)


Documents Clustering; Vector Space Model; Frequent term sets


Grouping similar documents into clusters reduces the search space, which speeds up search and improves its precision. This paper introduces a document clustering algorithm based on frequent term sets. First, documents are represented according to the Vector Space Model (VSM) and the terms are sorted by relative frequency. Frequent term sets are then found using frequent-pattern growth (FP-growth). Finally, documents are clustered based on these frequent term sets. The approach is efficient for very large databases, and the frequent term sets provide an understandable description of the discovered clusters. Experimental results show that the algorithm has an advantage in efficiency and suitability.
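
The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions: documents are reduced to plain term lists (a boolean form of the VSM), frequent term sets are mined with a simple level-wise (Apriori-style) search standing in for FP-growth, and each document is assigned to the largest frequent term set it contains. The function names `frequent_term_sets` and `cluster_by_term_sets`, the `min_count` threshold, and the assignment rule are all hypothetical choices made for illustration.

```python
from collections import Counter


def frequent_term_sets(docs, min_count):
    """Return {term set: document count} for sets found in >= min_count documents.

    Assumption: a level-wise (Apriori-style) search is used here as a simple
    stand-in for FP-growth; it yields the same frequent sets, only more slowly.
    """
    doc_sets = [set(d) for d in docs]

    def support(term_set):
        # Number of documents that contain every term of the set.
        return sum(1 for d in doc_sets if term_set <= d)

    # Level 1: single terms, counted once per document.
    df = Counter(t for d in doc_sets for t in d)
    current = {frozenset([t]) for t, c in df.items() if c >= min_count}

    frequent = {}
    while current:
        for s in current:
            frequent[s] = support(s)
        # Join frequent k-sets into candidate (k+1)-sets, then keep the frequent ones.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == len(a) + 1}
        current = {c for c in candidates if support(c) >= min_count}
    return frequent


def cluster_by_term_sets(docs, frequent):
    """Group document indices by the frequent term set that describes them.

    Assumption (hypothetical rule): a document goes to the largest frequent
    term set it contains; ties are broken by higher support, then by term
    order. Documents covered by no frequent set share the empty-set cluster.
    """
    clusters = {}
    for i, d in enumerate(docs):
        terms = set(d)
        covering = [s for s in frequent if s <= terms]
        if covering:
            label = max(covering, key=lambda s: (len(s), frequent[s], sorted(s)))
        else:
            label = frozenset()
        clusters.setdefault(label, []).append(i)
    return clusters


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    docs = [
        ["data", "mining", "cluster"],
        ["data", "mining", "pattern"],
        ["web", "search", "engine"],
        ["web", "search", "index"],
        ["data", "web"],
    ]
    frequent = frequent_term_sets(docs, min_count=2)
    for label, members in cluster_by_term_sets(docs, frequent).items():
        print(sorted(label), members)
```

Each cluster is labelled by the term set that produced it, which is the sense in which the frequent term sets give an understandable description of the clusters. For large collections, a real FP-growth implementation would replace the level-wise search, since the sketch rescans all documents for every candidate set.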
