Revealing Topic-based Relationship Among Documents using Association Rule Mining

K. Sriphaew and T. Theeramunkong (Thailand)


association rule mining, data mining, knowledge discovery, document, topic-based, relationship.


With the large volume of electronic documents now available, finding documents whose contents share the same or similar topics has recently become a crucial task in textual data mining. To reveal such topic-based relationships among documents, this paper proposes a method that exploits unigrams and bigrams co-occurring among documents to extract sets of topically similar documents using association rule mining techniques. To evaluate the effectiveness of the method, a collection of well-organized scientific research publications is employed. The experimental results indicate that any two documents with referential links can be found with an accuracy of 60-80% in the case of unigrams and 80-90% in the case of bigrams. An analysis of the discovered association rules is also given.
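
As an illustration of the general idea only, the following minimal sketch mines pairwise rules between documents from the n-grams they share. It rests on assumptions that the abstract does not state: each distinct n-gram is treated as a transaction whose items are the documents containing it, tokenization is plain whitespace splitting, and the function names and thresholds (document_pair_rules, min_support, min_confidence) are hypothetical. It is not the authors' implementation.

```python
from collections import defaultdict
from itertools import combinations


def ngrams(tokens, n):
    """Return the set of n-grams (as tuples) found in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def document_pair_rules(docs, n=1, min_support=2, min_confidence=0.5):
    """Mine rules 'doc X -> doc Y' from n-grams shared between documents.

    Assumption (not taken from the paper): every distinct n-gram is one
    transaction, and its items are the documents that contain it.
    """
    # Build the transactions: n-gram -> set of documents containing it.
    transactions = defaultdict(set)
    for doc_id, text in docs.items():
        for term in ngrams(text.lower().split(), n):
            transactions[term].add(doc_id)

    doc_count = defaultdict(int)   # transactions containing each document
    pair_count = defaultdict(int)  # transactions containing both documents
    for doc_ids in transactions.values():
        for d in doc_ids:
            doc_count[d] += 1
        for a, b in combinations(sorted(doc_ids), 2):
            pair_count[(a, b)] += 1

    rules = []
    for (a, b), support in pair_count.items():
        if support < min_support:
            continue
        # Confidence of X -> Y is support(X, Y) / support(X).
        for x, y in ((a, b), (b, a)):
            confidence = support / doc_count[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    # Strongest rules first; ties broken by document identifiers.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))


if __name__ == "__main__":
    sample = {
        "d1": "association rule mining finds frequent itemsets",
        "d2": "frequent itemsets drive association rule mining",
        "d3": "neural networks classify images",
    }
    for n in (1, 2):  # unigrams, then bigrams
        print(n, document_pair_rules(sample, n=n))
```

Support here is an absolute count of shared n-grams, and only document pairs are considered; a full association rule miner would also enumerate larger document sets. Switching n from 1 to 2 makes the shared evidence stricter, since two documents must share adjacent word pairs rather than single words.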

