Evaluating and Comparing Text Clustering Results

L. Massey (Canada)


Keywords: Text clustering, text categorization, evaluation.


Text clustering is a useful and inexpensive way to organize vast text repositories into meaningful topic categories. However, there is little consensus on which clustering techniques work best and under what circumstances, because researchers do not use the same evaluation methodologies and document collections. Furthermore, text clustering offers a low-cost alternative to supervised classification, which relies on labeled training data that is expensive and difficult to handcraft. Yet there is currently no means to compare the two approaches and decide which would be best in a particular situation. In this paper, we propose and experiment with a framework that allows one to effectively compare text clustering results among themselves and with supervised text categorization.
