Dimensionality Reduction of Features for Text Categorization

P. Jitpakdee and W. Kreesuradej (Thailand)


Data mining, text mining, text categorization


This paper proposes a new technique for dimensionality reduction of features for text categorization. Unlike conventional methods, our phrase features are generated from word sequences of different lengths (multi-grams) taken from phrases extracted from whole documents. We then use the odds ratio (OR) to perform phrase feature selection. In preliminary experiments, the proposed technique shows better performance than conventional methods.
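The two steps named in the abstract, multi-gram generation and odds-ratio selection, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how phrases are extracted, what the maximum sequence length is, or which form of the odds ratio is used. The sketch assumes pre-tokenized documents, a hypothetical limit `max_n`, document-level presence counts, the common log odds ratio log[P(t|c)(1 - P(t|not c)) / ((1 - P(t|c)) P(t|not c))], and additive smoothing `eps` to avoid division by zero. All function names are illustrative.

```python
import math
from collections import Counter


def multigrams(tokens, max_n=3):
    """Return all contiguous word sequences of length 1..max_n as strings."""
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def odds_ratio_scores(docs, labels, target, max_n=3, eps=0.5):
    """Score each multi-gram by its log odds ratio for the `target` class.

    docs   : list of token lists
    labels : list of class labels, one per document
    eps    : additive smoothing (an assumption, not taken from the paper)
    """
    pos_df, neg_df = Counter(), Counter()
    n_pos = n_neg = 0
    for tokens, label in zip(docs, labels):
        feats = set(multigrams(tokens, max_n))  # presence per document
        if label == target:
            n_pos += 1
            pos_df.update(feats)
        else:
            n_neg += 1
            neg_df.update(feats)

    scores = {}
    for f in set(pos_df) | set(neg_df):
        # Smoothed estimates of P(f | target) and P(f | not target)
        p_pos = (pos_df[f] + eps) / (n_pos + 2 * eps)
        p_neg = (neg_df[f] + eps) / (n_neg + 2 * eps)
        scores[f] = math.log(p_pos * (1 - p_neg) / ((1 - p_pos) * p_neg))
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring features; ties are broken alphabetically."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


if __name__ == "__main__":
    docs = [
        ["stock", "market", "falls"],
        ["stock", "market", "rises"],
        ["football", "match", "tonight"],
        ["football", "team", "wins"],
    ]
    labels = ["fin", "fin", "sport", "sport"]
    scores = odds_ratio_scores(docs, labels, target="fin")
    print(select_top_k(scores, 3))
```

Keeping only the top-scoring multi-grams per class is what reduces the feature dimensionality; the cutoff `k` is a tuning choice that the abstract does not specify.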
