J.-H. Kim, S.C. Hwang, S. Park, and K.-T. Kim (Korea)
Data Mining, Keyword Frequency, TFIDF, Conceptual Knowledge
This study introduces an algorithm for classifying
documents by means of a keyword extractor. The system
consists of a document collector, an indexer, and a document
classifier. Classification requires conceptual knowledge of
each target category. The web document collector gathers
web documents from the web directories of Internet portal
sites; the title, hyperlinks, and text are extracted from
these documents and saved to files.
The conceptual knowledge is constructed by the indexer,
which applies a method that combines keyword term
frequency with the TFIDF algorithm.
Finally, the document classifier applies the classification
algorithm, together with the conceptual knowledge, to the
documents to be classified.
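
The indexing and classification steps described above might be sketched as follows. This is a minimal illustration only, not the authors' implementation: the abstract does not state how term frequency and TFIDF are combined, so the weighted sum controlled by `alpha`, the pooling of each category's documents, the `top_k` keyword cutoff, the whitespace tokenizer, and the function names are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenizer (no stemming, no stop words).
    return text.lower().split()


def build_conceptual_knowledge(docs_by_category, alpha=0.5, top_k=5):
    """Return {category: {keyword: weight}} from example documents.

    Assumed weighting (not taken from the paper):
        weight = alpha * tf + (1 - alpha) * tf * idf
    where tf is the term's relative frequency within the category and
    idf = log(number of categories / categories containing the term).
    """
    # Pool all documents of a category into one bag of terms.
    term_counts = {
        cat: Counter(t for doc in docs for t in tokenize(doc))
        for cat, docs in docs_by_category.items()
    }
    n_cats = len(term_counts)
    # For each term, count how many categories contain it.
    cat_freq = Counter(t for counts in term_counts.values() for t in counts)

    knowledge = {}
    for cat, counts in term_counts.items():
        total = sum(counts.values())
        scores = {}
        for term, n in counts.items():
            tf = n / total
            idf = math.log(n_cats / cat_freq[term])
            scores[term] = alpha * tf + (1 - alpha) * tf * idf
        # Keep only the highest-weighted keywords for the category.
        top = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
        knowledge[cat] = dict(top)
    return knowledge


def classify(text, knowledge):
    # Score each category by the summed weights of its keywords found in the text.
    counts = Counter(tokenize(text))
    scores = {
        cat: sum(weight * counts[term] for term, weight in keywords.items())
        for cat, keywords in knowledge.items()
    }
    return max(scores, key=scores.get)
```

Here `build_conceptual_knowledge` plays the role of the indexer and `classify` that of the document classifier. A document that matches no keyword of any category scores zero everywhere, so a practical system would likely need an explicit "unclassified" outcome.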