Recognize Person Names from Chinese Texts based on Clustering SVM

L. Li, Z. Ding, and D. Huang (PRC)


Recognition of Chinese person names, entity name, clustering SVM, machine learning


This paper presents a method for recognizing person names in Chinese texts based on a clustering Support Vector Machine (SVM). The features of each SVM input vector are the character itself, its character-based part-of-speech (POS) tag, whether the character is a surname, the frequency of the character in a table of person names, and context information. A training set is built from these features. In practical training sets, however, the two classes of samples are imbalanced, so the training set is clustered using the kernel k-means clustering algorithm. The experimental results show that the clustering-SVM model for recognizing Chinese person names is more efficient than the original model without clustering. The model can also be used to recognize other named entities, such as location names and organization names, and can be generalized to other machine learning problems with unbalanced class distributions.
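The clustering step can be illustrated with a minimal sketch of kernel k-means applied to the samples of one class. The abstract does not state the kernel, the number of clusters, the initialization, or how the clusters are used to rebalance the training set, so everything below is an assumption made for illustration: an RBF kernel, a farthest-first initialization, and keeping one representative per cluster. The function names (`gram_matrix`, `kernel_kmeans`, `cluster_representatives`) are hypothetical and are not taken from the paper.

```python
import math


def gram_matrix(points, gamma=0.1):
    """RBF kernel matrix K[i][j] = exp(-gamma * ||x_i - x_j||^2) (assumed kernel)."""
    def rbf(x, y):
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return [[rbf(p, q) for q in points] for p in points]


def farthest_first_seeds(K, k):
    """Deterministic init (an assumption): each new seed is the point least
    similar to the seeds chosen so far."""
    seeds = [0]
    while len(seeds) < k:
        candidates = [i for i in range(len(K)) if i not in seeds]
        seeds.append(min(candidates, key=lambda i: max(K[i][s] for s in seeds)))
    return seeds


def kernel_kmeans(K, k, max_iter=50):
    """Cluster using only kernel values; returns one cluster label per point."""
    n = len(K)
    seeds = farthest_first_seeds(K, k)
    # Start by assigning every point to its most similar seed.
    labels = [max(range(k), key=lambda c: K[i][seeds[c]]) for i in range(n)]
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Squared norm of each cluster mean in feature space.
        within = [
            sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], float("inf")
            for c, m in enumerate(members):
                if not m:
                    continue  # skip empty clusters
                # Squared feature-space distance from point i to the mean of cluster c.
                d = K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels


def cluster_representatives(K, labels, k):
    """Pick, per cluster, the member most similar on average to its cluster.
    For an RBF kernel (K[i][i] == 1) this is the member closest to the cluster mean."""
    reps = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            reps.append(max(members, key=lambda i: sum(K[i][j] for j in members)))
    return reps


if __name__ == "__main__":
    # Toy 2-D vectors standing in for majority-class feature vectors.
    majority = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
                (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    K = gram_matrix(majority)
    labels = kernel_kmeans(K, k=2)
    print(labels, [majority[i] for i in cluster_representatives(K, labels, 2)])
```

Under these assumptions, the reduced set of representatives would stand in for the larger class when the SVM is trained. Whether the paper selects representatives, resamples within clusters, or trains one classifier per cluster is not stated in the abstract.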
