Automatic Chinese Name Recognition based on Web Corpus Analysis

L. Ru, Z. Tong, Y. Liu, and S. Ma (PRC)


Chinese Name Recognition, Statistical Analysis, Unknown Word Recognition.


In this paper, we propose a unified solution for Chinese name recognition based on the analysis of large-scale Chinese Web corpora. In our approach, a Chinese name is identified according to its component, context, and structure features. The likelihood that a three-character string is a Chinese name is calculated from a statistical analysis of a Web corpus that contains over 100 million Web pages and 24 million Chinese names. Experimental results on a widely adopted annotated Chinese corpus show that our method is effective, achieving 93% precision and 89% recall.
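
As a rough illustration of the component feature only, the sketch below scores a three-character string by how often each character has been seen in the same position of known names. It is an assumption-laden toy, not the method evaluated in the paper: the function names, the add-one smoothing, the `vocab_size` value, and the three sample names are all hypothetical, and the context and structure features are left out entirely.

```python
from collections import Counter

def train_position_counts(names):
    """Count, for each of the three positions, how often each character occurs."""
    counts = [Counter(), Counter(), Counter()]
    total = 0
    for name in names:
        if len(name) != 3:
            continue  # this sketch only handles three-character names
        total += 1
        for pos, ch in enumerate(name):
            counts[pos][ch] += 1
    return counts, total

def name_score(candidate, counts, total, vocab_size=10000, alpha=1.0):
    """Product of smoothed per-position character frequencies.

    vocab_size and alpha are illustrative smoothing parameters (assumptions),
    chosen so that unseen characters get a small non-zero weight.
    """
    if len(candidate) != 3:
        return 0.0
    score = 1.0
    for pos, ch in enumerate(candidate):
        score *= (counts[pos][ch] + alpha) / (total + alpha * vocab_size)
    return score

# Hypothetical stand-in for a name list extracted from a Web corpus.
TOY_NAMES = ["王小明", "李小红", "王大明"]
COUNTS, TOTAL = train_position_counts(TOY_NAMES)

if __name__ == "__main__":
    for s in ["王小明", "的小明"]:
        print(s, name_score(s, COUNTS, TOTAL))
```

With these toy counts, a string whose first character is a frequent surname scores higher than one that starts with a character never seen in that position. A practical system would estimate the counts from a far larger name list and combine this score with contextual evidence.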

