Word N-Grams for Polish

B. Ziółko; D. Skurzok; M. Ziółko

doi:10.2316/P.2010.674-079

B. Ziółko, D. Skurzok, and M. Ziółko (Poland)

Keywords

Polish, n-grams, speech recognition, language modelling

Abstract

A large collection of word n-gram statistics for Polish is described. Some details of the text analysis algorithm, which supports processing data on computer clusters, are presented as well. Corpora with a total size of 267 030 267 words were used. The problems encountered due to the special Polish characters are described, as well as the impact of the rich morphology of Polish on this type of statistics. The most common n-grams are presented and commented on. This is the first publication of such statistics for Polish.
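
As a rough illustration of what a word n-gram statistic is, the following minimal sketch counts n-grams in a short Polish sample. It is not the authors' cluster-based algorithm, which this abstract does not detail. The function name `word_ngrams`, the letter-only tokenization, the lowercasing, and the NFC normalization are all assumptions made for the example, and sentence boundaries are ignored for simplicity.

```python
import re
import unicodedata
from collections import Counter

# Letters only (no digits or underscores). In Python 3, \W is Unicode-aware
# for str patterns, so Polish letters such as ą, ę, ł, ó, ż are matched.
WORD_RE = re.compile(r"[^\W\d_]+")


def word_ngrams(text, n):
    """Count word n-grams in text (hypothetical helper, not from the paper)."""
    # NFC normalization merges base letters with combining diacritics, so
    # visually identical Polish words are counted as the same token.
    normalized = unicodedata.normalize("NFC", text).lower()
    words = WORD_RE.findall(normalized)
    # Slide a window of n words across the token list.
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


sample = "Zażółć gęślą jaźń. Zażółć gęślą jaźń raz jeszcze."
bigrams = word_ngrams(sample, 2)
for ngram, count in bigrams.most_common(3):
    print(" ".join(ngram), count)
```

On a corpus of hundreds of millions of words, counts like these would normally be computed per text chunk and then merged, which is one plausible reason for processing on a cluster. Because Polish is highly inflected, a single lemma appears in many surface forms, which spreads counts over more distinct n-grams than in a language such as English.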

Important Links:

DOI: 10.2316/P.2010.674-079
From Proceeding (674) Artificial Intelligence and Applications - 2010