Introduction of Logic in Language Modelling: The Minimum Perplexity Criterion

D. Bouchaffra


Logic, statistical language model, maximum likelihood estimation, word n-grams, minimum perplexity


Sparse configurations are inherent to any statistical model trained on data: sparse events are those that have not been encountered during the training phase. This problem remains a significant challenge to the scientific community, since the maximum likelihood (ML) estimator is sensitive to extreme values and is therefore unreliable on such data. To address this challenge, the author proposes a novel logic-based approach that uses the minimum perplexity criterion. In this approach, configurations are treated as probabilistic events, namely predicates related through logical connectives. The method is general and can be applied to any type of data; in this work it is used to estimate word trigram probabilities from a corpus. Experimental results on several test sets show that this logical approach using the minimum perplexity criterion is promising: it outperforms both the absolute discounting and the Good-Turing discounting techniques. It thus represents a significant contribution to language modelling.
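The perplexity criterion mentioned in the abstract can be illustrated with a short sketch. The following is a hedged illustration only, not the author's method: the toy corpus and the simple add-alpha smoothing are assumptions chosen to keep the example self-contained; the paper itself compares against absolute discounting and Good-Turing discounting, which are not reproduced here.

```python
import math
from collections import Counter

def trigram_counts(tokens):
    """Count trigrams and their bigram contexts in a token sequence."""
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    bi = Counter(zip(tokens, tokens[1:]))
    return tri, bi

def perplexity(test_tokens, tri, bi, vocab_size, alpha=1.0):
    """Perplexity of an add-alpha-smoothed trigram model on test tokens.

    Lower perplexity means the model predicts the test data better;
    a minimum-perplexity criterion selects the probability estimates
    that minimize this quantity on held-out data.
    """
    log_prob = 0.0
    n = 0
    for w1, w2, w3 in zip(test_tokens, test_tokens[1:], test_tokens[2:]):
        # Smoothing assigns non-zero mass to unseen (sparse) trigrams.
        p = (tri[(w1, w2, w3)] + alpha) / (bi[(w1, w2)] + alpha * vocab_size)
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)

# Toy corpus (an assumption for demonstration purposes only).
train = "the cat sat on the mat the cat sat on the rug".split()
test = "the cat sat on the mat".split()
tri, bi = trigram_counts(train)
vocab = len(set(train))
print(perplexity(test, tri, bi, vocab))
```

Without smoothing, any trigram absent from training data would receive zero probability and drive perplexity to infinity; this is precisely the sparse-data problem the abstract describes.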