Parsing on Ungrammaticality

K.K. Yong and C. Huyck (UK)


Ungrammatical, robust parsing, natural language processing


Robust natural language processing is needed to parse and understand human language. Naturally occurring language contains a high rate of ungrammatical text, so the study of ungrammaticalities cannot be ignored. This paper investigates utterances that deviate from typical linguistic standards. These deviations include omitted words, interjections, repeated words, agreement violations, multiple phrases without proper segmentation, out-of-order constituents and unrecognised words. We argue that success in handling ungrammatical text depends on identifying and regulating the various ungrammatical phenomena. We examined a portion of the utterances in the Christine corpus for ungrammaticalities. By our calculations, 85% of the spoken sentences are ungrammatical. We first manually categorised the utterances by type of ungrammatical error. We then present an evaluation of the Plink parser on the selected utterances, on which it achieves 61% precision and 64% recall.

