Improving the Readability of Class Lecture Automatic Speech Recognition Results using Multiple Hypotheses

Y. Fujii, K. Yamamoto, and S. Nakagawa (Japan)


Key Words: improving readability, confusion network, automatic speech recognition, classroom lecture speech


This paper presents a method for improving the readability of class lecture Automatic Speech Recognition (ASR) results, which hitherto have been difficult for humans to understand, even in the absence of recognition errors. This is because the speech in a class lecture is relatively casual and contains many ill-formed utterances with filled pauses, restarts, and so on. Recently there has been extensive research on paraphrasing and correcting recognition results. However, research on improving readability has focused mainly on manually transcribed texts, not on ASR results. Because of the many domain-specific words and the casual speaking style, even state-of-the-art recognizers achieve only a 30-50% word error rate (WER) on classroom lecture speech. In this paper, we propose a novel method that utilizes multiple hypotheses of the ASR results to improve the readability of the recognition results. Experimental results show that the proposed method resembles manually paraphrased text most closely, and subjective tests show that it improves the readability of the ASR results even under erroneous conditions where the WER is as high as 37.7%.
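To make the idea of exploiting multiple hypotheses concrete, the following is a minimal illustrative sketch, not the authors' actual algorithm: a confusion network represents the ASR output as a sequence of slots, each holding competing word hypotheses with posterior probabilities, and readability can be improved by selecting the most probable word per slot while dropping filled pauses and the empty hypothesis. The filler inventory and the `"-"` epsilon symbol here are assumptions for illustration.

```python
# Hypothetical sketch of readability-oriented selection over a
# confusion network (a "sausage" of competing word hypotheses).

FILLED_PAUSES = {"uh", "um", "er"}  # assumed filler-word inventory


def readable_text(confusion_network):
    """Pick the most probable word in each slot, skipping fillers
    and the epsilon ("-") hypothesis, to produce cleaner text."""
    words = []
    for slot in confusion_network:
        word, _prob = max(slot, key=lambda h: h[1])
        if word != "-" and word not in FILLED_PAUSES:
            words.append(word)
    return " ".join(words)


# Each slot lists (word, posterior) pairs for one time region.
cn = [
    [("uh", 0.6), ("a", 0.4)],            # filler wins but is dropped
    [("the", 0.7), ("-", 0.3)],
    [("lecture", 0.5), ("lectures", 0.5)],  # tie: max keeps the first
    [("starts", 0.8), ("started", 0.2)],
]
print(readable_text(cn))  # -> "the lecture starts"
```

In a real system the slot posteriors would come from lattice-based confusion-network construction, and the deletion decisions would be learned rather than rule-based, as the paper's use of multiple hypotheses suggests.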
