Line Extraction for Multi-level Language Document Image

I. Methasate (Thailand)


Line extraction, minimum spanning tree, multi-level language, document image


This paper describes a modified minimum spanning tree (MST) technique for extracting text lines from documents with a multi-level sentence structure. The technique performs a rough classification at the character level. The key idea is to find the characters in the main level while reducing the effect of small characters in the other levels. The technique is divided into six steps. First, the boundaries of the objects are detected and small objects are filtered out. Second, a tree is created and the angle of the document is estimated. Third, the cost of every branch is calculated and the tree is reduced with the MST technique. Fourth, unexpected branches are removed. Fifth, the level boundaries of each sentence are found and the level of each object is classified. Finally, the small-font text lines are recovered. Our experiments cover 150 document images, drawn from various types of documents and including some special cases for testing robustness. Furthermore, the proposed technique can also be applied to handwritten documents.
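The following is a minimal sketch of how steps one, three, and four might fit together. It is not the authors' implementation. It assumes that objects are already available as bounding boxes `(x0, y0, x1, y1)`, that the document angle is known (zero by default), and that a branch cost can be approximated by a distance that penalizes displacement across the text direction. The names `edge_cost` and `extract_lines` and the parameters `vertical_weight`, `min_area`, and `max_cost` are hypothetical, chosen only for illustration. Angle estimation, level classification, and small-font recovery are omitted.

```python
import math

def centroid(box):
    """Center point of a bounding box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2.0, (y0 + y1) / 2.0)

def edge_cost(a, b, angle=0.0, vertical_weight=3.0):
    """Assumed branch cost: distance with the across-line component
    weighted more heavily, so same-line neighbours are cheaper."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    c, s = math.cos(angle), math.sin(angle)
    along = dx * c + dy * s      # displacement along the text direction
    across = -dx * s + dy * c    # displacement across the text direction
    return math.hypot(along, vertical_weight * across)

def minimum_spanning_tree(points, cost):
    """Prim's algorithm on the complete graph; returns (u, v, cost) edges."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        for v in range(n):
            if not in_tree[v]:
                w = cost(points[u], points[v])
                if w < best[v]:
                    best[v], parent[v] = w, u
    return edges

def extract_lines(boxes, min_area=20, max_cost=60.0):
    """Filter small objects, build the MST, cut expensive branches,
    and return the remaining connected groups as candidate text lines."""
    kept = [b for b in boxes if (b[2] - b[0]) * (b[3] - b[1]) >= min_area]
    points = [centroid(b) for b in kept]
    edges = minimum_spanning_tree(points, edge_cost)

    # Union-find over the branches that survive the cost threshold.
    root = list(range(len(kept)))
    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]
            i = root[i]
        return i
    for u, v, w in edges:
        if w <= max_cost:
            root[find(u)] = find(v)

    groups = {}
    for i, box in enumerate(kept):
        groups.setdefault(find(i), []).append(box)
    lines = [sorted(g) for g in groups.values()]   # left-to-right by x0
    lines.sort(key=lambda g: min(b[1] for b in g)) # top-to-bottom
    return lines

boxes = [
    (0, 0, 10, 20), (15, 0, 25, 20), (30, 0, 40, 20),  # first line
    (0, 50, 10, 70), (15, 50, 25, 70),                 # second line
    (16, -6, 19, -3),                                  # small mark, filtered out
]
lines = extract_lines(boxes)
```

In this toy input the only branch joining the two rows is much more expensive than the branches within a row, so cutting it leaves one group per text line. Real documents would need thresholds derived from the data, for example from the typical character size.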
