Robust text line extraction from document images is a vital prerequisite for any successful text recognition or analysis process. Most algorithms proposed for this task assume some kind of binarization pre-processing step in order to ensure good performance. In this paper, we present a novel, robust, and efficient algorithm that extracts text lines directly from gray-level document images.
The algorithm tracks minimal-energy sub-seams and accumulates them into full local minimal/maximal separating and medial seams that define the text lines. To improve the ability to extract such seams, we enhance the image using a double-sided adaptive local density projection profile followed by a multi-scale anisotropic second-derivative-of-Gaussian filter bank. Following the observation that the centers of lines are more reliable to follow, we first extract the seams that follow the centers of lines and use them to constrain the evolution of the separating seams. The algorithm requires no user-set parameters: its free parameters are estimated directly by analyzing the image properties and the pixel distribution. We have tested our approach on various multilingual datasets spanning a range of image quality and obtained very encouraging results, which outperform state-of-the-art algorithms.
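To make the notion of a minimal-energy seam concrete, the following is a minimal sketch of the generic dynamic-programming seam search on a small energy map. It is not the sub-seam accumulation scheme of the paper: the function name `minimal_energy_seam`, the toy `energy` grid, and the choice of a left-to-right seam that may move at most one row per column are all assumptions made for illustration. High values stand for inked regions, so the seam of lowest total energy runs through the gap between two text lines.

```python
def minimal_energy_seam(energy):
    """Return, for each column, the row of a left-to-right minimal-energy seam.

    `energy` is a list of rows (hypothetical toy input). The seam may move
    up or down by at most one row between neighbouring columns.
    """
    rows, cols = len(energy), len(energy[0])
    cost = [[0.0] * cols for _ in range(rows)]  # cumulative seam cost
    back = [[0] * cols for _ in range(rows)]    # best predecessor row

    for r in range(rows):
        cost[r][0] = energy[r][0]

    for c in range(1, cols):
        for r in range(rows):
            # Candidate predecessors: rows r-1, r, r+1 in the previous column.
            candidates = range(max(r - 1, 0), min(r + 2, rows))
            best = min(candidates, key=lambda k: cost[k][c - 1])
            cost[r][c] = energy[r][c] + cost[best][c - 1]
            back[r][c] = best

    # Start from the cheapest end point and walk the back-pointers.
    r = min(range(rows), key=lambda k: cost[k][cols - 1])
    seam = [0] * cols
    for c in range(cols - 1, -1, -1):
        seam[c] = r
        r = back[r][c]
    return seam


# Toy energy map (assumed): the low-energy gap drifts from row 1 to row 2.
energy = [
    [9, 9, 9, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [9, 9, 1, 0, 0, 0],
    [9, 9, 9, 9, 9, 9],
]
seam = minimal_energy_seam(energy)
```

In this sketch a separating seam follows low-energy pixels; a medial seam through the center of a text line could be found with the same routine by negating the energy map, under the same assumptions.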
Saabni, R. (2018, September). Robust and Efficient Text: Line Extraction by Local Minimal Sub-Seams. In Proceedings of the 2nd International Symposium on Computer Science and Intelligent Control (pp. 1-6).