Text Extraction from Images in the Wild Using the Viola-Jones Algorithm

Text localization and extraction is an important problem in modern computer vision. Applications such as reading and translating text in the wild or in videos are among the many that can benefit from progress in this field. In this work, we adapt the well-known Viola-Jones algorithm to localize and extract text from images in the wild. Viola-Jones is a fast, efficient image-processing algorithm originally developed for face detection. Because text detection and face detection in the wild share some characteristics, we modified the Viola-Jones algorithm to detect regions of interest where text may be located. The proposed approach modifies the Haar-like features and uses a semi-automatic process for generating and manipulating the data set used to train the algorithm. Sliding windows of different sizes scan the image for individual letters and letter clusters. A post-processing step then combines the detected letters into words and removes false positives. The novelty of the approach lies in using the strengths of a modified Viola-Jones algorithm to identify many different objects, representing different letters and clusters of similar letters, and later combining them into words of varying lengths. Impressive results were obtained on the ICDAR contest data sets.
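For readers unfamiliar with the building blocks named above, the following is a minimal sketch of the standard, unmodified Viola-Jones ingredients: an integral image, a two-rectangle Haar-like feature, and a multi-size sliding-window scan. It does not reproduce the modified features or the training process of this work. All function names, the square window shape, and the step size are assumptions made for illustration only.

```python
# Illustrative sketch only: standard Viola-Jones primitives, not the
# modified features described in this work. All names are hypothetical.

def integral_image(img):
    """Return a summed-area table with one extra leading row and column of zeros."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect_horizontal(ii, x, y, w, h):
    """Two-rectangle feature: left half minus right half (w assumed even)."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

def sliding_windows(img_w, img_h, sizes, step):
    """Yield (x, y, size) for square windows of each size (assumed shape)."""
    for size in sizes:
        for y in range(0, img_h - size + 1, step):
            for x in range(0, img_w - size + 1, step):
                yield x, y, size

# Usage: evaluate one feature in every window of a tiny 4x4 test image.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
ii = integral_image(img)
responses = [
    (x, y, s, haar_two_rect_horizontal(ii, x, y, s, s))
    for x, y, s in sliding_windows(4, 4, sizes=[2, 4], step=2)
]
```

The integral image lets each rectangle sum be computed from four table lookups regardless of the rectangle's size, which is what makes scanning many windows at several scales affordable. In a full detector, each window would be passed to a trained cascade of classifiers rather than to a single feature.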

