Document images captured by camera often suffer from warping and distortion caused by bound volumes and complex ambient lighting. These effects reduce both the readability of the document and the performance of OCR. In this paper, we propose a method that combines non-linear and linear compensation to correct distortions in document images. First, because Otsu binarization alone yields broken text, an image preprocessing step removes the effect of background light. Second, a dewarping method based on cubic polynomial fitting finds the optimal approximating text line for rectification in the vertical direction. Finally, linear compensation is used for rectification in the horizontal direction. Experimental results demonstrate the robustness of the proposed method and an improvement in OCR accuracy.
Research Notes in Information Science (RNIS), pp.459-464
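The vertical rectification step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes NumPy, assumes the baseline points of one text line have already been extracted, and uses the hypothetical helper names `fit_text_line` and `vertical_offsets`. Leveling the curve to its mean height is also an assumption made here for illustration.

```python
import numpy as np

def fit_text_line(xs, ys):
    """Least-squares cubic fit y = a*x^3 + b*x^2 + c*x + d to baseline points.

    Hypothetical helper: xs, ys are pixel coordinates sampled along one text line.
    """
    return np.polyfit(xs, ys, deg=3)

def vertical_offsets(coeffs, xs):
    """Per-column vertical shift that moves the fitted curve onto a horizontal line.

    Assumption: the target line sits at the mean height of the fitted curve.
    """
    curve = np.polyval(coeffs, xs)
    return curve.mean() - curve

# Synthetic warped text line (made-up coefficients, for illustration only).
xs = np.linspace(0.0, 100.0, 50)
ys = np.polyval([1e-5, -2e-3, 0.1, 40.0], xs)

coeffs = fit_text_line(xs, ys)
offsets = vertical_offsets(coeffs, xs)
flattened = ys + offsets  # baseline heights after vertical rectification
```

In a full pipeline the offsets would be applied to image columns rather than to the sampled points, and the horizontal linear compensation would follow as a separate step.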