Using OpenMP Directives to Accelerate OCR with Tesseract OCR
Keywords: OCR, Tesseract, multithreading, optical character recognition, parallelism
This paper is devoted to methods for speeding up optical character recognition (OCR), which converts a scanned image into editable text. One application of these methods is automated search for a text fragment in the catalogues of electronic libraries, where the query may be typed text, a voice query, or a scanned fragment of a text document. The paper shows that the quality of the original image and the image preprocessing algorithms applied have the greatest influence on the quality of text recognition. Text recognition is implemented today in many libraries; one example is Tesseract OCR, which is considered in this work. It is shown that using Tesseract OCR together with the standard parallel programming library OpenMP, which is built into all modern C and C++ compilers, reduces processing time by up to 33% compared with the sequential implementation.
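As an illustration of the kind of OpenMP directive the abstract refers to, the following is a minimal C++ sketch, not the authors' implementation. It assumes the workload is a set of independent page images and that each recognition call is safe to run concurrently (for example, because each call uses its own engine instance). The function `recognize_page` is a hypothetical stand-in for the real OCR call, and `recognize_all` and `speedup_from_reduction` are likewise names introduced here for illustration only.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-page OCR call. A real version would run
// the OCR engine on the image; this one only tags the input path.
std::string recognize_page(const std::string& image_path) {
    return "text of " + image_path;
}

// Recognize independent pages in parallel. Each iteration writes only to its
// own slot of `results`, so no locking is needed. Without OpenMP support
// enabled, the pragma is ignored and the loop runs sequentially.
std::vector<std::string> recognize_all(const std::vector<std::string>& pages) {
    std::vector<std::string> results(pages.size());
    const long n = static_cast<long>(pages.size());
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < n; ++i) {
        results[i] = recognize_page(pages[i]);
    }
    return results;
}

// Speedup factor implied by a fractional reduction in run time:
// a reduction r means the new time is (1 - r) of the old one.
double speedup_from_reduction(double reduction) {
    return 1.0 / (1.0 - reduction);
}
```

With GCC or Clang the directive takes effect when the code is compiled with `-fopenmp`. By the arithmetic in `speedup_from_reduction`, a 33% reduction in processing time corresponds to a speedup of roughly 1.49 times over the sequential run.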