Using OpenMP Directives to Accelerate OCR with Tesseract OCR

Olesia Barkovska; Ihor Ryzhov

doi:10.30837/csitic52021232891

Authors

Olesia Barkovska, Kharkiv National University of Radio Electronics, Ukraine
Ihor Ryzhov, Kharkiv National University of Radio Electronics, Ukraine

DOI:

https://doi.org/10.30837/csitic52021232891

Keywords:

OCR, Tesseract, multithreading, optical character recognition, parallelism

Abstract

This paper is devoted to methods for speeding up optical character recognition (OCR), which is used to convert a scanned image into an editable text format. One application of these methods is the automated search for text fragments in the catalogues of electronic libraries, where the query can be typed text, a voice query, or a scanned fragment of a text document. The paper shows that the quality of the original image, together with the image preprocessing algorithms applied, has the greatest influence on the quality of text recognition. Text recognition is now implemented in many libraries; one example is Tesseract OCR, which is considered in this work. It is shown that using Tesseract together with OpenMP, the standard parallel programming API supported by most modern C and C++ compilers, reduces processing time by up to 33% compared with the sequential implementation.
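The abstract does not say where the OpenMP directives are placed, so the following is only a minimal sketch under one assumption: that the workload is a batch of independent page images and the loop over pages is parallelized. The function `recognize_page` is a hypothetical stand-in, not a Tesseract call; in a real program it would wrap the recognition of one image, typically with a separate engine instance per thread. The helper `time_reduction_percent` shows how a figure such as "up to 33%" would be computed from measured run times.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for recognizing one page image. A real program would
// invoke the OCR engine here; this version only builds a string from the
// input path so that the sketch compiles and runs without Tesseract.
std::string recognize_page(const std::string& image_path) {
    return "text of " + image_path;
}

// Recognize a batch of pages, one loop iteration per page (assumed
// decomposition). Each iteration writes only to its own slot of `results`,
// so no locking is needed. Built without -fopenmp, the pragma is ignored
// and the loop simply runs sequentially.
std::vector<std::string> recognize_batch(const std::vector<std::string>& pages) {
    std::vector<std::string> results(pages.size());
    const long n = static_cast<long>(pages.size());
    #pragma omp parallel for schedule(dynamic)
    for (long i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        results[k] = recognize_page(pages[k]);
    }
    return results;
}

// Relative reduction in processing time, in percent, of a parallel run
// compared with a sequential run of the same workload.
double time_reduction_percent(double t_sequential, double t_parallel) {
    assert(t_sequential > 0.0);
    return 100.0 * (t_sequential - t_parallel) / t_sequential;
}
```

With GCC or Clang the sketch is compiled with `-fopenmp` to enable the directive. As an illustration with made-up timings, a batch that takes 3.0 s sequentially and 2.0 s in parallel gives `time_reduction_percent(3.0, 2.0)` of about 33.3, which is the scale of improvement the abstract reports as its upper bound.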
