Introduction
This package contains an OCR engine (libtesseract) and a command line program (tesseract).
The lead developer is Ray Smith. The maintainer is Zdenko Podobny. For a list of contributors see AUTHORS and GitHub's log of contributors.
Tesseract has Unicode (UTF-8) support and can recognize more than 100 languages "out of the box".
Tesseract supports various output formats: plain text, hOCR (HTML), PDF, TSV, and invisible-text-only PDF.
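As an illustration of how an output format is selected, here is a minimal Python sketch that builds the command line for the tesseract program. It assumes the usual invocation form `tesseract imagename outputbase [options] [configfile]` and the `txt`, `hocr`, `pdf`, and `tsv` config names. The helper names `build_command` and `run_ocr`, the format keys, and the file names are hypothetical and chosen only for this example.

```python
import shutil
import subprocess

# Map a format key (hypothetical names for this sketch) to the extra
# command-line arguments placed after the output base.
FORMAT_ARGS = {
    "text": ["txt"],
    "hocr": ["hocr"],
    "pdf": ["pdf"],
    "tsv": ["tsv"],
    # Invisible-text-only PDF: the pdf config plus the textonly_pdf variable.
    "textonly_pdf": ["-c", "textonly_pdf=1", "pdf"],
}


def build_command(image_path, output_base, fmt="text", lang="eng"):
    """Return the argument list for one tesseract run (nothing is executed)."""
    if fmt not in FORMAT_ARGS:
        raise ValueError(f"unknown output format: {fmt!r}")
    return ["tesseract", image_path, output_base, "-l", lang, *FORMAT_ARGS[fmt]]


def run_ocr(image_path, output_base, fmt="text", lang="eng"):
    """Run tesseract if it is installed; raise a clear error otherwise."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("the tesseract executable was not found on PATH")
    subprocess.run(build_command(image_path, output_base, fmt, lang), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works
    # even on a machine without Tesseract installed.
    print(" ".join(build_command("page.png", "page", fmt="hocr")))
```

Tesseract appends the file extension to the output base itself, so `page` becomes `page.hocr`, `page.pdf`, and so on. Several languages can be combined in the `lang` argument with `+`, for example `eng+deu`, provided the corresponding language data is installed.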
In many cases, getting better OCR results requires improving the quality of the image you give Tesseract.
This project does not include a GUI application. If you need one, please see the 3rdParty wiki page.
Tesseract can be trained to recognize other languages. See Tesseract Training for more information.