PERO OCR

This application demonstrates the capabilities of the pero-ocr Python package, developed in the PERO project at Brno University of Technology. You can create an account and use the application to transcribe documents. To get an idea of how well the OCR works, you can view Public documents. If you have any questions or experience any problems, contact Michal Hradiš.

You can watch videos demonstrating document management; page layout editing; and text recognition, correction, and review.

The application allows users to automatically transcribe several types of printed and handwritten documents. The provided OCR engines can transcribe even very low-quality printed documents in most European languages, including Latin; old documents in Fraktur and similar scripts in German and Czech; and handwritten documents, mainly in Czech. The application provides an efficient interface for text correction and offers transcriptions for download in several formats (ALTO, PAGE XML, plain text).
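
As an illustration of working with a downloaded transcription, the following minimal sketch extracts plain text lines from an ALTO file using only the Python standard library. The sample document, the function names, and the namespace version are assumptions made for this example; real exports may contain additional elements (for instance hyphenation marks), which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified ALTO document used only for illustration.
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout>
    <Page ID="page_1">
      <PrintSpace>
        <TextBlock ID="block_1">
          <TextLine ID="line_1">
            <String CONTENT="Hello"/>
            <SP/>
            <String CONTENT="world"/>
          </TextLine>
          <TextLine ID="line_2">
            <String CONTENT="Second"/>
            <SP/>
            <String CONTENT="line"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def local_name(tag):
    """Strip the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def alto_to_lines(alto_xml):
    """Return one string per TextLine, joining its String elements with spaces."""
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Each word is stored in the CONTENT attribute of a String element.
        words = [
            child.get("CONTENT", "")
            for child in element
            if local_name(child.tag) == "String"
        ]
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    for line in alto_to_lines(SAMPLE_ALTO):
        print(line)
```

Matching on the local element name rather than a fixed namespace keeps the sketch independent of the ALTO schema version used in a particular export.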

The application is free for personal use and for any use that involves manual text correction. If possible, please give credit to the authors of PERO, for example by citing these publications:

O Kodym, M Hradiš: Page Layout Analysis System for Unconstrained Historic Documents. ICDAR, 2021.
M Kišš, K Beneš, M Hradiš: AT-ST: Self-Training Adaptation Strategy for OCR in Domains with Limited Transcriptions. ICDAR, 2021.
J Kohút, M Hradiš: TS-Net: OCR Trained to Switch Between Text Transcription Styles. ICDAR, 2021.

If you need to process large volumes of documents, you can use our REST API, which can be easily integrated into digitization pipelines. Have a look at the API documentation, or request a testing API key by contacting Michal Hradiš.
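
To give a rough idea of what an integration step in a digitization pipeline might look like, the sketch below prepares (but does not send) an HTTP request that submits images for processing. Everything specific in it is an assumption made for illustration: the base URL, the endpoint path, the `api-key` header name, the payload fields, and the function name are placeholders that should be checked against the API documentation before use.

```python
import json
import urllib.request

# Placeholder base URL; replace it with the address given in the API documentation.
API_BASE_URL = "https://example.org/api"


def build_processing_request(api_key, engine_id, image_urls):
    """Prepare a POST request asking the service to transcribe the given images.

    The endpoint path, header name, and payload layout are hypothetical.
    `image_urls` maps a page name to the URL of its image.
    """
    payload = {"engine": engine_id, "images": image_urls}
    return urllib.request.Request(
        url=f"{API_BASE_URL}/post_processing_request",
        data=json.dumps(payload).encode("utf-8"),
        headers={"api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_processing_request(
        api_key="YOUR-TESTING-KEY",
        engine_id=1,
        image_urls={"page_001": "https://example.org/scans/page_001.jpg"},
    )
    # Only inspect the prepared request here; sending it would be done with
    # urllib.request.urlopen(request) once the real endpoint details are known.
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Building the request separately from sending it makes the step easy to test offline and to adapt once the actual endpoint details are confirmed.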

Uploaded images and their corrections will not be shared with anyone outside Brno University of Technology, and we have reasonable technical measures and internal rules in place to prevent any data leaks. Be aware that the images and corrections you provide may be used for further training and improvement of our systems.

PERO OCR demonstration application