Tesseract-OCR - open source OCR engine
Publisher: Tesseract-OCR community
Use this command to install Tesseract-OCR - open source OCR engine:
winget install --id=UB-Mannheim.TesseractOCR -e
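After installing, you can check that the engine can be found and see which language models it has. The sketch below assumes the installer's default folder `C:\Program Files\Tesseract-OCR` (an assumption; change `DEFAULT_EXE` if you chose another location). It prefers a `tesseract` found on PATH and parses the output of `tesseract --list-langs`, which prints a header line followed by one language code per line. The helper names are hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path

# Assumption: default folder of the Windows installer; change it if you installed elsewhere.
DEFAULT_EXE = Path(r"C:\Program Files\Tesseract-OCR\tesseract.exe")


def find_tesseract() -> str | None:
    """Return a usable tesseract executable, preferring one found on PATH."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    return str(DEFAULT_EXE) if DEFAULT_EXE.exists() else None


def parse_list_langs(output: str) -> list[str]:
    """Turn `tesseract --list-langs` output into a list of language codes."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Drop the header line, e.g. 'List of available languages in "..." (3):'
    return [line for line in lines if not line.startswith("List of available languages")]


if __name__ == "__main__":
    exe = find_tesseract()
    if exe is None:
        print("tesseract not found; add the install folder to PATH or adjust DEFAULT_EXE")
    else:
        result = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
        # Read both streams in case a build writes the list to stderr.
        print(parse_list_langs(result.stdout or result.stderr))
```

If the list does not contain the codes you need (for example a Fraktur model), re-run the installer and select the additional language or script data.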
Tesseract Open Source OCR Engine.
Tesseract-OCR is an open-source OCR (optical character recognition) engine designed to extract text from images or scanned documents.
Key Features:
- Supports multiple languages, including historic German scripts like Fraktur.
- Offers specialized models for improved accuracy with historical print materials.
- Available as a Windows installer for easy setup and use in environments where Linux is not feasible.
- Provides ALTO (Analyzed Layout and Text Object) XML output, which records the recognized text together with its position on the page.
- Regular updates to improve performance and compatibility with modern and historic documents.
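Several of these features meet in a single command line: Tesseract takes `image outputbase [options] [configfile...]`, `-l` accepts several language codes joined with `+`, and the `alto` config selects ALTO XML output, which is written to `<outputbase>.xml`. The sketch below wraps that call in Python; the function names are hypothetical, and the codes `deu` and `frk` in the usage note are only examples that work if the matching traineddata files are installed.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_alto_command(exe: str, image: Path, out_base: Path, langs: list[str]) -> list[str]:
    """Build a tesseract command line that writes ALTO XML next to out_base.

    `-l` takes language codes joined with '+'; the trailing `alto`
    config file switches the output format to ALTO XML.
    """
    if not langs:
        raise ValueError("at least one language code is required")
    return [exe, str(image), str(out_base), "-l", "+".join(langs), "alto"]


def alto_output_path(out_base: Path) -> Path:
    """Tesseract appends the extension itself, so ALTO output lands in <out_base>.xml."""
    return Path(f"{out_base}.xml")


def run_alto(exe: str, image: Path, out_base: Path, langs: list[str]) -> Path:
    """Run tesseract (raises CalledProcessError on failure) and return the ALTO path."""
    subprocess.run(build_alto_command(exe, image, out_base, langs), check=True)
    return alto_output_path(out_base)
```

For a German page printed in Fraktur, a call might look like `run_alto("tesseract", Path("scan_001.png"), Path("scan_001"), ["deu", "frk"])`. Building the argument list separately from running it keeps the command easy to log or test without invoking the engine.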
Audience & Benefit:
Ideal for researchers, libraries, archives, and anyone working with historical or printed materials. Tesseract-OCR converts scanned documents into machine-readable text, facilitating digitization, transcription, and long-term preservation of valuable content. Because it is packaged for winget, it can be installed and kept up to date from the command line or from scripts.
Versions
5.4.0.20240606
5.3.3.20231005
5.3.1.20230401
v5.3.0.20221214
v5.2.0.20220712
v5.2.0.20220708
v5.1.0.20220510
v5.0.1.20220118
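To install one of these releases instead of the latest, winget's `install` command accepts `--version`. Because the older package versions carry a leading `v`, they presumably have to be passed exactly as listed. The sketch below (hypothetical helper names) sorts the listed versions numerically while ignoring that prefix, and builds the pinned install command.

```python
from __future__ import annotations

WINGET_ID = "UB-Mannheim.TesseractOCR"

# Package versions as listed on this page; the older ones carry a leading "v".
VERSIONS = [
    "5.4.0.20240606",
    "5.3.3.20231005",
    "5.3.1.20230401",
    "v5.3.0.20221214",
    "v5.2.0.20220712",
    "v5.2.0.20220708",
    "v5.1.0.20220510",
    "v5.0.1.20220118",
]


def version_key(version: str) -> tuple[int, ...]:
    """Numeric sort key that ignores the optional 'v' prefix."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def newest(versions: list[str]) -> str:
    """Return the highest version by numeric comparison."""
    return max(versions, key=version_key)


def pin_command(version: str) -> list[str]:
    """Build a winget command that installs one specific package version."""
    # Pass the version string exactly as listed, including any 'v'.
    return ["winget", "install", f"--id={WINGET_ID}", "-e", "--version", version]


if __name__ == "__main__":
    print(" ".join(pin_command(newest(VERSIONS))))
```

Run as a script, this prints `winget install --id=UB-Mannheim.TesseractOCR -e --version 5.4.0.20240606`. Comparing the parts as integers avoids the pitfall of plain string sorting, which would place every `v`-prefixed version after the unprefixed ones.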