
Tesseract-OCR - open source OCR engine
Publisher: Tesseract-OCR community

Use this command to install Tesseract-OCR - open source OCR engine:
winget install --id=UB-Mannheim.TesseractOCR -e

Tesseract Open Source OCR Engine.

Tesseract-OCR is an open-source OCR (optical character recognition) engine designed to extract text from images or scanned documents.
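As a rough illustration of how the engine is typically driven, the Python sketch below calls the `tesseract` command-line program. It assumes that `tesseract` is on PATH (depending on how it was installed, you may need to add the install directory to PATH) and that the requested language data (`.traineddata` files) is installed. The helper names `build_tesseract_command` and `ocr_image` are hypothetical. Passing `stdout` as the output base makes Tesseract print the recognized text instead of writing a `.txt` file, and several languages can be combined with `+`.

```python
import subprocess
from typing import List, Sequence


def build_tesseract_command(image_path: str,
                            languages: Sequence[str] = ("eng",),
                            output_base: str = "stdout") -> List[str]:
    """Build an argv list for the tesseract CLI (hypothetical helper).

    "stdout" as the output base prints the text instead of writing
    <output_base>.txt; multiple languages are joined with "+".
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


def ocr_image(image_path: str, languages: Sequence[str] = ("eng",)) -> str:
    """Run tesseract on one image and return the recognized text.

    Assumes tesseract is on PATH and the language data is installed.
    Output is decoded as UTF-8, which is what tesseract emits.
    """
    result = subprocess.run(
        build_tesseract_command(image_path, languages),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout
```

For example, `ocr_image("scan.png", ["deu", "eng"])` would run `tesseract scan.png stdout -l deu+eng` and return whatever text Tesseract prints.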

Key Features:

  • Supports multiple languages, including historic German scripts like Fraktur.
  • Offers specialized models for improved accuracy with historical print materials.
  • Available as a Windows installer for easy setup and use in environments where Linux is not feasible.
  • Provides ALTO (Analyzed Layout and Text Object) XML output, which records recognized text together with its page layout.
  • Regular updates to improve performance and compatibility with modern and historic documents.
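ALTO output can be requested with the `alto` config, e.g. `tesseract scan.png out alto`, which writes `out.xml`. The sketch below shows one way to turn such a file back into plain text lines using only the Python standard library. It assumes the usual ALTO nesting in which each `TextLine` contains `String` elements whose `CONTENT` attribute holds a word; namespaces are ignored so it does not depend on a particular ALTO schema version. The function name `alto_to_lines` is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def _local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def alto_to_lines(alto_xml: str) -> List[str]:
    """Return one plain-text string per ALTO TextLine (hypothetical helper).

    Words are read from the CONTENT attribute of <String> children and
    joined with single spaces; <SP> and other children are skipped.
    """
    root = ET.fromstring(alto_xml)
    lines = []
    for element in root.iter():
        if _local(element.tag) != "TextLine":
            continue
        words = [child.get("CONTENT", "")
                 for child in element
                 if _local(child.tag) == "String"]
        lines.append(" ".join(w for w in words if w))
    return lines
```

For a file on disk, `alto_to_lines(open("out.xml", encoding="utf-8").read())` would return the recognized lines in document order. Word coordinates (the `HPOS`/`VPOS` attributes) are left out here but could be collected from the same `String` elements.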

Audience & Benefit:
Ideal for researchers, libraries, archives, and anyone working with historical or printed materials. Tesseract-OCR recognizes text in scanned documents, supporting digitization, transcription, and long-term preservation of valuable content. Because it can be installed with winget, setup is easy to script as part of existing workflows.

Versions
5.4.0.20240606
5.3.3.20231005
5.3.1.20230401
5.3.0.20221214
5.2.0.20220712
5.2.0.20220708
5.1.0.20220510
5.0.1.20220118
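Each version string has the form major.minor.patch.builddate. The Python sketch below, using hypothetical helper names, shows how these strings can be compared numerically (a leading `v` is tolerated) and how a winget install command can be pinned to one of them with winget's `--version` option. It only builds the command; on Windows it could be run with `subprocess.run(cmd, check=True)`.

```python
from typing import List, Optional, Tuple

PACKAGE_ID = "UB-Mannheim.TesseractOCR"


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '5.4.0.20240606' (or 'v5.4.0.20240606') into comparable ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def winget_install_command(version: Optional[str] = None) -> List[str]:
    """Build the winget install command, optionally pinned to one version."""
    cmd = ["winget", "install", f"--id={PACKAGE_ID}", "-e"]
    if version is not None:
        cmd += ["--version", version]
    return cmd
```

Comparing tuples of integers rather than raw strings avoids ordering mistakes such as "5.10" sorting before "5.9".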