Install Tesseract-OCR - open source OCR engine using Winget - wingetCollections

Tesseract-OCR - open source OCR engine
Tesseract-OCR community

hacktoberfest lstm machine-learning ocr ocr-engine tesseract tesseract-ocr

Use this command to install Tesseract-OCR - open source OCR engine:

winget install --id=tesseract-ocr.tesseract -e
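After installation, it can be useful to confirm that the tesseract executable is reachable. The following is a minimal sketch in Python, not part of the package. The fallback path is an assumption about a typical Windows install location, and find_tesseract is a hypothetical helper name.

```python
import shutil
from pathlib import Path
from typing import Optional

# Assumption: a typical Windows install directory; change it if yours differs.
ASSUMED_INSTALL_PATH = Path(r"C:\Program Files\Tesseract-OCR\tesseract.exe")


def find_tesseract(fallback: Path = ASSUMED_INSTALL_PATH) -> Optional[str]:
    """Return a path to the tesseract executable, or None if it cannot be found."""
    # First look on PATH, which is what a plain "tesseract" command would use.
    on_path = shutil.which("tesseract")
    if on_path is not None:
        return on_path
    # Otherwise try the assumed install location.
    if fallback.is_file():
        return str(fallback)
    return None


if __name__ == "__main__":
    location = find_tesseract()
    print(location or "tesseract not found; check PATH or the install directory")
```

If the helper returns None, the install directory may need to be added to PATH before the tesseract command works from a new terminal.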

Tesseract-OCR is an open-source optical character recognition (OCR) engine designed to extract text from images with high accuracy. It supports over 100 languages and various image formats, including PNG, JPEG, and TIFF, making it versatile for different use cases.

Key Features:

Supports multiple languages out of the box, enabling OCR in a wide range of scripts.
Handles various image formats, ensuring compatibility with common file types.
Produces outputs in formats like plain text, hOCR (HTML), PDF, TSV, ALTO, and PAGE for flexibility.
Includes a neural network-based LSTM engine for advanced line recognition alongside the legacy Tesseract OCR engine.
Allows users to train the engine for new languages or refine existing models, enhancing adaptability.
Can be installed via winget for easy setup.
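To illustrate the output formats listed above, here is a small sketch that predicts which files a run would write for a given output base name. The extension table is an assumption based on the behavior of recent Tesseract releases and covers only a subset of formats. expected_outputs is a hypothetical helper, not part of Tesseract.

```python
from typing import Dict, List

# Output config name -> file extension appended to the output base.
# Assumption: extensions as written by recent Tesseract releases (subset only).
OUTPUT_EXTENSIONS: Dict[str, str] = {
    "txt": ".txt",
    "hocr": ".hocr",
    "pdf": ".pdf",
    "tsv": ".tsv",
    "alto": ".xml",
}


def expected_outputs(outputbase: str, formats: List[str]) -> List[str]:
    """List the files a run is expected to produce for the requested formats."""
    # With no output config given, Tesseract writes plain text.
    requested = formats or ["txt"]
    unknown = [name for name in requested if name not in OUTPUT_EXTENSIONS]
    if unknown:
        raise ValueError(f"formats not covered by this sketch: {unknown}")
    return [outputbase + OUTPUT_EXTENSIONS[name] for name in requested]


if __name__ == "__main__":
    print(expected_outputs("scan", ["txt", "pdf"]))
```

Several output configs can be passed in one run, so a single invocation can produce both a text file and a searchable PDF.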

Audience & Benefit: Ideal for developers integrating OCR into applications, businesses digitizing documents, and researchers needing customizable OCR solutions. Tesseract-OCR provides a cost-effective and highly adaptable tool for extracting text from images efficiently, supporting both legacy workflows and modern neural network-based approaches.

README

Tesseract OCR

About

This package contains an OCR engine (libtesseract) and a command-line program (tesseract).

Tesseract 4 added a new neural net (LSTM) based OCR engine focused on line recognition. It still supports the legacy Tesseract 3 engine, which works by recognizing character patterns. To use the legacy engine, select the Legacy OCR Engine mode (--oem 0) and supply traineddata files that support it, for example those from the tessdata repository.
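The engine modes accepted by --oem can be sketched as a small enumeration. The numeric values follow the tesseract command-line help. The names OcrEngineMode and oem_args are hypothetical, and the tessdata directory shown in the example is a placeholder.

```python
from enum import IntEnum
from typing import List, Optional


class OcrEngineMode(IntEnum):
    """Values accepted by the --oem option."""
    LEGACY_ONLY = 0       # legacy character-pattern engine
    LSTM_ONLY = 1         # neural net (LSTM) engine
    LEGACY_AND_LSTM = 2   # both engines combined
    DEFAULT = 3           # whatever the installed traineddata supports


def oem_args(mode: OcrEngineMode, tessdata_dir: Optional[str] = None) -> List[str]:
    """Build the command-line arguments that select an engine mode."""
    args: List[str] = []
    if tessdata_dir is not None:
        # Point Tesseract at traineddata files that support the chosen engine.
        args += ["--tessdata-dir", tessdata_dir]
    args += ["--oem", str(int(mode))]
    return args


if __name__ == "__main__":
    # Placeholder directory: replace with wherever legacy-capable files live.
    print(oem_args(OcrEngineMode.LEGACY_ONLY, tessdata_dir="path/to/tessdata"))
```

Selecting mode 0 fails if the traineddata in use contains only LSTM models, which is why the legacy-capable files matter.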

Versions

5.5.0.20241111

Website

github.com

License

Apache License, Version 2.0

Last updated

5/15/2025

tesseract imagename outputbase [-l lang] [--oem ocrenginemode] [--psm pagesegmode] [configfiles...]
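The usage line above can be turned into an argument list for scripting. This is a minimal sketch that mirrors the synopsis order. build_command is a hypothetical helper, the file names in the example are placeholders, and the accepted ranges for --oem (0 to 3) and --psm (0 to 13) reflect Tesseract 4 and 5 as far as I know.

```python
from typing import List, Optional, Sequence


def build_command(
    imagename: str,
    outputbase: str,
    lang: Optional[str] = None,
    oem: Optional[int] = None,
    psm: Optional[int] = None,
    configfiles: Sequence[str] = (),
) -> List[str]:
    """Assemble a tesseract invocation in the same order as the synopsis."""
    if oem is not None and oem not in range(0, 4):
        raise ValueError("oem must be between 0 and 3")
    if psm is not None and psm not in range(0, 14):
        raise ValueError("psm must be between 0 and 13")
    cmd = ["tesseract", imagename, outputbase]
    if lang:
        cmd += ["-l", lang]
    if oem is not None:
        cmd += ["--oem", str(oem)]
    if psm is not None:
        cmd += ["--psm", str(psm)]
    # Config files (for example output formats such as pdf) come last.
    cmd += list(configfiles)
    return cmd


if __name__ == "__main__":
    # Placeholder file names for illustration only.
    print(build_command("page.png", "out", lang="eng", psm=6, configfiles=["pdf"]))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once tesseract is installed and on PATH. Passing a list instead of a single string avoids shell quoting problems with file names that contain spaces.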

The code in this repository is licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
