Tesseract

Recognize text in images

Actions

- Extract Text
- Extract Boxes

Overview

This node performs Optical Character Recognition (OCR) on images using the Tesseract OCR engine. It can extract plain text or detailed bounding boxes of text elements from images, supporting various levels of granularity such as paragraphs, lines, words, or characters.

Common scenarios include:

- Extracting text content from scanned documents or photos.
- Analyzing invoices, receipts, or forms to digitize printed information.
- Detecting and extracting specific text regions within an image by specifying coordinates.
- Obtaining bounding box data for text layout analysis or further processing.

Practical examples:

- Automatically reading and storing text from uploaded PDFs or images in a workflow.
- Extracting invoice numbers or dates from receipt images by focusing OCR on a specific box.
- Generating structured data about text positions for document layout understanding.

Properties

Name	Meaning
Detect on Entire Image?	Whether to perform OCR on the entire image or only on a specified rectangular box.
Top Y	The top Y coordinate of the box to perform OCR on (shown only if "Detect on Entire Image?" is false).
Left X	The left X coordinate of the box to perform OCR on (shown only if "Detect on Entire Image?" is false).
Width	The width of the box to perform OCR on (shown only if "Detect on Entire Image?" is false).
Height	The height of the box to perform OCR on (shown only if "Detect on Entire Image?" is false).
Options	Collection of additional OCR options:
- Language	Language code for OCR (e.g., "eng" for English). See Tesseract language codes.
- Page Segmentation Mode (PSM)	Defines how Tesseract segments the page, e.g., single block, single column, single line, single word, or sparse text. See PSM explanation.
- Resolution	Optionally force a specific DPI resolution for the image instead of autodetection.
- Character Lists	Configure character whitelist or blacklist to restrict recognized characters. For example, allow only certain letters or disallow some characters.
Timeout	Maximum time in milliseconds to wait for OCR processing per image before canceling.
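
The options above correspond closely to parameters that Tesseract itself exposes. The following is a minimal sketch of one way such options could be translated into Tesseract parameter name/value pairs. The parameter names (tessedit_pageseg_mode, user_defined_dpi, tessedit_char_whitelist, tessedit_char_blacklist) and the PSM numbers are Tesseract's own; the OcrOptions shape and the buildTesseractParams helper are hypothetical, and whether this node maps its options exactly this way is an assumption.

```typescript
// Hypothetical shape of the node's "Options" collection; field names are illustrative.
interface OcrOptions {
  psm?: number;        // Tesseract page segmentation mode, 0-13
  resolution?: number; // DPI to force instead of autodetection
  whitelist?: string;  // characters that may be recognized
  blacklist?: string;  // characters that must not be recognized
}

// A subset of Tesseract's page segmentation modes (numbers defined by Tesseract).
const PSM = {
  AUTO: 3,
  SINGLE_COLUMN: 4,
  SINGLE_BLOCK: 6,
  SINGLE_LINE: 7,
  SINGLE_WORD: 8,
  SPARSE_TEXT: 11,
} as const;

// Translate the options into Tesseract parameter name/value pairs.
// Options that are not set are left out so Tesseract keeps its defaults.
function buildTesseractParams(options: OcrOptions): Record<string, string> {
  const params: Record<string, string> = {};
  if (options.psm !== undefined) {
    if (!Number.isInteger(options.psm) || options.psm < 0 || options.psm > 13) {
      throw new RangeError(`PSM must be an integer from 0 to 13, got ${options.psm}`);
    }
    params["tessedit_pageseg_mode"] = String(options.psm);
  }
  if (options.resolution !== undefined) {
    params["user_defined_dpi"] = String(options.resolution);
  }
  if (options.whitelist) {
    params["tessedit_char_whitelist"] = options.whitelist;
  }
  if (options.blacklist) {
    params["tessedit_char_blacklist"] = options.blacklist;
  }
  return params;
}

// Example: read a single line that should contain digits only.
const invoiceNumberParams = buildTesseractParams({
  psm: PSM.SINGLE_LINE,
  whitelist: "0123456789",
});
```

Restricting both the segmentation mode and the character set at once, as in the example, is usually only appropriate for small, well-defined regions such as a single numeric field.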

Output

The node outputs an array of items where each item contains a json field with the OCR results:

- Extract Text operation (ocr): the json contains the recognized plain text extracted from the image or specified box.
- Extract Boxes operation (boxes): the json includes detailed bounding box information at the chosen granularity level (paragraphs, lines, words, or characters), describing the position and content of each detected text element.
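
A bounding box found by Extract Boxes can be reused as the region for a later Extract Text run. The sketch below converts a corner-based box into the Left X / Top Y / Width / Height form used by the box properties. It assumes the boxes carry corner coordinates x0, y0, x1, y1, which is how tesseract.js reports them; the exact field names in this node's output are not documented here, and Bbox, Region, and bboxToRegion are hypothetical names.

```typescript
// Corner-based bounding box (assumed form, as reported by tesseract.js).
interface Bbox { x0: number; y0: number; x1: number; y1: number; }

// Region in the form of the node's box properties.
interface Region { left: number; top: number; width: number; height: number; }

// Convert a corner-based box into a region, optionally growing it by
// `padding` pixels on every side. The top-left corner never goes below 0.
function bboxToRegion(bbox: Bbox, padding = 0): Region {
  const left = Math.max(0, bbox.x0 - padding);
  const top = Math.max(0, bbox.y0 - padding);
  return {
    left,
    top,
    width: bbox.x1 + padding - left,
    height: bbox.y1 + padding - top,
  };
}

// Example: a word box without padding.
const wordRegion = bboxToRegion({ x0: 10, y0: 20, x1: 110, y1: 50 });
// wordRegion is { left: 10, top: 20, width: 100, height: 30 }
```

A few pixels of padding often helps when the second OCR pass would otherwise clip the edges of characters.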

If the OCR process times out on an item, the output JSON will include a timeout flag set to true.
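
One common way to produce such a flag is to guard the OCR task with a timer and report whichever finishes first. This is a minimal sketch of that pattern, not the node's actual implementation; withTimeout, TimedResult, and the field name timeout are assumptions chosen for illustration.

```typescript
// Either the task's value, or a marker object with timeout set to true.
type TimedResult<T> = { timeout: false; value: T } | { timeout: true };

// Resolve with the task's value if it settles within `ms` milliseconds,
// otherwise resolve with { timeout: true }. Errors from the task are passed on.
// Note: this only stops waiting; it does not cancel the underlying task.
function withTimeout<T>(task: Promise<T>, ms: number): Promise<TimedResult<T>> {
  return new Promise<TimedResult<T>>((resolve, reject) => {
    const timer = setTimeout(() => resolve({ timeout: true }), ms);
    task.then(
      (value) => {
        clearTimeout(timer);
        resolve({ timeout: false, value });
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Example: a stand-in for an OCR call that finishes immediately.
const guarded = withTimeout(Promise.resolve("recognized text"), 5000);
guarded.then((result) => {
  if (result.timeout) {
    console.log("OCR timed out");
  } else {
    console.log(result.value);
  }
});
```

Downstream steps can then branch on the flag instead of handling an exception for every slow image.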

The node does not output binary data.

Dependencies

- Uses the tesseract.js library for OCR functionality.
- No external API keys or services are required; OCR is performed locally via the bundled Tesseract engine.
- Requires appropriate language data files for the selected OCR language.
- No special environment variables or n8n credentials are needed.

Troubleshooting

- Timeouts: If OCR takes longer than the specified timeout, processing of that item is canceled and the item is reported as timed out (see Output). Increase the timeout value or reduce image size/complexity.
- Incorrect Text Recognition: Ensure the correct language code is selected and that the image quality is sufficient.
- Character Whitelist/Blacklist Misconfiguration: Conflicting or overly restrictive character lists may cause missing or incorrect text. Adjust these settings carefully.
- Box Coordinates Out of Bounds: When performing OCR on a box, ensure the coordinates and dimensions are within the image bounds to avoid errors.
- Page Segmentation Mode Mismatch: Choosing a PSM that does not match the image layout may reduce accuracy. Refer to the Tesseract PSM documentation to select the best mode.
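
For the out-of-bounds case, the box can be clamped to the image before it is passed to the node. The sketch below is one way to do that, assuming the image width and height are already known from an earlier step; Region and clampRegion are hypothetical names.

```typescript
// Region in the form of the node's box properties.
interface Region { left: number; top: number; width: number; height: number; }

// Return the part of `box` that lies inside an image of the given size,
// or null when the box does not overlap the image at all.
function clampRegion(box: Region, imageWidth: number, imageHeight: number): Region | null {
  const left = Math.max(0, box.left);
  const top = Math.max(0, box.top);
  const right = Math.min(imageWidth, box.left + box.width);
  const bottom = Math.min(imageHeight, box.top + box.height);
  if (right <= left || bottom <= top) {
    return null; // nothing left to recognize
  }
  return { left, top, width: right - left, height: bottom - top };
}

// Example: a box that sticks out of a 1000x800 image at the top and right.
const clamped = clampRegion({ left: 900, top: -20, width: 300, height: 100 }, 1000, 800);
// clamped is { left: 900, top: 0, width: 100, height: 80 }
```

Returning null for a box that misses the image entirely lets the workflow skip the item rather than send an empty region to OCR.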