Tesseract

Recognize text in images

Overview

This node performs Optical Character Recognition (OCR) on images using the Tesseract OCR engine. It returns either the recognized plain text or bounding boxes for the detected text elements, at paragraph, line, word, or character granularity.

Common scenarios include:

  • Extracting text content from scanned documents or photos.
  • Analyzing invoices, receipts, or forms to digitize printed information.
  • Detecting and extracting specific text regions within an image by specifying coordinates.
  • Obtaining bounding box data for text layout analysis or further processing.

Practical examples:

  • Automatically reading and storing text from uploaded PDFs or images in a workflow.
  • Extracting invoice numbers or dates from receipt images by focusing OCR on a specific box.
  • Generating structured data about text positions for document layout understanding.

Properties

Name | Meaning
Detect on Entire Image? | Whether to perform OCR on the entire image or only on a specified rectangular box.
Top Y | The top Y coordinate of the box to perform OCR on (shown only if "Detect on Entire Image?" is false).
Left X | The left X coordinate of the box to perform OCR on (shown only if "Detect on Entire Image?" is false).
Width | The width of the box to perform OCR on (shown only if "Detect on Entire Image?" is false).
Height | The height of the box to perform OCR on (shown only if "Detect on Entire Image?" is false).
Options | Collection of additional OCR options:
  - Language | Language code for OCR (e.g., "eng" for English). See Tesseract language codes.
  - Page Segmentation Mode (PSM) | Defines how Tesseract segments the page, e.g., single block, single column, single line, single word, or sparse text. See PSM explanation.
  - Resolution | Optionally force a specific DPI resolution for the image instead of autodetection.
  - Character Lists | Configure a character whitelist or blacklist to restrict recognized characters. For example, allow only certain letters or disallow some characters.
Timeout | Maximum time in milliseconds to wait for OCR processing per image before canceling.
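
To make the Options collection more concrete, here is a minimal sketch of how such options could be translated into Tesseract configuration variables. The OcrOptions interface, its field names, and the toTesseractParams helper are hypothetical and are not taken from the node's source. The variable names (tessedit_pageseg_mode, user_defined_dpi, tessedit_char_whitelist, tessedit_char_blacklist) and the numeric PSM codes are standard Tesseract settings. The language is omitted because it is normally chosen when the OCR worker is created rather than passed as a parameter.

```typescript
// Hypothetical option shape mirroring the Options collection described above.
interface OcrOptions {
  psm?: "singleColumn" | "singleBlock" | "singleLine" | "singleWord" | "sparseText";
  resolution?: number; // DPI to force instead of autodetection
  whitelist?: string;  // characters Tesseract is allowed to return
  blacklist?: string;  // characters Tesseract must not return
}

// Tesseract's numeric page segmentation modes for the layouts named above.
const PSM_CODES = {
  singleColumn: 4,
  singleBlock: 6,
  singleLine: 7,
  singleWord: 8,
  sparseText: 11,
} as const;

// Build a parameter map of Tesseract config variables (all values as strings).
// Options that are not set are left out so Tesseract keeps its defaults.
function toTesseractParams(options: OcrOptions): Record<string, string> {
  const params: Record<string, string> = {};
  if (options.psm !== undefined) {
    params["tessedit_pageseg_mode"] = String(PSM_CODES[options.psm]);
  }
  if (options.resolution !== undefined) {
    params["user_defined_dpi"] = String(options.resolution);
  }
  if (options.whitelist) {
    params["tessedit_char_whitelist"] = options.whitelist;
  }
  if (options.blacklist) {
    params["tessedit_char_blacklist"] = options.blacklist;
  }
  return params;
}
```

For a receipt total, for instance, one might combine the single-line mode with a whitelist of digits and a decimal separator. Whether the node passes its options in exactly this way is an assumption.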

Output

The node outputs an array of items where each item contains a json field with the OCR results:

  • For Extract Text operation (ocr):
    The json contains the recognized plain text extracted from the image or specified box.

  • For Extract Boxes operation (boxes):
    The json includes detailed bounding box information at the chosen granularity level (paragraphs, lines, words, or characters), describing the position and content of each detected text element.

If the OCR process times out on an item, the output JSON will include a timeout flag set to true.
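
The timeout behavior can be sketched as a race between the OCR task and a timer. This is an illustration only, not the node's actual implementation: the OcrResult type and the withTimeout helper are hypothetical, and the timeout field name is taken from the description above. Note that this sketch only stops waiting for the task; it does not cancel the underlying OCR work.

```typescript
// Hypothetical result shape: recognized text, or a flag marking a timeout.
type OcrResult = { text: string } | { timeout: true };

// Resolve with the OCR text if the task finishes in time; otherwise resolve
// with { timeout: true } once timeoutMs milliseconds have elapsed.
function withTimeout(task: Promise<string>, timeoutMs: number): Promise<OcrResult> {
  return new Promise<OcrResult>((resolve, reject) => {
    const timer = setTimeout(() => resolve({ timeout: true }), timeoutMs);
    task.then(
      (text) => {
        clearTimeout(timer);
        resolve({ text }); // no effect if the timer already fired
      },
      (err) => {
        clearTimeout(timer);
        reject(err); // no effect if the timer already fired
      },
    );
  });
}
```

A downstream step can then check for the timeout field on each item before reading the text.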

The node does not output binary data.

Dependencies

  • Uses the tesseract.js library for OCR functionality.
  • No external API keys or services are required; OCR is performed locally via the bundled Tesseract engine.
  • Requires appropriate language data files for the selected OCR language.
  • No special environment variables or n8n credentials are needed.

Troubleshooting

  • Timeout Errors: If OCR takes longer than the specified timeout, the node throws a timeout error. Increase the timeout value or reduce image size/complexity.
  • Incorrect Text Recognition: Ensure the correct language code is selected and that the image quality is sufficient.
  • Character Whitelist/Blacklist Misconfiguration: Using conflicting or overly restrictive character lists may cause missing or incorrect text. Adjust these settings carefully.
  • Box Coordinates Out of Bounds: When performing OCR on a box, ensure the coordinates and dimensions are within the image bounds to avoid errors.
  • Page Segmentation Mode Mismatch: Choosing an inappropriate PSM for the image layout may reduce accuracy. Refer to the linked documentation to select the best mode.
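
For the out-of-bounds case above, one option is to clamp the box to the image before running OCR. The following is a minimal sketch, assuming the image dimensions are known in advance. The Box interface and the clampBox helper are hypothetical names; the fields mirror the Left X, Top Y, Width, and Height properties.

```typescript
// Hypothetical box shape mirroring the Left X / Top Y / Width / Height properties.
interface Box {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Clamp a box to the image bounds. Returns null when nothing of the box
// lies inside the image, so the caller can skip OCR for that item.
function clampBox(box: Box, imageWidth: number, imageHeight: number): Box | null {
  const left = Math.max(0, Math.min(box.left, imageWidth));
  const top = Math.max(0, Math.min(box.top, imageHeight));
  const right = Math.max(left, Math.min(box.left + box.width, imageWidth));
  const bottom = Math.max(top, Math.min(box.top + box.height, imageHeight));
  const width = right - left;
  const height = bottom - top;
  return width > 0 && height > 0 ? { left, top, width, height } : null;
}
```

Clamping silently shrinks the region, which may hide a coordinate mistake. Rejecting out-of-range input with an explicit error is an equally reasonable design if strictness is preferred.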
