Tesseract

Recognize text in images

Overview

This node uses the Tesseract OCR engine to recognize text within images. It supports two main operations: extracting plain text from an image and extracting bounding boxes around text elements at various levels of granularity (paragraphs, lines, words, or characters). This node is useful for automating data extraction from scanned documents, photos containing text, invoices, receipts, or any image where textual content needs to be programmatically accessed.

For example, you can use it to:

  • Extract all text from a scanned contract.
  • Identify and locate individual words or lines in an image for further processing.
  • Limit recognition to a specific region of an image rather than the entire image.
  • Customize OCR behavior by specifying language, page segmentation mode, resolution, and character whitelists/blacklists.

Properties

| Name | Meaning |
| --- | --- |
| Operation | Choose between "Extract Text" (plain text output) or "Extract Boxes" (bounding boxes of text elements). |
| Granularity | Level of detail for bounding boxes: Paragraphs, Lines, Words, or Characters. Only shown for the "Extract Boxes" operation. |
| Input Image Field Name | The name of the incoming field that contains the image data to process. Default is "data". |
| Detect on Entire Image? | Whether to perform OCR on the entire image (true) or only on a specified rectangular box (false). |
| Top Y | The top coordinate of the box to perform OCR on (only if not detecting on the entire image). |
| Left X | The left coordinate of the box to perform OCR on (only if not detecting on the entire image). |
| Width | The width of the box to perform OCR on (only if not detecting on the entire image). |
| Height | The height of the box to perform OCR on (only if not detecting on the entire image). |
| Options | Collection of advanced options: |
| - Language | Language code for OCR (e.g., "eng" for English). See the Tesseract list of language codes. |
| - Page Segmentation Mode (PSM) | Controls how Tesseract segments the image into text blocks. Options include Single Block, Single Column, Single Line, Single Word, and Sparse Text. See the Tesseract documentation on page segmentation modes. |
| - Resolution | Optionally force a specific DPI resolution for the image instead of relying on autodetection. |
| - Character Lists | Configure a character whitelist (only allow some characters) or blacklist (disallow some characters) to improve recognition accuracy. |
| Timeout | Maximum time in milliseconds to wait for OCR processing before canceling. |
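
The properties above correspond to standard Tesseract settings. The following is a minimal sketch of how they might be translated into Tesseract parameters and a recognition rectangle. `NodeOptions`, `OcrRectangle`, `RecognizeRequest`, and `buildRecognizeRequest` are hypothetical names chosen for illustration, and the fallback to "eng" is an assumption; the node's actual internals may differ. The parameter names (`tessedit_pageseg_mode`, `tessedit_char_whitelist`, `tessedit_char_blacklist`, `user_defined_dpi`) and the numeric PSM values are the standard Tesseract ones.

```typescript
// Hypothetical shape of the node's settings; field names are illustrative only.
interface NodeOptions {
  language?: string;
  psm?: "SINGLE_BLOCK" | "SINGLE_COLUMN" | "SINGLE_LINE" | "SINGLE_WORD" | "SPARSE_TEXT";
  resolution?: number; // DPI
  whitelist?: string;
  blacklist?: string;
  detectEntireImage: boolean;
  top?: number;
  left?: number;
  width?: number;
  height?: number;
}

interface OcrRectangle { top: number; left: number; width: number; height: number; }

interface RecognizeRequest {
  language: string;
  params: Record<string, string>;
  rectangle?: OcrRectangle;
}

// Standard Tesseract page segmentation mode numbers.
const PSM_VALUES: Record<NonNullable<NodeOptions["psm"]>, string> = {
  SINGLE_COLUMN: "4",
  SINGLE_BLOCK: "6",
  SINGLE_LINE: "7",
  SINGLE_WORD: "8",
  SPARSE_TEXT: "11",
};

function buildRecognizeRequest(o: NodeOptions): RecognizeRequest {
  const params: Record<string, string> = {};
  if (o.psm !== undefined) params["tessedit_pageseg_mode"] = PSM_VALUES[o.psm];
  if (o.resolution !== undefined) params["user_defined_dpi"] = String(o.resolution);
  if (o.whitelist) params["tessedit_char_whitelist"] = o.whitelist;
  if (o.blacklist) params["tessedit_char_blacklist"] = o.blacklist;

  // Assumption: fall back to English when no language is given.
  const request: RecognizeRequest = { language: o.language ?? "eng", params };

  // Only restrict recognition to a box when "Detect on Entire Image?" is off.
  if (!o.detectEntireImage) {
    request.rectangle = {
      top: o.top ?? 0,
      left: o.left ?? 0,
      width: o.width ?? 0,
      height: o.height ?? 0,
    };
  }
  return request;
}

// Example: read a single line of digits from a box on an invoice.
const invoiceTotalRequest = buildRecognizeRequest({
  detectEntireImage: false,
  psm: "SINGLE_LINE",
  whitelist: "0123456789.,",
  top: 10,
  left: 20,
  width: 300,
  height: 40,
});
```

Restricting the character set and the region together is a common way to read a single numeric field, since it removes both look-alike letters and unrelated text from consideration.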

Output

The node outputs one item for each input item. Each output item contains:

  • json: The recognized text or extracted bounding boxes depending on the operation.
    • For Extract Text operation, this includes the plain recognized text.
    • For Extract Boxes operation, this includes bounding box data structured according to the selected granularity (paragraphs, lines, words, or characters).
  • If a timeout occurs during processing, the output JSON will contain a timeout flag set to true.
  • In case of errors and if "Continue On Fail" is enabled, the output item will also include an error field describing the failure.

The node does not output binary data.
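
A downstream step usually has to tell these cases apart. The following is a minimal sketch of how that might look. Only the `timeout` flag and the `error` field are named in the description above; the `text` and `boxes` field names, the `OcrBox` layout, and the `describeItem` helper are assumptions made for illustration, so check an actual output item before relying on them.

```typescript
// Hypothetical box layout; the real structure depends on the selected granularity.
interface OcrBox {
  text: string;
  left: number;
  top: number;
  width: number;
  height: number;
}

// Assumed shape of an output item's json. `text` and `boxes` are guesses;
// `timeout` and `error` follow the description above.
interface OcrJson {
  text?: string;
  boxes?: OcrBox[];
  timeout?: boolean;
  error?: string;
}

// Summarize one output item, checking failure cases before results.
function describeItem(json: OcrJson): string {
  if (json.error !== undefined) return `failed: ${json.error}`;
  if (json.timeout === true) return "timed out";
  if (json.boxes !== undefined) return `${json.boxes.length} box(es)`;
  if (json.text !== undefined) return `text (${json.text.trim().length} chars)`;
  return "empty";
}

// Example items as they might arrive from the node.
const sampleSummaries = [
  describeItem({ text: "Invoice 42\n" }),
  describeItem({ boxes: [{ text: "Invoice", left: 12, top: 8, width: 90, height: 20 }] }),
  describeItem({ timeout: true }),
  describeItem({ error: "Unsupported image format" }),
];
```

Checking `error` and `timeout` first keeps a failed or canceled item from being mistaken for an image that simply contains no text.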

Dependencies

  • Requires the tesseract.js library bundled with the node for OCR processing.
  • No external API keys or services are needed; OCR is performed locally using the Tesseract engine.
  • Languages are selected by language code, so the matching language data must already be available or be downloadable by Tesseract at run time.

Troubleshooting

  • Timeout Errors: If OCR takes too long, the node throws a timeout error. Increase the timeout property or optimize the input image size/resolution.
  • Incorrect Text Recognition: Check that the correct language code is set. Use character whitelists or blacklists to improve accuracy.
  • Bounding Box Misalignment: Ensure coordinates and dimensions for partial image detection are correct and within image bounds.
  • Empty or Missing Input Data: Verify that the input image field name matches the actual input data field containing the image.
  • Page Segmentation Mode Issues: Try a different PSM setting if text is not detected properly (e.g., switch from Single Block to Sparse Text for scattered text).
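
For the bounding box case, the check can be done before the image reaches the node. The following is a minimal sketch, assuming the image dimensions are already known in pixels; `OcrRegion` and `clampRegion` are hypothetical names and not part of the node.

```typescript
interface OcrRegion { top: number; left: number; width: number; height: number; }

// Returns the part of `region` that lies inside an image of the given size,
// or null when the region and the image do not overlap at all.
function clampRegion(region: OcrRegion, imageWidth: number, imageHeight: number): OcrRegion | null {
  const left = Math.max(0, region.left);
  const top = Math.max(0, region.top);
  const right = Math.min(imageWidth, region.left + region.width);
  const bottom = Math.min(imageHeight, region.top + region.height);
  if (right <= left || bottom <= top) return null;
  return { top, left, width: right - left, height: bottom - top };
}

// A box that runs past the right and bottom edges of a 2000 x 1000 image.
const clampedRegion = clampRegion({ top: 900, left: 1500, width: 800, height: 400 }, 2000, 1000);
// clampedRegion is { top: 900, left: 1500, width: 500, height: 100 }
```

A null result means the configured box misses the image entirely, which is worth reporting as a configuration error rather than passing on to OCR.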
