Tesseract

Recognize text in images

Actions

- Extract Text
- Extract Boxes

Overview

This node performs Optical Character Recognition (OCR) on images using the Tesseract OCR engine. It can extract plain text or detailed bounding boxes of text elements from images, supporting various levels of granularity such as paragraphs, lines, words, or characters.

Common scenarios where this node is beneficial include:

- Extracting textual content from scanned documents, invoices, receipts, or photos.
- Automating data entry by converting image-based text into machine-readable formats.
- Analyzing structured documents by extracting text boxes with positional information.
- Processing images with specific regions of interest rather than the entire image.

Practical examples:

- Extracting all text from a photographed business card to automatically populate contact details.
- Detecting and extracting only a specific area of an invoice image containing the total amount.
- Obtaining word-level bounding boxes from a scanned form to analyze layout or highlight recognized words.

Properties

Name	Meaning
Operation	Choose between "Extract Text" (plain text extraction) or "Extract Boxes" (extract bounding boxes of text elements).
Granularity	When extracting boxes, select the detail level: Paragraphs, Lines, Words, or Characters (symbols).
Input Image Field Name	The name of the incoming field that contains the image data to be processed. Default is `"data"`.
Detect on Entire Image?	Whether to perform OCR on the entire image (`true`) or only on a specified rectangular box (`false`).
Top Y	Y coordinate of the top edge of the box to process. Used only when Detect on Entire Image? is `false`.
Left X	X coordinate of the left edge of the box to process. Used only when Detect on Entire Image? is `false`.
Width	Width of the box to process. Used only when Detect on Entire Image? is `false`.
Height	Height of the box to process. Used only when Detect on Entire Image? is `false`.
Options	Collection of advanced options:
- Language	Language code for OCR (e.g., `"eng"` for English). See Tesseract language codes.
- Page Segmentation Mode (PSM)	Controls how Tesseract segments the page. Options include Single Block, Single Column, Single Line, Single Word, and Sparse Text. See the PyImageSearch explanation of PSM for details.
- Resolution	Optionally force a specific DPI resolution for the image instead of autodetection.
- Character Lists	Restrict recognition with a character whitelist or blacklist. Each list can be enabled or disabled separately and takes the characters to allow or disallow.
Timeout	Maximum time in milliseconds to allow for processing each image before canceling.
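
The advanced options above correspond to Tesseract configuration variables. The following is a minimal sketch of that mapping, not the node's actual implementation: the `OcrOptions` shape, its field names, and the `toTesseractParams` helper are hypothetical. The variable names (`tessedit_pageseg_mode`, `user_defined_dpi`, `tessedit_char_whitelist`, `tessedit_char_blacklist`) and the numeric PSM codes are Tesseract's own. The language is left out because it is normally chosen when the engine is loaded rather than passed as a parameter.

```typescript
// Hypothetical shape of the Options collection; the field names are assumptions.
type PsmName = "singleBlock" | "singleColumn" | "singleLine" | "singleWord" | "sparseText";

interface OcrOptions {
  psm?: PsmName;
  resolution?: number; // forced DPI
  whitelist?: string;  // characters to allow
  blacklist?: string;  // characters to disallow
}

// Tesseract's numeric page segmentation modes for the options listed in the table.
const PSM_CODES: Record<PsmName, number> = {
  singleColumn: 4,
  singleBlock: 6,
  singleLine: 7,
  singleWord: 8,
  sparseText: 11,
};

// Translate the options into Tesseract configuration variables (all values are strings).
function toTesseractParams(options: OcrOptions): Record<string, string> {
  const params: Record<string, string> = {};
  if (options.psm !== undefined) {
    params.tessedit_pageseg_mode = String(PSM_CODES[options.psm]);
  }
  if (options.resolution !== undefined) {
    params.user_defined_dpi = String(options.resolution);
  }
  if (options.whitelist) params.tessedit_char_whitelist = options.whitelist;
  if (options.blacklist) params.tessedit_char_blacklist = options.blacklist;
  return params;
}

// Example: reading a single-line invoice total that should contain only digits and separators.
const invoiceTotalParams = toTesseractParams({
  psm: "singleLine",
  resolution: 300,
  whitelist: "0123456789.,",
});
console.log(invoiceTotalParams);
```

Options that are not set are simply omitted, so Tesseract's defaults (including DPI autodetection) stay in effect.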

Output

The node outputs an array of items corresponding to the input items, each containing:

- json:
  - For Extract Text operation: a JSON object with the recognized plain text string.
  - For Extract Boxes operation: a JSON object containing bounding box data at the selected granularity level (paragraphs, lines, words, or characters), including position and text content.
- binary (if present): binary data from the input item is preserved.

If a timeout occurs during OCR processing, the output JSON will include a timeout flag set to true.
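
The timeout behavior described above can be sketched by racing the OCR call against a timer. This is an illustrative sketch under stated assumptions, not the node's code: `withTimeout`, `OcrOutcome`, and `fakeRecognize` are hypothetical names, and the exact key of the timeout flag in the node's output is assumed to be `timeout`.

```typescript
// Outcome of a guarded OCR call: either the recognized payload or a timeout marker.
type OcrOutcome<T> = { timeout: false; result: T } | { timeout: true };

// Race a task against a timer. The task itself is not cancelled here;
// a real implementation would also need to stop the OCR worker.
async function withTimeout<T>(task: Promise<T>, timeoutMs: number): Promise<OcrOutcome<T>> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const expired = new Promise<OcrOutcome<T>>((resolve) => {
    timer = setTimeout(() => resolve({ timeout: true }), timeoutMs);
  });
  const finished = task.then((result): OcrOutcome<T> => ({ timeout: false, result }));
  try {
    return await Promise.race([finished, expired]);
  } finally {
    // Clear the timer so it does not keep the process alive after the task wins.
    clearTimeout(timer);
  }
}

// Hypothetical stand-in for an OCR call that takes `delayMs` to finish.
function fakeRecognize(text: string, delayMs: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(text), delayMs));
}

const fast = withTimeout(fakeRecognize("TOTAL 42.00", 10), 200); // finishes in time
const slow = withTimeout(fakeRecognize("never seen", 200), 10);  // exceeds the limit

Promise.all([fast, slow]).then(([fastOutcome, slowOutcome]) => {
  console.log(fastOutcome);
  console.log(slowOutcome);
});
```

Downstream logic can branch on the flag instead of treating a slow image as a hard failure.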

Dependencies

- Uses the tesseract.js library for OCR functionality.
- Requires the Tesseract language data files for the selected language.
- No API keys or credentials are needed.
- The host running the node needs enough memory and CPU for image processing.
- Optional: if you force a resolution, set the DPI to a value that matches the image.

Troubleshooting

- Timeout Errors: If OCR processing takes too long, it may trigger a timeout error. Increase the Timeout property or optimize the input image size and quality.
- Incorrect Text Recognition:
  - Verify the correct language code is selected.
  - Adjust the Page Segmentation Mode to better match the image layout.
  - Use character whitelists or blacklists to improve accuracy if you expect limited character sets.
- Empty or Missing Output:
  - Ensure the input image field name matches the actual field containing image data.
  - Confirm the image data is valid and accessible.
- Partial OCR on Box: When Detect on Entire Image? is false, verify the coordinates and dimensions of the box are correctly set within the image bounds.
- Performance Issues: Large images or high-resolution settings increase processing time. Consider resizing images or adjusting resolution settings.
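
For the Partial OCR on Box case, the bounds check can be done before the image reaches the node. The sketch below is only an illustration: `PixelSize`, `OcrRegion`, and `checkRegion` are hypothetical names, and it assumes the box is given in pixels with the origin at the top-left corner of the image.

```typescript
interface PixelSize { width: number; height: number; }
interface OcrRegion { top: number; left: number; width: number; height: number; }

// Return a list of problems; an empty list means the box fits inside the image.
function checkRegion(image: PixelSize, region: OcrRegion): string[] {
  const problems: string[] = [];
  if (region.left < 0 || region.top < 0) {
    problems.push("Left X and Top Y must not be negative");
  }
  if (region.width <= 0 || region.height <= 0) {
    problems.push("Width and Height must be positive");
  }
  if (region.left + region.width > image.width) {
    problems.push("box extends past the right edge");
  }
  if (region.top + region.height > image.height) {
    problems.push("box extends past the bottom edge");
  }
  return problems;
}

const scannedInvoice: PixelSize = { width: 1200, height: 800 };

// Fits: right edge at 1150, bottom edge at 780.
console.log(checkRegion(scannedInvoice, { top: 700, left: 900, width: 250, height: 80 }));

// Does not fit: right edge would be at 1300, beyond the 1200 px image width.
console.log(checkRegion(scannedInvoice, { top: 700, left: 900, width: 400, height: 80 }));
```

Reporting every problem at once, rather than stopping at the first, makes it easier to correct all four box properties in one pass.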