Overview
This node uses the Tesseract OCR engine to recognize text within images. It supports two main operations: extracting plain text from an image and extracting bounding boxes around text elements at various levels of granularity (paragraphs, lines, words, or characters). This node is useful for automating data extraction from scanned documents, photos containing text, invoices, receipts, or any image where textual content needs to be programmatically accessed.
For example, you can use it to:
- Extract all text from a scanned contract.
- Identify and locate individual words or lines in an image for further processing.
- Limit recognition to a specific region of an image rather than the entire image.
- Customize OCR behavior by specifying language, page segmentation mode, resolution, and character whitelists/blacklists.
Properties
| Name | Meaning |
|---|---|
| Operation | Choose between "Extract Text" (plain text output) or "Extract Boxes" (bounding boxes of text elements). |
| Granularity | Level of detail for bounding boxes when extracting boxes: Paragraphs, Lines, Words, or Characters. Only shown for "Extract Boxes" operation. |
| Input Image Field Name | The name of the incoming field that contains the image data to process. Default is "data". |
| Detect on Entire Image? | Whether to perform OCR on the entire image (true) or only on a specified rectangular box (false). |
| Top Y | The top coordinate, in pixels, of the box to perform OCR on. Only used when "Detect on Entire Image?" is false. |
| Left X | The left coordinate, in pixels, of the box to perform OCR on. Only used when "Detect on Entire Image?" is false. |
| Width | The width, in pixels, of the box to perform OCR on. Only used when "Detect on Entire Image?" is false. |
| Height | The height, in pixels, of the box to perform OCR on. Only used when "Detect on Entire Image?" is false. |
| Options | Collection of advanced options: |
| - Language | Language code for OCR (e.g., "eng" for English). See the Tesseract documentation for the list of language codes. |
| - Page Segmentation Mode (PSM) | Controls how Tesseract segments the image into text blocks. Options include Single Block, Single Column, Single Line, Single Word, and Sparse Text. See the Tesseract documentation for an explanation of each mode. |
| - Resolution | Optionally force a specific resolution (DPI) for the image instead of relying on autodetection. |
| - Character Lists | Configure character whitelist (only allow some characters) or blacklist (disallow some characters) to improve recognition accuracy. |
| Timeout | Maximum time in milliseconds to wait for OCR processing before canceling. |
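The options above correspond to Tesseract's own configuration variables. As a minimal sketch, assuming the node forwards these options to Tesseract more or less directly (this page does not confirm how), the mapping could look like the Python below. The Tesseract variable names (`tessedit_pageseg_mode`, `user_defined_dpi`, `tessedit_char_whitelist`, `tessedit_char_blacklist`) and the numeric PSM values are Tesseract's; the helper functions and the option keys such as `"single_line"` are hypothetical.

```python
# Hypothetical sketch: translate node-style options into Tesseract config
# variables. The option keys and helper names are illustrative assumptions,
# not the node's actual internals.

# Tesseract's numeric page segmentation modes for the modes listed above.
PSM_VALUES = {
    "single_column": 4,   # single column of text of variable sizes
    "single_block": 6,    # single uniform block of text
    "single_line": 7,     # single text line
    "single_word": 8,     # single word
    "sparse_text": 11,    # as much text as possible, in no particular order
}


def build_tesseract_config(psm=None, dpi=None, whitelist=None, blacklist=None):
    """Return a dict of Tesseract variables for the options that are set."""
    config = {}
    if psm is not None:
        config["tessedit_pageseg_mode"] = str(PSM_VALUES[psm])
    if dpi is not None:
        config["user_defined_dpi"] = str(dpi)
    if whitelist:
        config["tessedit_char_whitelist"] = whitelist  # only these characters
    if blacklist:
        config["tessedit_char_blacklist"] = blacklist  # never these characters
    return config


def build_region(top, left, width, height):
    """Region to recognize when 'Detect on Entire Image?' is false."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    return {"top": top, "left": left, "width": width, "height": height}
```

For a digit-only field such as an invoice total, `build_tesseract_config(psm="single_line", whitelist="0123456789")` returns `{"tessedit_pageseg_mode": "7", "tessedit_char_whitelist": "0123456789"}`. Unset options are simply left out so Tesseract falls back to its defaults.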
Output
The node outputs one item for each input item. Each output item contains:
- `json`: the recognized text or the extracted bounding boxes, depending on the operation.
  - For the Extract Text operation, this is the plain recognized text.
  - For the Extract Boxes operation, this is bounding box data structured according to the selected granularity (paragraphs, lines, words, or characters).
- If a timeout occurs during processing, the output JSON contains a `timeout` flag set to `true`.
- If an error occurs and "Continue On Fail" is enabled, the output item also includes an `error` field describing the failure.
The node does not output binary data.
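When items flow on to later steps, it can help to route successful results separately from timeouts and failures. A minimal Python sketch, assuming each item is a dict with a `json` key as described above; only the `timeout` and `error` keys come from this page, and the `text` key in the example is a hypothetical placeholder for the recognition result.

```python
# Sketch: split this node's output items by outcome before further processing.
# Only the "timeout" and "error" keys are taken from the description above;
# anything else in "json" is treated as the recognition result.

def partition_ocr_items(items):
    """Return (ok, timed_out, failed) lists of output items."""
    ok, timed_out, failed = [], [], []
    for item in items:
        data = item.get("json", {})
        if data.get("timeout") is True:
            timed_out.append(item)   # checked first: a timeout wins over an error
        elif "error" in data:
            failed.append(item)      # only present when Continue On Fail is on
        else:
            ok.append(item)
    return ok, timed_out, failed


# Example with hypothetical item contents:
items = [
    {"json": {"text": "Invoice 42"}},
    {"json": {"timeout": True}},
    {"json": {"error": "could not read image"}},
]
ok, timed_out, failed = partition_ocr_items(items)
```

Timed-out items could then be retried with a larger Timeout or a smaller image, while failed items are logged for inspection.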
Dependencies
- Requires the `tesseract.js` library, which is bundled with the node, for OCR processing.
- No external API keys or services are needed; OCR is performed locally using the Tesseract engine.
- Language data is selected via language codes, so the data for each requested language must already be available or be downloaded by Tesseract at runtime.
Troubleshooting
- Timeout Errors: If OCR takes too long, the node throws a timeout error. Increase the Timeout property or reduce the input image size or resolution.
- Incorrect Text Recognition: Check that the correct language code is set. Use character whitelists or blacklists to improve accuracy.
- Bounding Box Misalignment: Ensure coordinates and dimensions for partial image detection are correct and within image bounds.
- Empty or Missing Input Data: Verify that Input Image Field Name matches the name of the field that actually contains the image.
- Page Segmentation Mode Issues: Try different PSM settings if text is not detected properly (e.g., switch from Single Block to Sparse Text for scattered text).
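For the bounding box misalignment case, one option is to clip the requested region to the image before running OCR. The sketch below is hypothetical: it mirrors the Top Y, Left X, Width, and Height properties and assumes pixel coordinates with the origin at the top-left corner of the image.

```python
# Hypothetical sketch: clip an OCR region to the image bounds.
# Assumes pixel coordinates with (0, 0) at the top-left corner.

def clamp_region(top, left, width, height, image_width, image_height):
    """Return the region clipped to the image, or None if it lies fully outside."""
    x0 = max(0, left)
    y0 = max(0, top)
    x1 = min(image_width, left + width)
    y1 = min(image_height, top + height)
    if x1 <= x0 or y1 <= y0:
        return None  # the region does not overlap the image at all
    return {"top": y0, "left": x0, "width": x1 - x0, "height": y1 - y0}
```

For a 100 × 100 image, `clamp_region(80, 90, 50, 50, 100, 100)` shrinks the box to `{"top": 80, "left": 90, "width": 10, "height": 20}`, and a box entirely outside the image returns `None`, which is a clear signal that the coordinates are wrong rather than that the image has no text.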