Actions (80)
- Add Attachment To PDF
- Add Barcode To PDF
- Add Form Fields To PDF
- Add HTML Header Footer
- Add Image Stamp To PDF
- Add Image Watermark To Image
- Add Margin To PDF
- Add Page Number To PDF
- Add Text Stamp To PDF
- Add Text Watermark To Image
- AI-Invoice Parser
- AI-Process Contract
- AI-Process HealthCard
- Classify Document
- Compress Image
- Compress PDF
- Convert HTML To PDF
- Convert Image Format
- Convert JSON To Excel
- Convert Markdown To PDF
- Convert PDF To Editable PDF Using OCR
- Convert PDF To Excel
- Convert PDF To PowerPoint
- Convert PDF To Word
- Convert To PDF
- Convert URL to PDF
- Convert VISIO
- Convert Word to PDF Form
- Create Images From PDF
- Create PDF/A
- Create Swiss QR Bill
- Crop Image
- Delete Blank Pages From PDF
- Delete Unwanted Pages From PDF
- Split PDF By Barcode
- Disable Tracking Changes In Word
- Enable Tracking Changes In Word
- Extract Attachment From PDF
- Extract Form Data From PDF
- Extract Pages From PDF
- Extract Resources
- Extract Table From PDF
- Extract Text By Expression
- Extract Text From Word
- Fill PDF Form
- Find And Replace Text
- Flip Image
- Flatten PDF
- Generate Barcode
- Generate Document Single
- Generate Documents Multiple
- Get Document From Pdf4me
- Get Image Metadata
- Get PDF Metadata
- Split PDF By Swiss QR
- Get Tracking Changes In Word
- Image Extract Text
- Linearize PDF
- Merge Multiple PDFs
- Overlay PDFs
- Parse Document
- Protect PDF
- Read Barcode From Image
- Read Barcode From PDF
- Read SwissQR Code
- Remove EXIF Tags From Image
- Repair PDF Document
- Replace Text With Image
- Replace Text With Image In Word
- Resize Image
- Rotate Document
- Rotate Image
- Rotate Image By EXIF Data
- Rotate PDF Page
- Sign PDF
- Split PDF By Text
- Split PDF Regular
- Unlock PDF
- Update Hyperlinks Annotation
- Upload File To PDF4me
Overview
The node provides a "Parse Document" operation that extracts structured data from a document using a specified parsing configuration. The document can be supplied in one of three ways: as binary data from a previous node, as a base64-encoded string, or as a URL pointing to the document file. The parsed output can be returned either as JSON data or as a text file.
This operation is useful wherever information must be extracted automatically from PDFs or other document formats, such as invoice processing, contract analysis, or form data extraction. For example, a user could upload an invoice PDF and use this node to parse key fields like invoice number, date, and total amount into JSON for further workflow automation.
Properties
| Name | Meaning |
|---|---|
| Input Data Type | Choose how to provide the document to parse. Options: Binary Data (document file from previous node), Base64 String (base64 encoded document content), URL (link to document file). |
| Input Binary Field | Name of the binary property containing the document file (used only if Input Data Type is Binary Data). |
| Base64 Document Content | Base64 encoded content of the document (used only if Input Data Type is Base64 String). |
| Document URL | URL to the document file to parse (used only if Input Data Type is URL). |
| Document Name | Name of the source document file for reference (e.g., "original-document.pdf"). |
| Parse ID | GUID of the parse configuration to use; also serves as the Template ID for parsing rules. |
| Output Format | Format for the parsed document output. Options: JSON (parsed data as JSON), Text File (parsed data as a text file). |
| Output File Name | Name for the output file when Output Format is set to Text File (e.g., "my-parsed-document.txt"). |
| Advanced Options | Collection of additional options. Currently supports "Custom Profiles", a JSON value that sets custom properties on the API call (e.g., extra parsing options described in the external profiles documentation). |
| Binary Data Output Name | Custom name for the binary data field in the node's output (default is "data"). |
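
The table implies that exactly one of the three input fields is required, depending on Input Data Type. The following is a minimal sketch of that rule, not the node's implementation. The parameter keys (`inputDataType`, `inputBinaryField`, `base64DocumentContent`, `documentUrl`, `documentName`) are hypothetical names chosen to mirror the property labels; the node's internal names may differ.

```python
import base64

# Hypothetical parameter keys mirroring the property labels in the table;
# the node's real internal names may differ.
REQUIRED_FIELD = {
    "Binary Data": "inputBinaryField",
    "Base64 String": "base64DocumentContent",
    "URL": "documentUrl",
}


def check_input_params(params: dict) -> str:
    """Return the name of the field that must be filled for the chosen input type."""
    input_type = params.get("inputDataType")
    if input_type not in REQUIRED_FIELD:
        raise ValueError(f"Unknown Input Data Type: {input_type!r}")
    field = REQUIRED_FIELD[input_type]
    if not params.get(field):
        raise ValueError(f"{field} is required when Input Data Type is {input_type}")
    return field


def to_base64(document_bytes: bytes) -> str:
    """Encode raw document bytes for the Base64 Document Content field."""
    return base64.b64encode(document_bytes).decode("ascii")


params = {
    "inputDataType": "Base64 String",
    "base64DocumentContent": to_base64(b"%PDF-1.4 example bytes"),
    "documentName": "original-document.pdf",
}
print(check_input_params(params))  # base64DocumentContent
```

A check like this, run before the node, surfaces the "selected Binary Data but no binary input present" class of mistake early instead of at request time.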
Output
- json: Contains the parsed document data. If the output format is JSON, this will be structured data extracted from the document according to the parse configuration.
- binary: If the output format is a text file, the parsed content is provided as binary data with the specified output file name. This allows downstream nodes to handle the parsed text as a downloadable or storable file.
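
A downstream step might branch on which of the two outputs is present. The sketch below assumes an output item shaped like `{"json": ..., "binary": {...}}` whose binary entry carries a base64 payload under `data` plus a `fileName`; that shape is an assumption about how the item is represented, and the sample content is made up for illustration.

```python
import base64


def read_parse_output(item: dict, binary_name: str = "data") -> dict:
    """Return the parsed content from an item shaped like {"json": ..., "binary": ...}."""
    binary = item.get("binary") or {}
    if binary_name in binary:
        # Text File output: decode the (assumed) base64 payload back to text.
        entry = binary[binary_name]
        text = base64.b64decode(entry["data"]).decode("utf-8")
        return {"kind": "text", "fileName": entry.get("fileName"), "content": text}
    # JSON output: structured data sits directly under "json".
    return {"kind": "json", "content": item.get("json", {})}


# Hypothetical item, as it might look when Output Format is Text File.
item = {
    "json": {},
    "binary": {
        "data": {
            "data": base64.b64encode(b"Invoice 1001").decode("ascii"),
            "fileName": "my-parsed-document.txt",
            "mimeType": "text/plain",
        }
    },
}
result = read_parse_output(item)
print(result["kind"], result["fileName"])  # text my-parsed-document.txt
```

If Binary Data Output Name was changed from its default, pass the same name as `binary_name`.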
Dependencies
- Requires access to an external document parsing service via API, which uses the provided Parse ID (parse configuration GUID) to interpret the document.
- Needs proper authentication credentials configured in n8n to connect to the parsing API.
- Requires network access to fetch the document when the URL input type is used.
Troubleshooting
Common issues:
- Incorrect or missing Parse ID may cause parsing failures or unexpected results.
- Providing an invalid or inaccessible URL will result in errors fetching the document.
- Mismatch between Input Data Type and provided data (e.g., selecting Binary Data but no binary input present) will cause errors.
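- Choosing an Output Format that downstream nodes do not expect (e.g., Text File when a later node reads JSON fields), or changing the Output File Name or Binary Data Output Name without updating later references, can break downstream processing.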
Error messages:
- Errors related to document retrieval (e.g., network errors, 404 not found) indicate problems accessing the document URL.
- Parsing errors often relate to invalid parse configurations or unsupported document formats.
- Authentication errors suggest missing or invalid API credentials.
Resolutions:
- Verify the Parse ID is correct and corresponds to a valid parsing template.
- Ensure the document URL is accessible and correct.
- Confirm the input data matches the selected Input Data Type.
- Check API credentials and permissions in n8n settings.
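
Some of these checks can be automated before the node runs. The sketch below uses only the Python standard library to test whether a Parse ID is a well-formed GUID and whether a document URL is an absolute http(s) URL. The function names are illustrative. It only catches malformed values: it cannot tell whether the GUID matches an existing parsing template or whether the URL is reachable.

```python
import uuid
from urllib.parse import urlparse


def is_guid(value) -> bool:
    """True if value parses as a UUID (uuid.UUID also accepts braces or no hyphens)."""
    try:
        uuid.UUID(value)
        return True
    except (ValueError, AttributeError, TypeError):
        return False


def is_http_url(value: str) -> bool:
    """True if value has an http or https scheme and a host part."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def preflight(parse_id, input_type: str, document_url: str = "") -> list:
    """Collect human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    if not is_guid(parse_id):
        problems.append("Parse ID is not a well-formed GUID")
    if input_type == "URL" and not is_http_url(document_url or ""):
        problems.append("Document URL is not an absolute http(s) URL")
    return problems


# A bad Parse ID and a URL without a scheme are both reported.
print(preflight("not-a-guid", "URL", "example.com/invoice.pdf"))
```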
Links and References
- PDF4me Developer Profiles Documentation: for configuring custom profiles in advanced options.
- General API documentation for the external parsing service (not included here, but typically available from the service provider).