Overview
The Extract Resources operation extracts text content and embedded images from a PDF document. The PDF can be supplied in one of three ways: as binary data from a previous node, as a base64-encoded string, or as a URL pointing to the file.
This operation is useful for:
- Extracting textual information for indexing, searching, or further processing.
- Retrieving images embedded in PDFs for use in other workflows or analysis.
- Processing specific pages or ranges of pages within a PDF.
- Accepting PDFs from different sources (binary data, base64 strings, or URLs) with the same operation.
Practical examples:
- Extract all text from an invoice PDF received as binary data to automate data entry.
- Extract images from a PDF brochure provided via URL to reuse them in marketing materials.
- Extract text and images only from pages 2 to 5 of a large PDF report supplied as a base64 string.
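For the base64 input type, the value passed to the node must be the base64 text of the whole PDF file. The following is a minimal sketch of preparing and sanity-checking such a string outside n8n. The helper names `encode_pdf` and `looks_like_base64_pdf` are hypothetical and not part of the node; the check relies only on the fact that PDF files begin with the `%PDF-` header.

```python
import base64
import binascii


def encode_pdf(pdf_bytes: bytes) -> str:
    """Return the base64 text form of raw PDF bytes."""
    return base64.b64encode(pdf_bytes).decode("ascii")


def looks_like_base64_pdf(text: str) -> bool:
    """Return True if text is strict base64 that decodes to data starting with %PDF-."""
    try:
        # validate=True rejects characters outside the base64 alphabet
        decoded = base64.b64decode(text, validate=True)
    except (binascii.Error, ValueError):
        return False
    return decoded.startswith(b"%PDF-")


# Placeholder bytes for illustration only; this is not a complete PDF file.
sample = b"%PDF-1.7\n% placeholder content\n"
encoded = encode_pdf(sample)
```

A check like this only catches obviously wrong input, such as a truncated string or a file of another type. It does not confirm that the PDF itself is well formed.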
Properties
| Name | Meaning |
|---|---|
| Input Data Type | Choose how to provide the PDF file to extract resources from. Options: Binary Data (from previous node), Base64 String (base64 encoded PDF content), URL (link to PDF file). |
| Input Binary Field | Name of the binary property containing the PDF file (usually "data" for file uploads). Used when Input Data Type is Binary Data. |
| Base64 PDF Content | Base64 encoded PDF document content. Used when Input Data Type is Base64 String. |
| PDF URL | URL to the PDF file to extract resources from. Used when Input Data Type is URL. |
| Document Name | Name of the document used internally during processing. Defaults to "document.pdf". |
| Extract Text | Boolean flag indicating whether to extract text content from the PDF. Default is true. |
| Extract Images | Boolean flag indicating whether to extract images from the PDF. Default is false. |
| Return Images as Binary | Boolean flag indicating whether extracted images should be returned as binary data in addition to JSON metadata. Default is false. |
| Binary Data Name | Name for the binary data property in the output when returning images as binary. Default is "image". Only shown if Return Images as Binary is true. |
| Advanced Options | Collection of additional options: |
| - Pages | Specify pages to extract resources from. Format examples: "all" (default), "1,2" (specific pages), "2-5" (page range), or combinations like "1-3,5,7". |
| - Custom Profiles | JSON string to adjust custom properties or profiles for API calls, allowing advanced configuration per https://dev.pdf4me.com/apiv2/documentation/. |
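
The Pages formats listed in the table can be checked before a workflow runs. The sketch below expands a page specification into explicit page numbers. It is a client-side illustration only: `parse_pages` is a hypothetical helper, and the rules it enforces (1-based pages, ascending ranges, an upper bound of `page_count`) are assumptions, not a description of how the API itself parses the value.

```python
def parse_pages(spec: str, page_count: int) -> list[int]:
    """Expand a spec such as "all", "1,2", "2-5" or "1-3,5,7" into sorted page numbers."""
    spec = spec.strip().lower()
    if spec == "all":
        return list(range(1, page_count + 1))

    pages = set()
    for part in spec.split(","):
        part = part.strip()
        # A part is either a single page ("5") or an inclusive range ("2-5").
        if "-" in part:
            start_text, end_text = (piece.strip() for piece in part.split("-", 1))
        else:
            start_text = end_text = part
        if not (start_text.isdecimal() and end_text.isdecimal()):
            raise ValueError(f"Malformed page entry: {part!r}")
        start, end = int(start_text), int(end_text)
        if not 1 <= start <= end <= page_count:
            raise ValueError(f"Page entry out of range: {part!r}")
        pages.update(range(start, end + 1))
    return sorted(pages)


selected = parse_pages("1-3,5,7", page_count=10)
```

Rejecting a malformed value such as "5-2" or "a-b" locally gives a clearer error than waiting for the extraction request to fail.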
Output
The node outputs an array of items, each containing a json field with the extracted resources:
- If Extract Text is enabled, the JSON includes the extracted text content from the specified pages.
- If Extract Images is enabled, the JSON includes metadata about the extracted images.
- If Return Images as Binary is enabled, the node also outputs the actual image files as binary data under the specified binary property name (default "image").
The exact structure of the JSON depends on the extraction results but generally contains fields representing text blocks and image details.
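When Return Images as Binary is enabled, a downstream step may need the raw image bytes. The sketch below assumes the output item has been exported to a plain dictionary in n8n's usual layout, with a `json` key and a `binary` key whose properties carry base64 text in a `data` field. That layout, the `text` key, and the helper `read_binary_property` are assumptions for illustration; the exact fields this node emits are not specified here.

```python
import base64
from typing import Optional


def read_binary_property(item: dict, name: str = "image") -> Optional[bytes]:
    """Return the decoded bytes of one binary property, or None if it is absent."""
    entry = item.get("binary", {}).get(name)
    if entry is None:
        return None
    # Assumption: the property stores its content as base64 text under "data".
    return base64.b64decode(entry["data"])


# Hypothetical item for illustration; keys under "json" are placeholders.
example_item = {
    "json": {"text": "placeholder text"},
    "binary": {
        "image": {
            "data": base64.b64encode(b"\x89PNG placeholder").decode("ascii"),
            "mimeType": "image/png",
        }
    },
}

image_bytes = read_binary_property(example_item)
```

If Binary Data Name is changed from its default, pass the same name as the second argument.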
Dependencies
- Requires access to the PDF4me API, which performs the resource extraction.
- API credentials for that service must be configured in n8n.
- Network access is required to reach the API and, when the URL input type is used, for the PDF at that URL to be fetched.
Troubleshooting
Common Issues:
- Providing incorrect or missing PDF input data (e.g., wrong binary property name or invalid base64 string) will cause extraction to fail.
- Specifying invalid page ranges or malformed custom profile JSON may result in errors.
- Network issues or invalid URLs when using URL input type can prevent fetching the PDF.
Error Messages:
- Errors related to missing input data usually indicate misconfiguration of the input properties.
- Parsing errors for custom profiles suggest invalid JSON syntax.
- API authentication failures require checking the configured API key or token.
Resolutions:
- Verify the input data matches the selected input type and property names.
- Validate JSON syntax for custom profiles before saving.
- Ensure API credentials are correctly set up and have necessary permissions.
- Confirm URLs are accessible and point to valid PDF files.
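Validating the Custom Profiles value before saving can be done with any JSON parser. The sketch below reports the position of a syntax error. `parse_custom_profiles` is a hypothetical helper, the requirement that the top level be a JSON object is an assumption, and `exampleOption` is a placeholder key rather than a real profile setting.

```python
import json


def parse_custom_profiles(text: str) -> dict:
    """Parse a Custom Profiles string, raising ValueError with a readable message."""
    try:
        value = json.loads(text)
    except json.JSONDecodeError as err:
        raise ValueError(
            f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err
    # Assumption: profiles are expected to be a JSON object, not a list or scalar.
    if not isinstance(value, dict):
        raise ValueError("Expected a JSON object at the top level")
    return value


# "exampleOption" is a placeholder key used only for illustration.
profiles = parse_custom_profiles('{"exampleOption": true}')
```

A common mistake this catches is single-quoted keys or values, which are valid in JavaScript object literals but not in JSON.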
Links and References
- PDF4me API Documentation: https://dev.pdf4me.com/apiv2/documentation/
- General info on PDF resource extraction concepts: https://en.wikipedia.org/wiki/PDF
- n8n documentation on working with binary data: https://docs.n8n.io/nodes/working-with-binary-data/