Overview
This node extracts metadata from PDF files. It supports multiple input methods for providing the PDF file: as binary data from a previous node, as a base64-encoded string, or via a URL pointing to the PDF file. The node processes the PDF and outputs its metadata in JSON format.
Common scenarios where this node is useful include:
- Automatically extracting document properties (author, title, creation date, etc.) from PDFs in an automation workflow.
- Validating PDF metadata before further processing or archiving.
- Collecting metadata for indexing or cataloging large collections of PDF documents.
Practical example:
- A user receives invoices as PDFs and wants to record each file's metadata, such as creation date and author, in a database without manual entry.
Properties
| Name | Meaning |
|---|---|
| Input Data Type | How the PDF file is provided: Binary Data (from previous node), Base64 String, or URL |
| Input Binary Field | Name of the binary property containing the PDF file (used if Input Data Type is Binary Data) |
| Base64 PDF Content | Base64 encoded string of the PDF content (used if Input Data Type is Base64 String) |
| PDF URL | URL to the PDF file to extract metadata from (used if Input Data Type is URL) |
| Output File Name | Filename for the output JSON metadata file (default: pdf_metadata.json) |
| Binary Data Output Name | Name of the binary property that will contain the JSON metadata file (default: data) |
| Async | Enable asynchronous processing (true/false) |
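As a rough illustration of how the three input modes relate to the other properties, the sketch below mirrors the table in a small Python data class and reports which field is still missing for the chosen mode. The class, the field names, and the mode identifiers (`binaryData`, `base64`, `url`) are hypothetical and chosen only for this example; they are not taken from the node's source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PdfMetadataParams:
    """Hypothetical mirror of the node's properties (names are illustrative)."""
    input_data_type: str                       # assumed identifiers: "binaryData", "base64", "url"
    input_binary_field: Optional[str] = None   # used with "binaryData"
    base64_pdf_content: Optional[str] = None   # used with "base64"
    pdf_url: Optional[str] = None              # used with "url"
    output_file_name: str = "pdf_metadata.json"
    binary_data_output_name: str = "data"
    use_async: bool = False                    # default not stated in the table; False is assumed


# Which field each input mode depends on, following the table above.
REQUIRED_FIELD = {
    "binaryData": "input_binary_field",
    "base64": "base64_pdf_content",
    "url": "pdf_url",
}


def missing_field(params: PdfMetadataParams) -> Optional[str]:
    """Return the name of the field the selected mode needs but lacks, or None."""
    if params.input_data_type not in REQUIRED_FIELD:
        raise ValueError(f"Unknown input data type: {params.input_data_type!r}")
    name = REQUIRED_FIELD[params.input_data_type]
    return None if getattr(params, name) else name
```

Only the field that matches the selected Input Data Type is checked; the other two are ignored, which matches the "used if" wording in the table.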
Output
The node outputs a JSON object containing the extracted metadata from the PDF file. This metadata typically includes standard PDF document properties such as:
- Title
- Author
- Subject
- Keywords
- Creator
- Producer
- Creation Date
- Modification Date
- Number of pages
- PDF version
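Inside a PDF file, creation and modification dates are stored as strings of the form `D:YYYYMMDDHHmmSS` followed by an optional time zone offset. Whether this node returns them in that raw form or already converted is not stated here. Assuming the raw form, a minimal parser could look like the sketch below; the function name `parse_pdf_date` is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# D:YYYY, then optional MM DD HH mm SS, then optional "Z" or +HH'mm' / -HH'mm'.
_PDF_DATE = re.compile(
    r"^D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'?(\d{2})?'?)?$"
)


def parse_pdf_date(value: str) -> Optional[datetime]:
    """Parse a raw PDF date string; return None if it does not match."""
    match = _PDF_DATE.match(value.strip())
    if match is None:
        return None
    year = int(match.group(1))
    # Missing month and day default to 1; missing time parts default to 0.
    month, day = (int(match.group(i) or 1) for i in (2, 3))
    hour, minute, second = (int(match.group(i) or 0) for i in (4, 5, 6))
    try:
        if match.group(7):                     # "Z" means UTC
            tz = timezone.utc
        elif match.group(8):                   # explicit +HH'mm' or -HH'mm' offset
            offset = timedelta(hours=int(match.group(9)),
                               minutes=int(match.group(10) or 0))
            tz = timezone(offset if match.group(8) == "+" else -offset)
        else:                                  # no offset given: leave the datetime naive
            tz = None
        return datetime(year, month, day, hour, minute, second, tzinfo=tz)
    except ValueError:                         # out-of-range month, hour, offset, etc.
        return None
```

Strings that are already in another format, such as ISO 8601, make the function return `None`, so it can be tried first and skipped when it does not apply.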
The JSON metadata is also available as a binary file with the specified output filename and binary property name, allowing downstream nodes to consume it as a file if needed.
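To show how a downstream step might read that file, the sketch below decodes the binary property back into a dictionary. It assumes the common n8n item layout in which each binary property carries its content as a base64 string under `data`; the helper name and the sample metadata keys are illustrative, and the actual key names returned by the service may differ.

```python
import base64
import json


def read_metadata_file(item: dict, binary_property: str = "data") -> dict:
    """Decode the JSON metadata stored in an item's binary property."""
    entry = item["binary"][binary_property]
    raw = base64.b64decode(entry["data"])      # assumed: content is base64-encoded
    return json.loads(raw.decode("utf-8"))


# Illustrative item; the metadata keys here are made up for the example.
sample_metadata = {"Title": "Invoice 001", "Author": "Accounts", "PageCount": 2}
sample_item = {
    "json": {},
    "binary": {
        "data": {
            "data": base64.b64encode(json.dumps(sample_metadata).encode("utf-8")).decode("ascii"),
            "fileName": "pdf_metadata.json",
            "mimeType": "application/json",
        }
    },
}

metadata = read_metadata_file(sample_item)
```

If Binary Data Output Name was changed from its default, pass that name as `binary_property`.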
Dependencies
- Requires access to the PDF file as binary data, a base64 string, or a reachable URL.
- Depends on an external PDF processing service or library integrated within the node's implementation to extract metadata.
- May require an API key or authentication token, configured in n8n credentials, to access the PDF processing service. This is typical for such nodes, although it is not explicitly shown in the node's code.
Troubleshooting
Common issues:
- Providing an incorrect binary property name when using binary data input will cause the node to fail to find the PDF file.
- Invalid base64 strings or inaccessible URLs will result in errors during PDF retrieval.
- Network issues when fetching the PDF from a URL can cause timeouts or failures.
- If asynchronous processing is enabled but the service does not fully support it, results may be delayed or incomplete.
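Some of the input problems above can be caught before the node runs. The sketch below checks that a string is valid base64 and that the decoded bytes look like a PDF. The function name is hypothetical and the returned messages are illustrative rather than the node's own.

```python
import base64
import binascii
from typing import Optional


def check_base64_pdf(content: str) -> Optional[str]:
    """Return a problem description, or None if the content looks like a base64 PDF."""
    # Remove line breaks and spaces, which strict validation would otherwise reject.
    compact = "".join(content.split())
    try:
        raw = base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError):
        return "Invalid base64 content"
    # A PDF file begins with "%PDF-"; some viewers tolerate a few leading bytes,
    # so the first 1024 bytes are searched instead of only the very start.
    if b"%PDF-" not in raw[:1024]:
        return "Decoded data does not contain a %PDF- header"
    return None
```

This only confirms the encoding and the file signature. A file can pass both checks and still be damaged, in which case the service itself reports the error.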
Error messages:
- "Binary property not found": Check that the binary field name matches the actual property containing the PDF.
- "Invalid base64 content": Verify the base64 string is correctly encoded.
- "Failed to fetch PDF from URL": Ensure the URL is correct and publicly accessible or accessible with provided credentials.
- General API errors: Confirm API keys or tokens are valid and have sufficient permissions.