PDF4me

Comprehensive PDF and document processing: generate barcodes, convert files, extract data, manipulate images, and automate workflows with the PDF4me API.

Overview

This node operation converts PDF files into Excel spreadsheets. The PDF can be supplied as binary data from a previous node, as base64-encoded content, or as a URL pointing to the file. Quality and OCR (Optical Character Recognition) settings can be adjusted, which makes the operation suitable for both digitally created PDFs and scanned documents. Users can choose whether to merge all resulting Excel sheets into one or keep them separate, and whether to preserve the original formatting when possible.

Practical scenarios include:

- Extracting tabular data from invoices, reports, or forms stored as PDFs.
- Automating data entry by converting scanned paper documents into editable Excel files.
- Integrating PDF-to-Excel conversion in workflows that require further data processing or analysis.

Properties

Name	Meaning
Input Data Type	Choose how to provide the PDF file: "Binary Data" (from previous node), "Base64 String" (direct base64 content), or "URL" (link to PDF file).
Input Binary Field	Name of the binary property containing the PDF file when using "Binary Data" input type (usually "data").
Base64 PDF Content	Base64 encoded string of the PDF document content, used if "Base64 String" is selected as input type.
PDF URL	URL to the PDF file to convert, used if "URL" is selected as input type.
Quality Type	Conversion quality: "Draft" (faster, good for simple tables) or "Quality" (slower, better for complex layouts).
Language	OCR language setting for text recognition in scanned PDFs or images within PDFs (e.g., "English").
Merge All Sheets	Boolean to combine all Excel sheets into one sheet (true) or keep them as separate sheets (false).
Output Format	Boolean to preserve original formatting when possible (true) or not (false).
OCR When Needed	Boolean to enable OCR for scanned PDFs automatically (true) or disable it (false).
Output File Name	Custom name for the output Excel file (default: "PDF_to_EXCEL_output.xlsx").
Document Name	Name of the source PDF file for reference purposes (default: "output.pdf").
Binary Data Output Name	Custom name for the binary data field in the node's output (default: "data").
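
The dependency between "Input Data Type" and the three input properties can be sketched as a small check. This is only an illustration, not the node's actual code: the `REQUIRED_FIELD` mapping and the `missing_input` helper are hypothetical names, and the parameters are assumed to be available as a plain dictionary keyed by the display names from the table above.

```python
from typing import Optional

# Hypothetical mapping: which property must be filled in for each input type.
REQUIRED_FIELD = {
    "Binary Data": "Input Binary Field",
    "Base64 String": "Base64 PDF Content",
    "URL": "PDF URL",
}


def missing_input(params: dict) -> Optional[str]:
    """Return the property that still needs a value, or None if the input is configured."""
    input_type = params.get("Input Data Type")
    if input_type not in REQUIRED_FIELD:
        raise ValueError(f"Unknown input data type: {input_type!r}")
    field = REQUIRED_FIELD[input_type]
    value = params.get(field)
    # Treat absent, non-string, or whitespace-only values as missing.
    if isinstance(value, str) and value.strip():
        return None
    return field
```

For example, `missing_input({"Input Data Type": "URL"})` reports that "PDF URL" still has to be set, while the other two input properties are ignored for that input type.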

Output

The node outputs an Excel file converted from the provided PDF. The output is available as binary data under a customizable binary property name (default "data"). The JSON output typically contains metadata about the conversion and references to the binary Excel file. The Excel file may contain one or multiple sheets depending on the "Merge All Sheets" option.

If the PDF is scanned or contains images and "OCR When Needed" is enabled, OCR is applied to extract the text into the Excel output.
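
A downstream step may want to confirm that the binary property really holds an Excel file before processing it. The sketch below assumes the item follows the usual n8n layout, with the file content stored as a base64 string under `binary[<name>]["data"]`. That layout is an assumption here and can differ, for instance when binary data is stored outside the item. The `excel_bytes` helper is a hypothetical name. The check relies on the fact that .xlsx files are ZIP archives and therefore start with the bytes `PK\x03\x04`.

```python
import base64

XLSX_MAGIC = b"PK\x03\x04"  # .xlsx files are ZIP archives


def excel_bytes(item: dict, binary_name: str = "data") -> bytes:
    """Decode the Excel file from an item, assuming base64 content under binary[name]['data']."""
    binary = item.get("binary", {})
    if binary_name not in binary:
        raise KeyError(f"No binary property named {binary_name!r} on this item")
    raw = base64.b64decode(binary[binary_name]["data"])
    if not raw.startswith(XLSX_MAGIC):
        raise ValueError("Binary content does not look like an .xlsx (ZIP) file")
    return raw
```

If "Binary Data Output Name" was changed from its default, the same name has to be passed as `binary_name`.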

Dependencies

- Requires access to the PDF file via binary data, base64 content, or a URL.
- Uses OCR capabilities, which may depend on underlying OCR libraries or services configured in the environment.
- The node's code does not show explicit API keys or credentials, but the node likely requires the PDF processing service it uses internally to be configured properly.

Troubleshooting

Common issues:
- Providing an invalid or inaccessible URL will cause the conversion to fail.
- Incorrect binary property name when using binary input will result in missing input data errors.
- OCR may fail or produce inaccurate results if the language setting does not match the document's language.
- Large or complex PDFs may take longer to process or exceed resource limits.

Error messages:
- Errors related to missing input data usually indicate misconfiguration of the input properties.
- OCR-related errors might suggest unsupported languages or corrupted PDF content.
- Network errors when using URL input indicate connectivity or permission issues.

Resolutions:
- Verify the input data type and corresponding fields carefully.
- Ensure URLs are accessible and point directly to valid PDF files.
- Adjust OCR language settings to match the document.
- Use "Draft" quality for faster processing on simpler documents.
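
Some of these input problems can be caught before the node runs. The following is a minimal sketch with hypothetical helper names (`is_pdf_base64`, `is_http_url`). It checks that a base64 string decodes and begins with the `%PDF-` file header, and that a URL is syntactically an HTTP(S) address. It does not test whether the URL is reachable or whether the PDF is otherwise valid.

```python
import base64
import binascii
from urllib.parse import urlparse


def is_pdf_base64(content: str) -> bool:
    """True if the string is valid base64 and the decoded bytes start with the PDF header."""
    try:
        raw = base64.b64decode(content, validate=True)
    except (binascii.Error, ValueError):
        return False
    # Strict check: the header must be at the very start of the file.
    return raw.startswith(b"%PDF-")


def is_http_url(url: str) -> bool:
    """True if the URL has an http or https scheme and a host part."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

A workflow could run these checks in a preceding step and stop early with a clear message instead of waiting for the conversion to fail.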