PDF4me

Comprehensive PDF and document processing: generate barcodes, convert files, extract data, manipulate images, and automate workflows with the PDF4ME API

Actions: 80

Overview

This node operation converts PDF files into Excel spreadsheets. It supports multiple input methods for the PDF file, including binary data from a previous node, base64-encoded content, or a URL pointing to the PDF. The conversion process can be adjusted for quality and OCR (Optical Character Recognition) settings, making it suitable for both simple digital PDFs and scanned documents with images. Users can choose whether to merge all resulting Excel sheets into one or keep them separate, and preserve original formatting when possible.

Practical scenarios include:

  • Extracting tabular data from invoices, reports, or forms stored as PDFs.
  • Automating data entry by converting scanned paper documents into editable Excel files.
  • Integrating PDF-to-Excel conversion in workflows that require further data processing or analysis.
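For the base64 input method mentioned in the Overview, the PDF bytes have to be encoded into a string before they reach the node. A minimal sketch using only the Python standard library; the helper names `pdf_to_base64` and `base64_to_pdf` are hypothetical and not part of the node, and the sample bytes are a stand-in rather than a complete PDF:

```python
import base64


def pdf_to_base64(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as an ASCII base64 string (hypothetical helper)."""
    return base64.b64encode(pdf_bytes).decode("ascii")


def base64_to_pdf(encoded: str) -> bytes:
    """Decode a base64 string back to bytes.

    validate=True makes the decoder raise binascii.Error on characters
    outside the base64 alphabet instead of silently skipping them.
    """
    return base64.b64decode(encoded, validate=True)


# Stand-in bytes that only carry the PDF header marker.
sample = b"%PDF-1.7\n% stand-in bytes, not a complete PDF\n"
encoded = pdf_to_base64(sample)
```

Because every PDF starts with `%PDF-`, a correctly encoded PDF string begins with `JVBERi0`, which is a quick way to spot a value that was encoded twice or truncated at the front.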

Properties

Name                      Meaning
Input Data Type           Choose how to provide the PDF file: "Binary Data" (from previous node), "Base64 String" (direct base64 content), or "URL" (link to PDF file).
Input Binary Field        Name of the binary property containing the PDF file when using "Binary Data" input type (usually "data").
Base64 PDF Content        Base64 encoded string of the PDF document content, used if "Base64 String" is selected as input type.
PDF URL                   URL to the PDF file to convert, used if "URL" is selected as input type.
Quality Type              Conversion quality: "Draft" (faster, good for simple tables) or "Quality" (slower, better for complex layouts).
Language                  OCR language setting for text recognition in scanned PDFs or images within PDFs (e.g., "English").
Merge All Sheets          Boolean to combine all Excel sheets into one sheet (true) or keep them as separate sheets (false).
Output Format             Boolean to preserve original formatting when possible (true) or not (false).
OCR When Needed           Boolean to enable OCR for scanned PDFs automatically (true) or disable it (false).
Output File Name          Custom name for the output Excel file (default: "PDF_to_EXCEL_output.xlsx").
Document Name             Name of the source PDF file for reference purposes (default: "output.pdf").
Binary Data Output Name   Custom name for the binary data field in the node's output (default: "data").
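The properties depend on each other: the selected input data type determines which of the three source fields must be filled in. The sketch below models that rule in Python as an illustration only. The class, the field names, the internal keys (`binaryData`, `base64`, `url`), and the boolean defaults are assumptions; only the string defaults quoted in the property list come from this page.

```python
from dataclasses import dataclass

# Hypothetical keys for the three "Input Data Type" choices, each mapped
# to the field that must be non-empty when that choice is selected.
REQUIRED_FIELD = {
    "binaryData": "input_binary_field",
    "base64": "base64_content",
    "url": "pdf_url",
}


@dataclass
class PdfToExcelParams:
    input_data_type: str = "binaryData"
    input_binary_field: str = "data"        # "usually data" per the property list
    base64_content: str = ""
    pdf_url: str = ""
    quality_type: str = "Draft"             # "Draft" or "Quality"
    language: str = "English"
    merge_all_sheets: bool = True           # assumed default
    output_format: bool = True              # assumed default
    ocr_when_needed: bool = True            # assumed default
    output_file_name: str = "PDF_to_EXCEL_output.xlsx"
    document_name: str = "output.pdf"
    binary_data_output_name: str = "data"


def validate(params: PdfToExcelParams) -> list:
    """Return a list of problems; an empty list means the settings are consistent."""
    problems = []
    field = REQUIRED_FIELD.get(params.input_data_type)
    if field is None:
        problems.append(f"unknown input data type: {params.input_data_type!r}")
    elif not getattr(params, field).strip():
        problems.append(
            f"{field} must be set when input data type is {params.input_data_type!r}"
        )
    if params.quality_type not in ("Draft", "Quality"):
        problems.append(f"unknown quality type: {params.quality_type!r}")
    return problems
```

For example, `validate(PdfToExcelParams(input_data_type="url"))` reports that `pdf_url` is missing, which mirrors the "missing input data" class of errors described under Troubleshooting.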

Output

The node outputs an Excel file converted from the provided PDF. The output is available as binary data under a customizable binary property name (default "data"). The JSON output typically contains metadata about the conversion and references to the binary Excel file. The Excel file may contain one or multiple sheets depending on the "Merge All Sheets" option.

If the PDF was scanned or contained images, OCR is applied (if enabled) to extract text accurately into the Excel format.
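One way to check the effect of the "Merge All Sheets" option is to list the worksheets in the returned file. The sketch below relies on the fact that an .xlsx file is a ZIP package whose `xl/workbook.xml` part lists the sheets. The helper name `sheet_names` is hypothetical, and the file bytes are assumed to have been read from the node's binary output by other means.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by SpreadsheetML workbook parts inside an .xlsx package.
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def sheet_names(xlsx_bytes: bytes) -> list:
    """Return the worksheet names of an .xlsx file held in memory.

    Raises zipfile.BadZipFile if the bytes are not a ZIP archive and
    KeyError if the archive has no xl/workbook.xml part.
    """
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as archive:
        workbook_xml = archive.read("xl/workbook.xml")
    root = ET.fromstring(workbook_xml)
    # Each <sheet> element carries the visible sheet name in its "name" attribute.
    return [sheet.get("name") for sheet in root.iter(f"{{{MAIN_NS}}}sheet")]
```

With "Merge All Sheets" enabled, a single entry would be the expected result; that expectation is inferred from the option's description and is not confirmed by this page.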

Dependencies

  • Requires access to the PDF file via binary data, base64 content, or a URL.
  • Uses OCR capabilities, which may depend on OCR libraries or services configured in the environment.
  • The node description refers to the PDF4ME API, so the node likely needs that service configured with valid credentials, although no API keys or credentials are shown explicitly.

Troubleshooting

  • Common issues:

    • Providing an invalid or inaccessible URL will cause the conversion to fail.
    • Incorrect binary property name when using binary input will result in missing input data errors.
    • OCR may fail or produce inaccurate results if the language setting does not match the document's language.
    • Large or complex PDFs may take longer to process or exceed resource limits.
  • Error messages:

    • Errors related to missing input data usually indicate misconfiguration of the input properties.
    • OCR-related errors might suggest unsupported languages or corrupted PDF content.
    • Network errors when using URL input indicate connectivity or permission issues.
  • Resolutions:

    • Verify the input data type and corresponding fields carefully.
    • Ensure URLs are accessible and point directly to valid PDF files.
    • Adjust OCR language settings to match the document.
    • Use "Draft" quality for faster processing on simpler documents.
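Some of the issues above can be caught before the node runs. The following is a rough pre-flight sketch in Python, not something the node provides: both helper names are hypothetical, and restricting URLs to `http` and `https` is an assumption about what the node accepts.

```python
from urllib.parse import urlparse


def looks_like_pdf(data: bytes) -> bool:
    """True if the bytes begin with the PDF header marker '%PDF-'."""
    return data.startswith(b"%PDF-")


def looks_like_http_url(url: str) -> bool:
    """True if the string parses as an http(s) URL with a host part.

    This only checks the shape of the URL; it does not test whether
    the server is reachable or whether the target is a PDF.
    """
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```

A URL that passes this check can still fail at conversion time, for example when it points to an HTML download page instead of the file itself, so fetching the first few bytes and running `looks_like_pdf` on them is a stronger check.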
