PDF-LIB

Perform operations on PDF files (get info, split)

Actions

- Get PDF Info
- Split PDF

Overview

This node provides PDF file manipulation capabilities using the PDF-LIB library. It supports two main operations:

- Get PDF Info: Extracts basic information from a PDF file, such as the total number of pages.
- Split PDF: Splits a PDF into multiple smaller PDFs, each containing a specified number of pages.

Common scenarios for this node include:

- Retrieving the page count and file name of PDF documents in workflows.
- Breaking down large PDFs into smaller chunks for easier processing or distribution.
- Automating document handling tasks where PDFs need to be segmented.

For example, you might use the "Get PDF Info" operation to check how many pages a report has before deciding how to process it. Alternatively, the "Split PDF" operation can divide a large contract into fixed-size batches of pages for separate review.

Properties

Name	Meaning
Operation	The action to perform on the PDF file. Options: "Get PDF Info", "Split PDF".
Binary Property	Name of the binary property that contains the PDF file data. Default is `"data"`.
Chunk Size	(Only for "Split PDF") Number of pages per split chunk. Defaults to 1.

Output

For Get PDF Info:
- json output includes:
  - pageCount: Total number of pages in the PDF.
  - operation: The string "getInfo".
  - fileName: Original file name of the PDF, or "unknown.pdf" if not available.

For Split PDF:
- json output includes:
  - count: Number of split PDF chunks created.
  - pageRanges: Array of strings indicating page ranges for each chunk (e.g., "1-3").
  - operation: The string "split".
  - originalFileName: Original file name of the PDF or "unknown.pdf" if not available.
- binary output includes multiple binary properties named pdf1, pdf2, etc., each containing one split PDF chunk with:
  - data: Base64-encoded PDF content.
  - fileName: Generated file name like "split_1.pdf".
  - mimeType: Always "application/pdf".
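
The relationship between page count, chunk size, and the Split PDF output fields can be sketched as follows. This is an illustration only, not the node's source: `planSplit` and `SplitPlan` are hypothetical names, and the sketch assumes that pages are numbered from 1, that the last chunk holds any remaining pages, and that every range is written as "start-end" even when a chunk has a single page.

```typescript
// Hypothetical helper for illustration; not taken from the node's source.
interface SplitPlan {
  count: number;        // number of chunks, as in the "count" field
  pageRanges: string[]; // e.g. "1-3", as in the "pageRanges" field
  binaryKeys: string[]; // binary property names: pdf1, pdf2, ...
  fileNames: string[];  // generated file names: split_1.pdf, ...
}

function planSplit(pageCount: number, chunkSize: number): SplitPlan {
  // Guard against values that would make the loop below never advance.
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new RangeError("chunkSize must be a positive integer");
  }

  const pageRanges: string[] = [];
  const binaryKeys: string[] = [];
  const fileNames: string[] = [];

  // Walk the document in steps of chunkSize, using 1-based page numbers.
  for (let start = 1; start <= pageCount; start += chunkSize) {
    // The final chunk may be shorter than chunkSize.
    const end = Math.min(start + chunkSize - 1, pageCount);
    const index = pageRanges.length + 1;
    pageRanges.push(`${start}-${end}`);
    binaryKeys.push(`pdf${index}`);
    fileNames.push(`split_${index}.pdf`);
  }

  return { count: pageRanges.length, pageRanges, binaryKeys, fileNames };
}
```

Under these assumptions, `planSplit(7, 3)` describes three chunks with the ranges "1-3", "4-6", and "7-7", stored under pdf1, pdf2, and pdf3.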

Dependencies

- Uses the PDF-LIB library, bundled internally, for PDF parsing and manipulation.
- Reads the PDF from the item's binary data, or from a filesystem path when one is available.
- Requires the input item to contain valid binary PDF data under the specified binary property.

Troubleshooting

- No binary data property found: If the specified binary property does not exist on an input item, the node throws an error. Check that the binary property name is correct and that the input contains valid PDF binary data.
- Failed to load PDF: Errors may occur if the PDF is corrupted or unreadable. The node tries both filesystem loading and binary buffer loading; if both fail, the error message reports both attempts.
- Chunk size issues: When splitting, a chunk size below 1 or a non-integer value may cause unexpected behavior. Use a positive integer.
- Continue on Fail: If enabled, errors on individual items are captured in the output JSON under `error`, and processing continues for the remaining items.
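
The chunk size check and the Continue on Fail behavior described above can be sketched together. This is a minimal illustration under stated assumptions, not the node's implementation: `assertValidChunkSize`, `processItems`, and `ItemResult` are hypothetical names, each input is reduced to its chunk size, and the error text is invented for the example.

```typescript
// Hypothetical types and helpers for illustration; not the node's source.
interface ItemResult {
  json: Record<string, unknown>;
}

// Reject chunk sizes that are not positive integers.
function assertValidChunkSize(chunkSize: number): void {
  if (!Number.isInteger(chunkSize) || chunkSize < 1) {
    throw new Error(`Chunk size must be a positive integer, got ${chunkSize}`);
  }
}

// Process one chunk size per item. With continueOnFail set, a failing item
// produces { error: ... } in its JSON instead of aborting the whole run.
function processItems(chunkSizes: number[], continueOnFail: boolean): ItemResult[] {
  const results: ItemResult[] = [];
  for (const chunkSize of chunkSizes) {
    try {
      assertValidChunkSize(chunkSize);
      // Placeholder for the real split work.
      results.push({ json: { operation: "split" } });
    } catch (err) {
      if (!continueOnFail) throw err;
      const message = err instanceof Error ? err.message : String(err);
      results.push({ json: { error: message } });
    }
  }
  return results;
}
```

With `continueOnFail` set to `true`, an invalid chunk size yields an item whose JSON carries only the error message while the other items are still processed; with `false`, the first invalid value stops the run.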