Puppeteer

Automate browser interactions using Puppeteer

Overview

The Get Screenshot operation of the Puppeteer node allows you to capture a screenshot of a web page by providing its URL. This is useful for automating website monitoring, archiving visual states of pages, generating previews, or integrating web content snapshots into workflows. For example, you might use this node to periodically capture screenshots of your homepage for change tracking, generate thumbnails for user-submitted URLs, or document the appearance of a web application at specific times.

Properties

Below are the input properties relevant to the Get Screenshot operation:

| Display Name | Type | Description |
|---|---|---|
| URL | string | The web address of the page to capture. |
| Property Name | string | Name of the binary property in which to store the image data. |
| Type | options | The image format: PNG, JPEG, or WebP. |
| Quality | number | Image quality (0-100). Only applies to JPEG and WebP formats. |
| Full Page | boolean | If true, captures the entire scrollable page; otherwise, only the visible viewport. |
| Query Parameters | fixedCollection | Additional query parameters to append to the URL. |
| Options | collection | Advanced settings, listed in the table that follows. |

Settings available inside the Options collection:

| Display Name | Type | Description |
|---|---|---|
| Batch Size | number | Maximum number of pages to open simultaneously. |
| Browser WebSocket Endpoint | string | Connects to an existing browser instance via WebSocket. |
| Emulate Device | options | Emulates a specific device (e.g., mobile, tablet). |
| Executable path | string | Path to the browser executable. |
| Extra Headers | fixedCollection | Custom HTTP headers to send with the request. |
| File Name | string | Sets the file name in the binary output. |
| Launch Arguments | fixedCollection | Additional command line arguments for the browser. |
| Timeout | number | Maximum navigation time in milliseconds. |
| Protocol Timeout | number | Maximum protocol response wait time in milliseconds. |
| Wait Until | options | When to consider navigation complete (e.g., load, domcontentloaded, networkidle0, networkidle2). |
| Page Caching | boolean | Enables or disables page-level caching. |
| Headless mode | boolean | Runs browser in headless mode if true. |
| Use Chrome Headless Shell | boolean | Uses chrome-headless-shell if enabled. |
| Stealth mode | boolean | Makes detection of headless Puppeteer harder. |
| Human typing mode | boolean | Simulates human-like typing on the page. |
| Human Typing Options | collection | Fine-tunes human typing simulation. |
| Proxy Server | string | Uses a custom proxy configuration. |
| Add Container Arguments | boolean | Adds recommended arguments for container environments. |
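
To make the relationship between these properties more concrete, the sketch below shows one way they could be translated into the option objects that Puppeteer's page.goto() and page.screenshot() accept. It is an illustration only, not the node's actual implementation: the ScreenshotParams shape and all three helper names are hypothetical, and only the Puppeteer option keys (type, quality, fullPage, timeout, waitUntil) are real.

```typescript
// Hypothetical parameter shape mirroring the properties listed above.
// The node's real internal parameter names are not documented here.
type ImageType = "png" | "jpeg" | "webp";
type WaitUntil = "load" | "domcontentloaded" | "networkidle0" | "networkidle2";

interface ScreenshotParams {
  url: string;
  imageType: ImageType;
  quality?: number; // 0-100, only meaningful for JPEG and WebP
  fullPage: boolean;
  queryParameters?: Array<{ name: string; value: string }>;
  timeout?: number; // milliseconds
  waitUntil?: WaitUntil;
}

interface ShotOptions {
  type: ImageType;
  fullPage: boolean;
  quality?: number;
}

// Append the Query Parameters to the URL with the standard URL API.
// new URL() throws a TypeError when the input is malformed.
function buildTargetUrl(params: ScreenshotParams): string {
  const target = new URL(params.url);
  for (const { name, value } of params.queryParameters ?? []) {
    target.searchParams.append(name, value);
  }
  return target.toString();
}

// Build an object in the shape accepted by Puppeteer's page.screenshot().
// Quality is clamped to 0-100 and left out for PNG.
function buildScreenshotOptions(params: ScreenshotParams): ShotOptions {
  const options: ShotOptions = { type: params.imageType, fullPage: params.fullPage };
  if (params.imageType !== "png" && params.quality !== undefined) {
    options.quality = Math.min(100, Math.max(0, Math.round(params.quality)));
  }
  return options;
}

// Build an object in the shape accepted by Puppeteer's page.goto().
function buildNavigationOptions(
  params: ScreenshotParams
): { timeout?: number; waitUntil?: WaitUntil } {
  return { timeout: params.timeout, waitUntil: params.waitUntil };
}
```

With these helpers, a capture would amount to calling page.goto(buildTargetUrl(p), buildNavigationOptions(p)) followed by page.screenshot(buildScreenshotOptions(p)). Omitting quality for PNG matters because PNG is lossless and has no quality setting.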

Output

The node outputs an item with the following structure when the Get Screenshot operation is used:

{
  "binary": {
    "<Property Name>": {
      // Binary image data (PNG, JPEG, or WebP) stored under the specified property name.
      // Includes metadata such as file name and MIME type.
    }
  },
  "json": {
    "headers": { /* HTTP response headers from the page request */ },
    "statusCode": 200, // HTTP status code of the page load
    "url": "https://example.com" // Final URL after any redirects
  }
}
  • The binary field contains the screenshot image data, accessible using the property name you specified.
  • The json field provides metadata about the HTTP response.
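
As a rough illustration of how a downstream step might read this item, the sketch below types the structure and builds a one-line summary. The json fields are taken from the structure above. The binary entry fields (data, mimeType, fileName) are an assumption based on n8n's usual binary format, where data holds base64 text in the default in-memory mode; in other binary storage modes that field may not contain the image bytes. The helper names are hypothetical.

```typescript
// Assumed binary entry shape (n8n's usual format in default in-memory mode).
interface BinaryEntry {
  data: string; // base64-encoded image (assumption, see lead-in)
  mimeType: string;
  fileName?: string;
}

// Item shape following the output structure documented above.
interface ScreenshotItem {
  binary: { [propertyName: string]: BinaryEntry };
  json: { headers: { [name: string]: string }; statusCode: number; url: string };
}

// Decoded size of a padded base64 string: 3 bytes per 4 characters, minus padding.
function base64ByteLength(data: string): number {
  let padding = 0;
  if (data.charAt(data.length - 1) === "=") padding++;
  if (data.charAt(data.length - 2) === "=") padding++;
  return (data.length / 4) * 3 - padding;
}

// Describe the screenshot stored under the given Property Name.
function summarizeScreenshot(item: ScreenshotItem, propertyName: string): string {
  const entry = item.binary[propertyName];
  if (entry === undefined) {
    throw new Error('No binary data under property "' + propertyName + '"');
  }
  const name = entry.fileName ?? "(unnamed)";
  return (
    name + " [" + entry.mimeType + ", " + base64ByteLength(entry.data) + " bytes] from " +
    item.json.url + " (HTTP " + item.json.statusCode + ")"
  );
}
```

The explicit check for a missing property is worth keeping: a mismatch between the Property Name set on the node and the name used downstream is an easy mistake to make.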

Dependencies

  • External Services: No external API keys required, but internet access is needed to reach the target URL.
  • Browser Dependency: Requires Puppeteer and may use puppeteer-extra plugins for stealth and human typing features.
  • n8n Configuration:
    • If using "Browser WebSocket Endpoint," ensure a compatible browser instance is running and accessible.
    • For "Use Chrome Headless Shell," chrome-headless-shell must be available in the system's PATH.
  • Environment Variables: Some advanced features may reference environment variables for VM sandboxing, but these are not typically required for standard usage.
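
The browser-related settings above boil down to a choice between attaching to a running browser and launching a new one. The sketch below models that choice as a pure function. It rests on several assumptions: the precedence rule (a WebSocket endpoint wins over launch settings), the BrowserSettings shape, and the contents of CONTAINER_ARGS are all hypothetical and not taken from the node's source. The option keys (browserWSEndpoint, headless, executablePath, args) are the ones Puppeteer's connect() and launch() accept, and the flags shown are real Chromium switches.

```typescript
// Hypothetical settings shape mirroring the Options described above.
interface BrowserSettings {
  browserWSEndpoint?: string;
  executablePath?: string;
  headless: boolean;
  launchArguments?: string[];
  proxyServer?: string;
  addContainerArgs?: boolean;
}

type BrowserPlan =
  | { mode: "connect"; options: { browserWSEndpoint: string } }
  | { mode: "launch"; options: { headless: boolean; executablePath?: string; args: string[] } };

// Flags commonly used in containers. Assumed for illustration; this is not
// the node's documented list.
const CONTAINER_ARGS: string[] = [
  "--no-sandbox",
  "--disable-setuid-sandbox",
  "--disable-dev-shm-usage",
];

function planBrowser(settings: BrowserSettings): BrowserPlan {
  // Assumption: an endpoint means "attach to an existing browser".
  if (settings.browserWSEndpoint) {
    return { mode: "connect", options: { browserWSEndpoint: settings.browserWSEndpoint } };
  }
  let args = (settings.launchArguments ?? []).slice();
  if (settings.addContainerArgs) {
    args = args.concat(CONTAINER_ARGS);
  }
  if (settings.proxyServer) {
    args.push("--proxy-server=" + settings.proxyServer);
  }
  return {
    mode: "launch",
    options: { headless: settings.headless, executablePath: settings.executablePath, args },
  };
}
```

A caller would pass plan.options to puppeteer.connect() when plan.mode is "connect" and to puppeteer.launch() otherwise. Keeping the decision separate from the Puppeteer calls makes it possible to check the resulting arguments without starting a browser.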

Troubleshooting

Common Issues:

  • Invalid URL: If the provided URL is malformed, the node will return an error indicating "Invalid URL."
  • Navigation Timeout: If the page takes too long to load, you may see a timeout error. Adjust the "Timeout" option as needed.
  • Unsupported Operation: If an unsupported operation is selected, an error will indicate this.
  • Failed to launch/connect to browser: If Puppeteer cannot start or connect to a browser, check your browser installation, executable path, and permissions.
  • Resource Limits: Opening too many pages simultaneously (high batch size) can exhaust memory/CPU resources.

Error Messages:

  • "Request failed with status code <code>": The page could not be loaded successfully (e.g., 404, 500).
  • "Failed to launch/connect to browser: ...": Indicates issues starting or connecting to the browser instance.
  • "Error closing page/browser:": Non-critical errors during cleanup; usually safe to ignore unless persistent.

Resolutions:

  • Double-check URLs and query parameters.
  • Lower batch size if experiencing resource exhaustion.
  • Ensure all paths and endpoints are correct for custom browser configurations.
  • Increase timeouts for slow-loading pages.
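
To see why lowering the batch size reduces resource pressure, it helps to picture the input URLs being split into groups that are processed one group at a time. The helper below is a minimal sketch of that grouping; the function name is hypothetical, and whether the node batches in exactly this way is an assumption.

```typescript
// Split a list into consecutive groups of at most batchSize entries.
function chunk<T>(items: T[], batchSize: number): T[][] {
  // Rejects 0, negative values, and NaN.
  if (!(batchSize >= 1)) {
    throw new RangeError("batchSize must be at least 1");
  }
  const size = Math.floor(batchSize);
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

If every group is finished before the next one starts (for example by awaiting Promise.all over the pages in a group), the number of pages open at once never exceeds the batch size, so memory use scales with the batch size rather than with the total number of URLs.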
