
Scraper (CustomJs)

A web scraper module that crawls websites and interacts with page elements through actions such as clicking and typing.

Overview

This node is a web scraper designed to crawl websites and interact with page elements through user-defined actions such as clicking, typing, or waiting. It can either return a screenshot of the webpage or the raw HTML content after performing these interactions. This makes it useful for automating data extraction from dynamic websites where interaction is required before the desired content appears.

Common scenarios include:

  • Automating form filling and submission on websites.
  • Navigating through multi-step pages to capture final content.
  • Taking screenshots of specific states of a webpage after interactions.
  • Extracting HTML content that only appears after certain user actions.

For example, you could use this node to open a login page, type in credentials, click the login button, wait for the dashboard to load, and then capture a screenshot or extract the resulting HTML.
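To make the login example concrete, here is one way the command list might be modeled in TypeScript. This is a sketch under assumptions: the field names (`action`, `selector`, `value`, `ms`), the selectors, and the idea that Wait takes a fixed delay in milliseconds are all hypothetical, not the node's documented schema.

```typescript
// Hypothetical command shape: field names are assumptions, not the node's documented schema.
type UserAction =
  | { action: "click"; selector: string }
  | { action: "type"; selector: string; value: string }
  | { action: "wait"; ms: number }; // assumed: a fixed delay; the real Wait options may differ

// The login scenario: type credentials, click the login button, wait for the dashboard.
// Placeholder values only; real credentials should come from a secure source.
const loginFlow: UserAction[] = [
  { action: "type", selector: "#username", value: "demo-user" },
  { action: "type", selector: "#password", value: "demo-pass" },
  { action: "click", selector: "button[type=submit]" },
  { action: "wait", ms: 3000 },
];

// Render each step as a readable line (the typed value is omitted so secrets are not logged).
function describeAction(a: UserAction): string {
  switch (a.action) {
    case "click":
      return `click ${a.selector}`;
    case "type":
      return `type into ${a.selector}`;
    case "wait":
      return `wait ${a.ms} ms`;
  }
}

console.log(loginFlow.map(describeAction).join("\n"));
```

A discriminated union keeps each command's required fields explicit, so a Type step without a `value` is rejected at compile time.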

Properties

  • Website URL: The HTTPS URL of the website to scrape. It must start with https://.
  • User Actions: A list of commands performed on the page in sequence. Each command is one of:
      - Click: Click the element matched by a selector.
      - Type: Type a given value into the element matched by a selector.
      - Wait: Wait for a condition or a period of time (the available options depend on the implementation).
  • Return Type: The output format, either raw HTML (text) or a PNG screenshot (binary).
  • Debug Mode: If enabled, the operation is cancelled when an element referenced in a command is not found. If disabled, the missing element is skipped and the remaining commands still run.
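The URL rule and the Debug Mode behavior described above can be modeled locally as follows. This is a hypothetical sketch: the URL check reproduces the documented error message, while `applyActions` and its `exists` callback are an illustrative stand-in for element lookup, which in the real node happens on the remote service.

```typescript
// Mirrors the documented rule: the URL must start exactly with "https://".
function validateWebsiteUrl(url: string): void {
  if (!url.startsWith("https://")) {
    throw new Error("Website URL must start with https://");
  }
}

// Toy model of Debug Mode (hypothetical; the real check runs on the remote service).
// `exists` stands in for "is this selector present on the page?".
function applyActions(
  selectors: string[],
  exists: (selector: string) => boolean,
  debugMode: boolean,
): string[] {
  const applied: string[] = [];
  for (const selector of selectors) {
    if (exists(selector)) {
      applied.push(selector);
    } else if (debugMode) {
      // Debug Mode on: cancel the whole operation at the first missing element.
      throw new Error(`Element not found: ${selector}`);
    }
    // Otherwise Debug Mode is off: the missing element is skipped and the loop continues.
  }
  return applied;
}
```

With Debug Mode off, `applyActions(["#a", "#b", "#c"], ...)` on a page lacking `#b` returns `["#a", "#c"]`; with it on, the same call throws.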

Output

The node outputs an array with one item per input item processed. Each output item contains:

  • For Raw HTML return type:

    {
      "output": "<html>...</html>"
    }
    

    The output field contains the full HTML content of the page after executing all commands.

  • For Screenshot (PNG) return type:
    The output includes the original JSON input plus a binary data field named data containing the PNG image of the webpage after interactions.

    Example structure:

    {
      "json": { ...original input... },
      "binary": {
        "data": {
          "data": "<base64-encoded PNG>",
          "mimeType": "image/png",
          "fileName": "output.png"
        }
      }
    }
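The following sketch shows how the two output shapes could be built from a service response. The helper names `toScreenshotItem` and `toHtmlItem` are hypothetical; the item structure mirrors the examples above, and the prefix check relies on the fixed 8-byte PNG file signature, which always encodes to `iVBORw0KGg` at the start of base64 data.

```typescript
// Shape of the screenshot output item, following the example structure above.
interface ScreenshotItem {
  json: Record<string, unknown>;
  binary: {
    data: { data: string; mimeType: "image/png"; fileName: string };
  };
}

// The PNG signature bytes (89 50 4E 47 0D 0A 1A 0A) encode to this base64 prefix.
const PNG_BASE64_PREFIX = "iVBORw0KGg";

// Hypothetical helper: wrap a base64 PNG returned by the service into the item shape above.
function toScreenshotItem(input: Record<string, unknown>, base64Png: string): ScreenshotItem {
  if (!base64Png.startsWith(PNG_BASE64_PREFIX)) {
    throw new Error("Response does not look like a base64-encoded PNG");
  }
  return {
    json: input, // the original input passes through unchanged
    binary: { data: { data: base64Png, mimeType: "image/png", fileName: "output.png" } },
  };
}

// Hypothetical helper for Raw HTML mode, which yields { output: "<html>...</html>" }.
function toHtmlItem(html: string): { output: string } {
  return { output: html };
}
```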
    

Dependencies

  • Requires an API key credential to access an external scraping service at https://e.customjs.io.
  • The node sends a POST request to this service with the URL, commands, and options.
  • The external service performs the actual scraping and returns either HTML or a screenshot.
  • No local browser automation is done within the node itself; it relies on this external API.
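A minimal sketch of assembling that POST request, assuming a JSON body. Only the base URL comes from this page; the endpoint path is not documented here, and the `x-api-key` header name and the body field names (`url`, `commands`, `returnType`, `debug`) are hypothetical placeholders, so check the CustomJs API documentation before relying on them.

```typescript
// Hypothetical request description; field and header names are assumptions.
interface ScrapeRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildScrapeRequest(
  apiKey: string,
  websiteUrl: string,
  commands: unknown[],
  returnType: "html" | "png",
  debugMode: boolean,
): ScrapeRequest {
  return {
    url: "https://e.customjs.io", // base URL from this page; endpoint path not documented here
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "x-api-key": apiKey, // assumed header name
      },
      // Assumed body layout: the URL, the command list, and the options.
      body: JSON.stringify({ url: websiteUrl, commands, returnType, debug: debugMode }),
    },
  };
}
```

The result could then be passed to `fetch(req.url, req.init)`. Keeping the builder separate from the network call makes the payload easy to inspect and test without contacting the service.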

Troubleshooting

  • Error: "Website URL must start with https://"
    Ensure the URL starts exactly with https://. URLs that use http:// or omit the protocol are rejected.

  • Element Not Found Errors (when Debug Mode is enabled)
    If the node cancels due to an element not being found, verify the selectors used in commands are correct and present on the target page.

  • Empty or Unexpected Output
    Check that the commands sequence correctly leads to the desired page state. Also, confirm the external API key is valid and has sufficient permissions.

  • API Request Failures
    Network issues or invalid API keys may cause request failures. Verify network connectivity and credential correctness.
