Scrapeless Official

Official Scrapeless nodes for n8n

Overview

This node integrates n8n with Scrapeless, a web scraping and crawling service. With the Crawler resource and its Crawl operation, it crawls a specified URL and returns data from that page's subpages, up to a defined limit.

Use it when you need structured data from a site that spans multiple linked pages rather than a single page. For example, you can crawl an e-commerce category page to collect product details across several subpages, or crawl a blog archive to extract article metadata.

Properties

Name | Meaning
URL to Crawl | The starting URL where the crawler begins its operation. You can specify any valid webpage URL.
Number Of Subpages | Maximum number of subpages to crawl and return results from. Limited to 100 subpages in this node.
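
The two properties above can be checked before a workflow runs. The following is a minimal sketch, not part of the node itself: the function name `validate_crawl_params` is hypothetical, and the lower bound of 1 subpage is an assumption (the source only states the upper limit of 100).

```python
from urllib.parse import urlparse

MAX_SUBPAGES = 100  # upper limit stated for this node


def validate_crawl_params(url: str, subpages: int) -> list[str]:
    """Return a list of problems; an empty list means the inputs look usable.

    Hypothetical helper for illustration; the node performs its own checks.
    """
    problems = []
    parsed = urlparse(url.strip())
    # Require an absolute http(s) URL with a host part.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("URL to Crawl must be an absolute http(s) URL")
    # Assumption: at least one subpage; the maximum comes from the node docs.
    if not 1 <= subpages <= MAX_SUBPAGES:
        problems.append(
            f"Number Of Subpages must be between 1 and {MAX_SUBPAGES}"
        )
    return problems


print(validate_crawl_params("https://example.com/blog", 10))
print(validate_crawl_params("example.com", 150))
```

The first call passes both checks; the second fails both, because the URL has no scheme and the subpage count exceeds the limit.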

Output

The output is a JSON object containing the crawled data aggregated from the specified URL and its subpages (up to the limit). Each item in the output corresponds to data extracted from one page during the crawl.

No binary output is documented for this operation. If the node did emit binary data, it would typically be downloaded files or media from the crawl, but the results described here are JSON only.
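
Because each output item corresponds to one crawled page, a downstream step can summarize the result without knowing the exact schema. The sketch below assumes the items arrive as a list of dictionaries; the field names in `sample` (`url`, `title`, `description`) are hypothetical, since the actual fields depend on the Scrapeless response.

```python
import json


def summarize_crawl(items: list[dict]) -> dict:
    """Count crawled pages and list every field name seen across them."""
    fields = sorted({key for item in items for key in item})
    return {"pages": len(items), "fields": fields}


# Hypothetical sample data; real field names come from the Scrapeless service.
sample = [
    {"url": "https://example.com/blog/a", "title": "Post A"},
    {"url": "https://example.com/blog/b", "title": "Post B",
     "description": "Second post"},
]

print(json.dumps(summarize_crawl(sample), indent=2))
```

A summary like this is a quick way to confirm that the crawl returned the expected number of pages before mapping individual fields.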

Dependencies

  • Requires an API key credential for Scrapeless service authentication.
  • The node depends on Scrapeless APIs to perform crawling operations.
  • No additional environment variables are indicated as necessary beyond the API credential.

Troubleshooting

  • Common issues:

    • Requesting more than 100 subpages is not supported by this node; lower the Number Of Subpages value to 100 or less.
    • Invalid or unreachable URLs may cause errors or empty results.
    • A missing or invalid Scrapeless API key causes authentication failures; network problems can cause similar API errors.
  • Error messages:

    • "Unsupported resource: <resource>": the resource parameter is invalid; ensure "crawler" is selected.
    • Errors from API calls usually include the message returned by the Scrapeless service; verify credentials and network connectivity.
  • To resolve errors:

    • Double-check the URL format and accessibility.
    • Confirm the API key is correctly configured in n8n credentials.
    • Reduce the number of subpages if hitting limits.
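
The troubleshooting steps above can be sketched as a small lookup that maps an error message to a suggested fix. Only the "Unsupported resource:" prefix comes from this page; the other keywords (`unauthorized`, `401`, `timeout`, and so on) are guesses at what a Scrapeless or network error might contain, and `suggest_fix` is a hypothetical helper.

```python
def suggest_fix(message: str) -> str:
    """Map an error message to one of the resolutions listed above."""
    text = message.lower()
    # Documented message: an invalid resource parameter was selected.
    if text.startswith("unsupported resource:"):
        return 'Select the "crawler" resource in the node settings.'
    # Assumed keywords for authentication failures.
    if any(word in text for word in ("unauthorized", "api key", "401")):
        return "Check the Scrapeless API key in n8n credentials."
    # Assumed keywords for unreachable or malformed URLs.
    if any(word in text for word in ("timeout", "unreachable", "dns")):
        return "Check the URL format and that the site is reachable."
    return "Inspect the full message returned by the Scrapeless service."


print(suggest_fix("Unsupported resource: scraper"))
print(suggest_fix("401 Unauthorized"))
```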
