Actions
- Crawler Actions
- Deep SerpApi Actions
- Universal Scraping API Actions
Overview
The node provides integration with Scrapeless, a service offering web scraping and crawling capabilities. Specifically for the Crawler resource with the Crawl operation, it allows users to crawl a specified URL and retrieve data from multiple subpages up to a defined limit.
This node is beneficial when you want to programmatically gather structured data from websites that require crawling through multiple linked pages rather than just scraping a single page. For example, crawling an e-commerce site category page to collect product details across several subpages or crawling blog archives to extract article metadata.
Properties
| Name | Meaning |
|---|---|
| URL to Crawl | The starting URL where the crawler begins its operation. You can specify any valid webpage URL. |
| Number Of Subpages | Maximum number of subpages to crawl and return results from. Limited to 100 subpages in this node. |
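To make the two properties concrete, the following is a minimal sketch of how a Crawl operation might be parameterized. The key names (`resource`, `operation`, `url`, `limit`) are assumptions for illustration; the exact field names come from the node's UI.

```javascript
// Illustrative parameter values for a Crawl operation.
// Key names are assumptions; check the node's own fields for the real ones.
const crawlParams = {
  resource: "crawler",           // the Crawler resource
  operation: "crawl",            // the Crawl operation
  url: "https://example.com/blog", // URL to Crawl: the starting page
  limit: 5,                      // Number Of Subpages: capped at 100
};
```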
Output
The output is a JSON object containing the crawled data aggregated from the specified URL and its subpages (up to the limit). Each item in the output corresponds to data extracted from one page during the crawl.
Binary output (for example, downloaded files or media) is not documented for this node; the crawl results are returned as JSON only.
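A downstream n8n Code node can reshape the per-page results before further processing. The sketch below assumes each crawled page arrives as one item whose `json` carries a `url` and a `markdown` field; these field names are assumptions about the Scrapeless response shape, so adjust them to the JSON you actually receive.

```javascript
// Stand-in for the items a Crawl operation might emit (shape assumed).
const crawledPages = [
  { json: { url: "https://example.com/a", markdown: "# Page A" } },
  { json: { url: "https://example.com/b", markdown: "# Page B" } },
];

// Keep one item per crawled page, extracting just the fields we need.
const summaries = crawledPages.map((item) => ({
  json: {
    url: item.json.url,
    length: (item.json.markdown || "").length, // content size per page
  },
}));
```

In a real workflow the `crawledPages` array would be the node's output (`$input.all()` inside a Code node) rather than a hard-coded list.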
Dependencies
- Requires an API key credential for Scrapeless service authentication.
- The node depends on Scrapeless APIs to perform crawling operations.
- No additional environment variables are indicated as necessary beyond the API credential.
Troubleshooting
Common issues:
- Requesting more than the maximum of 100 subpages is not supported by this node; reduce the limit accordingly.
- Invalid or unreachable URLs may cause errors or empty results.
- Network or API authentication failures occur if the Scrapeless API key is missing or invalid.
Error messages:
- "Unsupported resource: &lt;resource&gt;" — indicates an invalid resource parameter; ensure "crawler" is selected.
- Errors related to API calls will usually contain messages from the Scrapeless service; verify credentials and network connectivity.
To resolve errors:
- Double-check the URL format and accessibility.
- Confirm the API key is correctly configured in n8n credentials.
- Reduce the number of subpages if hitting limits.
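The checks above can be applied before the node ever runs. The helper below is a sketch (its name and return shape are illustrative, not part of the node) that rejects malformed or non-HTTP URLs and clamps the subpage count to the node's limit of 100.

```javascript
// Pre-flight validation for a Crawl operation, mirroring the
// troubleshooting advice: check the URL format and respect the limit.
function validateCrawlInput(url, subpages) {
  let parsed;
  try {
    parsed = new URL(url); // throws on malformed URLs
  } catch {
    throw new Error(`Invalid URL: ${url}`);
  }
  if (!/^https?:$/.test(parsed.protocol)) {
    throw new Error(`Unsupported protocol: ${parsed.protocol}`);
  }
  // The node caps subpages at 100; clamp rather than fail.
  return {
    url: parsed.href,
    subpages: Math.min(Math.max(1, subpages), 100),
  };
}
```

Clamping silently keeps the workflow running; if you would rather surface the misconfiguration, throw instead of clamping.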
Links and References
- Scrapeless Official Documentation (for detailed SDK usage and advanced crawling options)
- n8n Documentation on Creating Custom Nodes