Actions
- Crawler Actions
- Deep SerpApi Actions
- Universal Scraping API Actions
Overview
The node "Scrapeless Official" provides web scraping and crawling capabilities through multiple resources, including a "Crawler" resource. Specifically, the Crawler - Scrape operation allows users to extract data from a specified URL by crawling its content. This is useful for scenarios where you want to programmatically gather information from websites, such as extracting product details, news articles, or any publicly available web data.
Practical examples include:
- Extracting all article headlines from a news website.
- Gathering product prices and descriptions from an e-commerce page.
- Collecting metadata or structured data embedded in a webpage.
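For orientation, the Crawler - Scrape operation boils down to an authenticated HTTP request against the Scrapeless API. The sketch below shows roughly what such a call could look like outside of n8n; the endpoint path, header name, and payload shape are assumptions for illustration, not the documented Scrapeless contract.

```typescript
// Minimal sketch of a scrape request. The endpoint URL, header name, and
// payload shape are hypothetical -- consult the Scrapeless docs for the
// real contract. Requires Node 18+ for the global fetch API.
const SCRAPELESS_API_KEY = process.env.SCRAPELESS_API_KEY ?? "";

async function scrapeUrl(url: string): Promise<unknown> {
  const response = await fetch("https://api.scrapeless.com/crawler/scrape", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-token": SCRAPELESS_API_KEY, // hypothetical header name
    },
    body: JSON.stringify({ url }), // payload shape is an assumption
  });
  if (!response.ok) {
    throw new Error(`Scrape failed: ${response.status} ${response.statusText}`);
  }
  return response.json();
}

// Example usage:
// scrapeUrl("https://example.com/news").then((data) => console.log(data));
```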
Properties
| Name | Meaning |
|---|---|
| URL to Crawl | The web address (URL) of the page to be crawled and scraped. Supports single URLs. |
Output
The node outputs JSON representing the scraped content of the target URL. The exact structure depends on the response returned by the underlying scraping service but generally includes the data fields extracted from the webpage; an illustrative shape follows below.
If the node supports binary data output (not shown here), it would typically carry files or media downloaded during crawling, but this is not detailed in the provided code.
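Purely as an illustration, a single scraped item might resemble the shape below. Every field name here is an assumption; the real structure is whatever the Scrapeless API returns for the crawled page.

```typescript
// Illustrative output shape only -- these field names are assumptions,
// not the documented Scrapeless response schema.
interface ScrapeResult {
  url: string;                        // the crawled URL, echoed back
  title?: string;                     // page title, if extracted
  content?: string;                   // main textual content of the page
  metadata?: Record<string, string>;  // e.g. meta tags or Open Graph data
}
```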
Dependencies
- Requires an API key credential for the Scrapeless service (referred to generically as an API authentication token).
- Depends on external Scrapeless APIs that provide scraping and crawling functionalities.
- No additional environment variables are indicated beyond the required API credential.
Troubleshooting
Common issues:
- Invalid or missing URL input will cause the operation to fail.
- Network errors or blocked requests when the target website restricts automated scraping.
- API authentication failures if the Scrapeless API key is invalid or expired.
Error messages:
"Unsupported resource: <resource>"— occurs if an unsupported resource name is provided; ensure the resource is set to "crawler" for this operation.- Errors related to HTTP request failures or parsing issues will be returned from the Scrapeless API and surfaced in the node's output if "Continue On Fail" is enabled.
Resolutions:
- Verify the URL format and accessibility.
- Check API key validity and permissions.
- Use "Continue On Fail" option to handle partial failures gracefully.
Links and References
- Scrapeless Official Documentation (hypothetical link)
- n8n documentation on Creating Custom Nodes
- General web scraping best practices and legal considerations: Web Scraping Guide