Overview
The "Smart Scraper - Scrape" operation extracts structured data from a webpage: you supply a URL and instructions describing the data to extract. The node sends the request to an external scraping API, which processes the page content and can optionally render JavaScript-heavy sites, handle infinite scrolling or pagination, and structure the output according to a custom JSON schema.
This node is beneficial when you want to automate data extraction from websites without manually coding scrapers. For example, you can scrape product details from e-commerce pages, gather news article metadata, or collect listings from real estate sites. The ability to define a custom output schema allows precise control over the extracted data format, making it easy to integrate with downstream workflows.
Properties
| Name | Meaning |
|---|---|
| Website URL | The URL of the website page you want to scrape data from. |
| User Prompt | Instructions describing what specific data to extract from the given website. |
| Use Custom Output Schema | Whether to apply a user-defined JSON schema to structure the scraped output data. If enabled, the output will conform to the provided schema. |
| Output Schema | A JSON schema defining the expected structure, types, and descriptions of the output data fields. This property is required if "Use Custom Output Schema" is enabled. |
| Render Heavy JS | Whether to render JavaScript-heavy websites before scraping. Enabling this may consume additional credits as it requires more processing. |
| Enable Infinite Scrolling | Enables automatic scrolling down the page to load more content dynamically (useful for pages with infinite scroll). |
| Number of Scrolls | The number of times to scroll down the page when infinite scrolling is enabled. Determines how much additional content is loaded. |
| Enable Pagination | Enables scraping multiple pages by following pagination links. |
| Total Pages | The total number of pages to scrape when pagination is enabled. |
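As a rough illustration of how these properties might translate into a request body, the sketch below maps each one onto a JSON field. The function name and the field names (`website_url`, `user_prompt`, `output_schema`, `render_heavy_js`, `number_of_scrolls`, `total_pages`) are assumptions made for this example; they are not confirmed by this document, so check them against the API reference before relying on them.

```python
from typing import Optional


def build_scrape_payload(
    website_url: str,
    user_prompt: str,
    output_schema: Optional[dict] = None,
    render_heavy_js: bool = False,
    number_of_scrolls: Optional[int] = None,
    total_pages: Optional[int] = None,
) -> dict:
    """Map the node properties onto a request body (field names are assumed)."""
    payload: dict = {
        "website_url": website_url,
        "user_prompt": user_prompt,
        "render_heavy_js": render_heavy_js,
    }
    # "Use Custom Output Schema" enabled: pass the schema along.
    if output_schema is not None:
        payload["output_schema"] = output_schema
    # "Enable Infinite Scrolling" enabled: include the scroll count.
    if number_of_scrolls is not None:
        payload["number_of_scrolls"] = number_of_scrolls
    # "Enable Pagination" enabled: include the page count.
    if total_pages is not None:
        payload["total_pages"] = total_pages
    return payload
```

Optional fields are left out when the corresponding toggle is off, so the service's own defaults would apply in that case.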
Output
- The node outputs a JSON object containing the scraped data returned by the external scraping API.
- If a custom output schema is used, the JSON output will be structured according to that schema, ensuring consistent field names and types.
- Without a custom schema, the structure of the output is determined by the user prompt.
- No binary data output is produced by this operation.
Example output JSON (assuming default schema):
{
  "title": "Sample Product",
  "price": 29.99,
  "description": "A detailed description of the sample product."
}
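To show what a custom Output Schema for the example above could look like, here is an illustrative schema together with a shallow check of the top-level fields. Both `EXAMPLE_SCHEMA` and `check_top_level` are hypothetical names introduced for this sketch; the check only looks at required fields and primitive types and is not a full JSON Schema validator.

```python
# A possible Output Schema for the example output (illustrative only).
EXAMPLE_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string", "description": "Product name"},
        "price": {"type": "number", "description": "Price as a number"},
        "description": {"type": "string", "description": "Product description"},
    },
    "required": ["title", "price", "description"],
}

# Maps JSON Schema primitive type names to Python types.
_JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_top_level(output: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the shallow check passed."""
    problems = []
    for name in schema.get("required", []):
        if name not in output:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        expected = _JSON_TYPES.get(spec.get("type"))
        if name not in output or expected is None:
            continue
        value = output[name]
        ok = isinstance(value, expected)
        # In Python, bool is a subclass of int; reject it for non-boolean fields.
        if isinstance(value, bool) and spec.get("type") != "boolean":
            ok = False
        if not ok:
            problems.append(f"wrong type for field: {name}")
    return problems
```

Running `check_top_level` on the example output with `EXAMPLE_SCHEMA` returns an empty list. A response where `price` arrives as a string would be reported as a type problem.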
Dependencies
- Requires an active connection to the external scraping API service at https://api.scrapegraphai.com/v1.
- An API key credential must be configured in n8n to authenticate requests to the scraping API.
- Internet access is necessary for the node to send HTTP requests to the scraping service.
- Optional: Enabling JavaScript rendering or infinite scrolling may require additional API usage credits.
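For readers who want to reproduce the node's request outside n8n, the following sketch prepares an authenticated POST against the base URL listed above without sending it. The `/smartscraper` path and the `SGAI-APIKEY` header name are assumptions that this document does not confirm, and `build_request` is a hypothetical helper.

```python
import json
import urllib.request

API_BASE = "https://api.scrapegraphai.com/v1"  # base URL stated in this document


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the scraping API."""
    return urllib.request.Request(
        url=f"{API_BASE}/smartscraper",  # assumed endpoint path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "SGAI-APIKEY": api_key,  # assumed header name for the API key
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the prepared request to `urllib.request.urlopen(req, timeout=60)` would send it. Doing so requires a valid API key and network access, and it may consume credits.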
Troubleshooting
- Invalid JSON in Output Schema: If the custom output schema JSON is malformed, the node will throw an error indicating invalid JSON. To fix, ensure the JSON schema is correctly formatted and valid.
- API Authentication Errors: If the API key is missing or invalid, requests will fail. Verify that the API credential is properly set up in n8n.
- Empty or Unexpected Output: If the user prompt does not clearly specify what to extract, or the target website structure changes, the output may be empty or incorrect. Refine the user prompt or check the website's current layout.
- High Credit Usage: Enabling options like rendering heavy JavaScript or infinite scrolling increases API usage. Monitor your credit balance to avoid unexpected charges.
- Network Issues: Connectivity problems can cause request failures. Ensure stable internet access and that the API endpoint is reachable.
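The "Invalid JSON in Output Schema" case can be caught before a workflow runs by parsing the schema text locally. The sketch below uses only Python's standard `json` module; `parse_output_schema` is a hypothetical helper and is not part of the node.

```python
import json


def parse_output_schema(raw: str) -> dict:
    """Parse the Output Schema text and report where malformed JSON breaks."""
    try:
        schema = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(
            f"Invalid JSON in Output Schema at line {exc.lineno}, "
            f"column {exc.colno}: {exc.msg}"
        ) from exc
    # A JSON schema describing an object is itself a JSON object.
    if not isinstance(schema, dict):
        raise ValueError("Output Schema must be a JSON object")
    return schema
```

A trailing comma, as in `{"type": "object",}`, is a common cause of this error. The raised message points to the line and column where parsing failed.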
Links and References
- JSON Schema Documentation
- ScrapegraphAI Official Website (for API details and account management)
- n8n Documentation on Credentials