Overview
The ScrapegraphAI node provides an interface for web data extraction through the ScrapegraphAI API. Its Search Scraper - Search operation accepts a search query or instructions and returns structured data extracted from multiple websites found by that search.
This operation is useful when you want to automate gathering information from search results across various websites without manually visiting each one. For example, you could use it to:
- Collect recent news articles on a topic.
- Aggregate product reviews or prices from e-commerce sites.
- Extract research papers or blog posts related to a keyword.
- Perform competitive analysis by scraping competitor websites found via search.
The node supports options for controlling how many websites to scrape, enabling infinite scrolling or pagination to load more content, and defining a custom JSON schema that structures the output data precisely.
Properties
| Name | Meaning |
|---|---|
| User Prompt | The search query or instructions describing what to search for and extract from the websites. |
| Use Custom Output Schema | Whether to apply a user-defined JSON schema to structure the output data. If enabled, the output will conform to the provided schema. |
| Output Schema | A JSON schema definition specifying the structure, types, and descriptions of the expected output data. Only shown if "Use Custom Output Schema" is enabled. |
| Number of Results | How many websites to search and scrape. Options: Standard (3), Enhanced (5), Deep Research (10), Maximum (20). Each option has a different credit cost. |
| Enable Infinite Scrolling | Whether to enable infinite scrolling on the search result pages to load additional content dynamically by scrolling down. |
| Number of Scrolls | Number of times to scroll down the page to load more content when infinite scrolling is enabled. |
| Enable Pagination | Whether to enable pagination to scrape multiple pages of search results. |
| Total Pages | Total number of pages to scrape when pagination is enabled. |
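As a minimal sketch of how these properties relate to each other, the following Python helper collects them into one dictionary. The function name `build_search_options` and every dictionary key are hypothetical, chosen for illustration only; they are not taken from the node's source or from the ScrapegraphAI API. Only the four result options and their website counts come from the table above.

```python
import json

# Website counts per "Number of Results" option, as listed in the table above.
RESULT_TIERS = {
    "Standard": 3,
    "Enhanced": 5,
    "Deep Research": 10,
    "Maximum": 20,
}


def build_search_options(user_prompt, tier="Standard", output_schema=None,
                         number_of_scrolls=None, total_pages=None):
    """Collect the node's properties into one dict (all key names are hypothetical)."""
    if not user_prompt.strip():
        raise ValueError("User Prompt must not be empty")
    if tier not in RESULT_TIERS:
        raise ValueError(f"Unknown Number of Results option: {tier!r}")

    options = {
        "user_prompt": user_prompt,
        "num_results": RESULT_TIERS[tier],
    }
    # A schema is only included when "Use Custom Output Schema" is enabled.
    # json.loads raises an error here if the schema text is not valid JSON.
    if output_schema is not None:
        options["output_schema"] = json.loads(output_schema)
    # Scroll and page counts only apply when their feature is enabled.
    if number_of_scrolls is not None:
        options["number_of_scrolls"] = number_of_scrolls
    if total_pages is not None:
        options["total_pages"] = total_pages
    return options


options = build_search_options("Recent news articles about solar energy", tier="Enhanced")
print(options)
```

Leaving the optional arguments as `None` mirrors how the dependent fields (Output Schema, Number of Scrolls, Total Pages) only matter once their toggle is switched on.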
Output
The output is a JSON object containing the scraped data from the search results. By default, the structure depends on the API response but can be customized with a JSON schema.
If a custom output schema is used, the output will follow that schema. For example, the default example schema includes:
- articles: an array of article objects, each having:
  - title (string): Article title.
  - author (string): Author name.
  - publishDate (string, optional): Publication date.
The node outputs this structured data in the json field of the item.
No binary data output is indicated for this operation.
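The example schema described above could be written along the following lines. This is a sketch in Python that builds the schema as a dictionary and prints the JSON text that would go into the Output Schema field. The exact schema shipped with the node may differ; in particular, marking `articles`, `title`, and `author` as required is an assumption based on only `publishDate` being described as optional.

```python
import json

# Sketch of a JSON schema for the "articles" example described above.
article_schema = {
    "type": "object",
    "properties": {
        "articles": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string", "description": "Article title"},
                    "author": {"type": "string", "description": "Author name"},
                    "publishDate": {"type": "string", "description": "Publication date"},
                },
                # Assumption: publishDate is optional, so it is left out of "required".
                "required": ["title", "author"],
            },
        }
    },
    # Assumption: the top-level "articles" array is always expected.
    "required": ["articles"],
}

# Serialize to the JSON text that would be pasted into the Output Schema field.
schema_text = json.dumps(article_schema, indent=2)
print(schema_text)
```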
Dependencies
- Requires an active connection to the ScrapegraphAI API via an API key credential configured in n8n.
- The node makes authenticated HTTP POST requests to the ScrapegraphAI endpoint /searchscraper.
- Proper internet access and valid API credentials are necessary.
Troubleshooting
- Invalid JSON in Output Schema: If you enable "Use Custom Output Schema" and provide invalid JSON, the node will throw an error indicating the JSON parsing issue. To fix, ensure your JSON schema is correctly formatted.
- API Authentication Errors: If the API key is missing or invalid, requests will fail. Verify your API credentials are set up correctly in n8n.
- Exceeding Credit Limits: Selecting a high number of results (e.g., 20 websites) consumes more credits. Ensure your account has sufficient credits to avoid request failures.
- Empty or Unexpected Output: If the search query yields no results or the website structure changes, the output may be empty or incomplete. Adjust the user prompt or check the API status.
- Network Issues: Connectivity problems can cause request timeouts or failures. Check your network and retry.
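For the invalid JSON case in the list above, it can help to check the schema text before pasting it into the Output Schema field. The following is a minimal sketch using Python's standard `json` module; the helper name `check_schema_text` is hypothetical, and the requirement that the schema be a JSON object at the top level is an assumption rather than documented node behavior.

```python
import json


def check_schema_text(schema_text):
    """Return None if the text parses as a JSON object, else a readable message."""
    try:
        parsed = json.loads(schema_text)
    except json.JSONDecodeError as err:
        # lineno and colno point at the position where parsing failed.
        return f"Invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"
    # Assumption: a usable output schema is a JSON object, not a list or scalar.
    if not isinstance(parsed, dict):
        return "Schema must be a JSON object at the top level"
    return None


print(check_schema_text('{"type": "object"}'))   # valid JSON object
print(check_schema_text('{"type": "object",}'))  # trailing comma is not valid JSON
```

A trailing comma, as in the second call, is a common cause of this error when a schema has been edited by hand.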
Links and References
This summary is based solely on static code analysis of the provided source and property definitions.