ScrapegraphAI

Turn any webpage into usable data in one shot – ScrapegraphAI explores the website and extracts the content you need.

Overview

The Smart Crawler - Get Status operation retrieves the current status of a previously started web crawling task. Starting a crawl job with the "crawl" operation returns a task ID; this operation sends that task ID to the ScrapegraphAI API to check the crawl's progress, results, or errors.

This is useful when a crawl is too large or complex to finish within a single request and therefore runs asynchronously. For example, after starting a crawl on a large website, you can check the status periodically and collect partial or final results once they are ready.

Practical examples:

  • Monitoring the progress of a deep crawl on an e-commerce site.
  • Polling for completion of a scheduled content extraction job.
  • Retrieving crawl results after a delay without blocking workflow execution.
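
The polling pattern behind these examples can be sketched as follows. This is a minimal illustration, not the node's actual implementation: the function name `poll_until_done`, the `status` field, and the terminal state names are assumptions, and `fetch_status` stands in for whatever performs the Get Status call.

```python
import time
from typing import Any, Callable, Dict

# Assumed terminal state names; the real API may use different values.
TERMINAL_STATES = {"completed", "failed"}


def poll_until_done(
    fetch_status: Callable[[str], Dict[str, Any]],
    task_id: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call fetch_status(task_id) until a terminal state or attempts run out."""
    for attempt in range(max_attempts):
        result = fetch_status(task_id)
        # "status" is a hypothetical field name for the crawl state.
        if str(result.get("status", "")).lower() in TERMINAL_STATES:
            return result
        # Wait between checks, but not after the final attempt.
        if attempt < max_attempts - 1:
            sleep(interval_s)
    raise TimeoutError(f"Crawl {task_id} not finished after {max_attempts} checks")
```

Inside an n8n workflow the same effect is usually achieved with a Wait node and a loop back to the Get Status node, so the delay does not block other executions.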

Properties

Name | Meaning
Task ID | The unique identifier of the crawl task returned from a previous crawl operation. Used to query the status of that specific crawl job.

Output

The output is a JSON object describing the status of the crawl task identified by the provided Task ID and, when available, its results. It typically includes fields such as:

  • Current state of the crawl (e.g., pending, running, completed, failed).
  • Any extracted data or metadata available at the time of the request.
  • Error messages or warnings if the crawl encountered issues.
  • Progress indicators like number of pages crawled.

No binary data output is indicated for this operation.
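
As a rough illustration of how a downstream step might read such a response, the sketch below turns a status payload into a one-line summary. The field names `status`, `error`, and `pages_crawled` are hypothetical placeholders for the kinds of fields listed above; the actual response schema is not given here and may differ.

```python
from typing import Any, Dict


def summarize_status(payload: Dict[str, Any]) -> str:
    """Build a short human-readable summary from a status payload."""
    # All keys used here are assumed names, not a documented schema.
    state = str(payload.get("status", "unknown")).lower()
    if state == "failed":
        return f"failed: {payload.get('error', 'no error message')}"
    if state == "completed":
        return "completed"
    pages = payload.get("pages_crawled")
    if pages is not None:
        return f"{state} ({pages} pages crawled)"
    return state
```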

Dependencies

  • Requires an active API key credential for the external ScrapegraphAI service.
  • The node makes authenticated HTTP GET requests to the endpoint:
    https://api.scrapegraphai.com/v1/crawl/{taskId}
  • Proper network connectivity to the ScrapegraphAI API is necessary.
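
For reference, an equivalent request made outside n8n might look like the sketch below, which uses only the Python standard library. The endpoint is the one listed above; the `SGAI-APIKEY` header name is an assumption about how the API key is passed and should be checked against the ScrapegraphAI API documentation. The function names are illustrative.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict

BASE_URL = "https://api.scrapegraphai.com/v1/crawl"


def build_status_request(task_id: str, api_key: str) -> urllib.request.Request:
    """Build the authenticated GET request for a crawl task's status."""
    # Percent-encode the task ID so it cannot alter the URL path.
    url = f"{BASE_URL}/{urllib.parse.quote(task_id, safe='')}"
    # Header name is an assumption; verify it against the API docs.
    headers = {"SGAI-APIKEY": api_key, "Accept": "application/json"}
    return urllib.request.Request(url, headers=headers, method="GET")


def get_crawl_status(task_id: str, api_key: str, timeout_s: float = 30.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    request = build_status_request(task_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```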

Troubleshooting

  • Invalid Task ID: If the provided Task ID does not exist or is malformed, the API may return an error. Verify the Task ID was copied correctly from the crawl initiation response.
  • Authentication Errors: Missing or invalid API credentials will cause authentication failures. Ensure the API key credential is configured properly in n8n.
  • Network Issues: Timeouts or connection errors indicate network problems between n8n and the API endpoint.
  • API Rate Limits: Excessive polling frequency might trigger rate limiting. Implement delays between status checks.
  • JSON Parsing Errors: Unlikely for this operation, but a response in an unexpected format could fail to parse.

To resolve errors, confirm all input parameters, credentials, and network conditions are correct. Review error messages returned in the node output for guidance.
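
One way to route failures to the categories above is to classify them by HTTP status code, as in this sketch. The mapping is an assumption based on common HTTP conventions (401/403 for authentication, 404 for an unknown ID, 429 for rate limiting); the codes the ScrapegraphAI API actually returns are not documented here.

```python
import json
from typing import Optional


def classify_failure(http_status: Optional[int], body: str = "") -> str:
    """Map a response (or its absence) to a troubleshooting category."""
    if http_status is None:
        # No response at all: timeout or connection error.
        return "network"
    if http_status in (401, 403):
        return "authentication"
    if http_status in (400, 404, 422):
        # Assumed codes for a malformed or unknown Task ID.
        return "invalid_task_id"
    if http_status == 429:
        return "rate_limit"
    if 200 <= http_status < 300:
        try:
            json.loads(body)
        except json.JSONDecodeError:
            return "json_parsing"
        return "ok"
    return "other"
```

A workflow could branch on the returned category, for example waiting longer before the next check on `rate_limit` and stopping immediately on `authentication`.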
