Firecrawl

Get data from Firecrawl API

Overview

This node integrates with the Firecrawl API to map a website and retrieve URLs found during crawling. It is useful for scenarios where you want to analyze the structure of a website, gather all accessible links, or extract URLs for further processing such as SEO analysis, content auditing, or automated testing.

For example, you can input a website URL and configure the node to crawl the site including or excluding its sitemap, limit the number of URLs returned, and decide whether to include subdomains. This helps automate the collection of comprehensive link data from websites without manual browsing.

Properties

Name                 Meaning
Url                  The starting URL of the website to crawl.
Sitemap              How to handle the website sitemap during crawling: "Include" (default), "Only", or "Skip".
Include Subdomains   Whether to include subdomains of the website in the crawl (true/false).
Limit                Maximum number of URLs to return from the crawl (1 to 5000).
Timeout (Ms)         Timeout in milliseconds for the crawling request.
Use Custom Body      Whether to use a custom JSON body for the request instead of the standard parameters.
Additional Fields    When using a custom body, allows adding extra JSON properties to the request body.
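
To make the relationship between these properties and the request concrete, here is a minimal Python sketch of how they might be assembled into a JSON body. The function name build_map_body and the JSON key names (url, sitemap, includeSubdomains, limit, timeout) are assumptions for illustration; this page does not document the exact keys the node sends, and the way "Additional Fields" is merged is likewise a guess.

```python
from typing import Any, Dict, Optional

# The three sitemap modes listed in the Properties table, lowercased (assumed wire format).
SITEMAP_MODES = ("include", "only", "skip")


def build_map_body(
    url: str,
    sitemap: str = "include",
    include_subdomains: Optional[bool] = None,
    limit: Optional[int] = None,
    timeout_ms: Optional[int] = None,
    custom_body: Optional[Dict[str, Any]] = None,
    additional_fields: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble a request body from the node properties (hypothetical key names)."""
    if custom_body is not None:
        # "Use Custom Body": the caller's JSON replaces the standard parameters,
        # and "Additional Fields" are layered on top of it.
        body = dict(custom_body)
        body.update(additional_fields or {})
        return body

    if sitemap not in SITEMAP_MODES:
        raise ValueError(f"sitemap must be one of {SITEMAP_MODES}")

    body = {"url": url, "sitemap": sitemap}
    # Only send optional properties that were actually set.
    if include_subdomains is not None:
        body["includeSubdomains"] = include_subdomains
    if limit is not None:
        body["limit"] = limit
    if timeout_ms is not None:
        body["timeout"] = timeout_ms
    return body
```

For example, build_map_body("https://example.com", sitemap="skip", limit=100) yields a body with only the url, sitemap, and limit keys, leaving every unset option to whatever default the API applies.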

Output

The node outputs JSON data containing the results of the website mapping operation. The main output field json will typically include an array of URLs discovered during the crawl, along with metadata about each URL if provided by the API.

The node does not output binary data; its results consist of JSON URL data only.
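
As a rough illustration of consuming this output downstream, the sketch below pulls plain URL strings out of one output item. The field name links and the per-entry url key are assumptions, since this page only says the output "typically" contains an array of URLs with optional metadata; the helper accepts both bare strings and objects for that reason.

```python
from typing import Any, Dict, List


def extract_urls(item_json: Dict[str, Any], field: str = "links") -> List[str]:
    """Collect URL strings from an output item (the field name is an assumption)."""
    urls: List[str] = []
    for entry in item_json.get(field) or []:
        if isinstance(entry, str):
            # Entry is a bare URL string.
            urls.append(entry)
        elif isinstance(entry, dict) and isinstance(entry.get("url"), str):
            # Entry is an object carrying the URL plus metadata.
            urls.append(entry["url"])
    return urls


# Hypothetical sample item mixing both entry shapes.
sample = {
    "links": [
        "https://example.com/",
        {"url": "https://example.com/docs", "title": "Docs"},
    ]
}
print(extract_urls(sample))  # ['https://example.com/', 'https://example.com/docs']
```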

Dependencies

  • Requires an API key credential for authenticating with the Firecrawl API.
  • The base URL for the API defaults to https://api.firecrawl.dev/v2 but can be overridden via credentials.
  • No other external dependencies are indicated.
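
For readers reproducing the call outside the node, the following sketch shows how the API key and base URL could fit together using only the Python standard library. The /map path and the Bearer authorization scheme are assumptions not stated on this page, and build_request is a hypothetical helper; it constructs the request object without sending it.

```python
import json
import urllib.request
from typing import Any, Dict

# Default base URL as stated in the Dependencies list above.
DEFAULT_BASE_URL = "https://api.firecrawl.dev/v2"


def build_request(
    api_key: str,
    body: Dict[str, Any],
    base_url: str = DEFAULT_BASE_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a map operation."""
    # Assumed endpoint path; a base URL overridden in the credentials is respected.
    endpoint = base_url.rstrip("/") + "/map"
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to urllib.request.urlopen with a timeout argument would perform the call; keeping construction separate makes it easy to inspect the URL and headers first when debugging authentication problems.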

Troubleshooting

  • Timeouts: If the crawl takes too long, increase the "Timeout (Ms)" property to allow more time for the request.
  • Limit Exceeded: Setting the "Limit" too high may cause performance issues or API rejections; keep it within the 1 to 5000 range the node accepts, and lower it if requests are slow.
  • Invalid URL: Ensure the "Url" property is a valid and reachable website address.
  • API Authentication Errors: Verify that the API key credential is correctly configured and has necessary permissions.
  • Custom Body Misconfiguration: When using a custom body, ensure the JSON is well-formed and includes required fields expected by the API.
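
Several of the issues above can be caught before the node runs. The sketch below is one possible pre-flight check, not part of the node itself: preflight is a hypothetical helper, and treating url as a required key of the custom body is an assumption. It checks only the form of the URL, not whether the site is reachable.

```python
import json
from typing import List, Optional
from urllib.parse import urlparse


def preflight(url: str, limit: int, custom_body_text: Optional[str] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems: List[str] = []

    # Invalid URL: require an absolute http(s) address with a host.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("Url must be an absolute http(s) address")

    # Limit Exceeded: the Properties table gives a range of 1 to 5000.
    if not 1 <= limit <= 5000:
        problems.append("Limit must be between 1 and 5000")

    # Custom Body Misconfiguration: must be a well-formed JSON object.
    if custom_body_text is not None:
        try:
            body = json.loads(custom_body_text)
        except json.JSONDecodeError as exc:
            problems.append(f"Custom body is not valid JSON: {exc.msg}")
        else:
            if not isinstance(body, dict):
                problems.append("Custom body must be a JSON object")
            elif "url" not in body:
                # Assumed required field.
                problems.append("Custom body is missing 'url'")

    return problems
```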
