FireCrawl

FireCrawl API

Overview

The FireCrawl node's "Submit A Crawl Job" operation starts a web crawling job from a target URL and a set of crawl and scraping options. It is useful when you need to extract content or structured data from websites programmatically, for example:

  • Collecting articles or product information from a set of web pages.
  • Monitoring website changes.
  • Extracting structured data using custom prompts or schemas.

Practical Example:
You could use this node to crawl a blog, exclude certain paths (like /about or /contact), and receive the results in Markdown or HTML format, optionally triggering a webhook when the crawl completes.


Properties

Below are the supported input properties for this operation:

| Display Name | Type | Description |
|---|---|---|
| Url | String (required) | The URL to crawl. |
| Limit | Number | Max number of results to return. Minimum value: 1. Default: 50. |
| Exclude Paths | Collection | List of paths to exclude from the crawl (e.g., /about, /login). |
| Allow Backward Links | Boolean | Allow crawling pages that are not direct descendants of the initial URL. Default: true. |
| Webhook | String | URL to send webhook events to during the crawl process. |
| Scrape Options | Collection | Scraping options, including output formats and extraction settings. Formats: output format(s) for scraped data (Markdown, Html, Extract). Extract: structured extraction options, consisting of Schema (schema for structured data extraction), Systemprompt (system prompt used for extraction), and Prompt (extraction prompt used without a schema). |
| Use Custom Body | Boolean | Whether to use a custom body for the request. If enabled, only the "Custom Body" property is used. |
| Custom Body | JSON | Custom body to send. Allows full control over the request payload. |
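
The way these properties combine into a request payload can be sketched as follows. This is a minimal illustration, not the node's actual source: the `CrawlProperties` interface, the `buildCrawlBody` function, and the camelCase payload keys (`excludePaths`, `allowBackwardLinks`, `scrapeOptions`, and so on) are assumptions derived from the display names above, so check them against the FireCrawl API documentation for your deployment.

```typescript
// Hypothetical shape of the node's inputs; field names are assumptions
// derived from the display names in the properties table.
interface ExtractOptions {
  schema?: Record<string, unknown>;
  systemPrompt?: string;
  prompt?: string;
}

interface CrawlProperties {
  url: string;
  limit?: number;
  excludePaths?: string[];
  allowBackwardLinks?: boolean;
  webhook?: string;
  scrapeOptions?: { formats?: string[]; extract?: ExtractOptions };
  useCustomBody?: boolean;
  customBody?: Record<string, unknown>;
}

// Builds the JSON payload for a crawl job (illustrative helper, not a real API).
function buildCrawlBody(props: CrawlProperties): Record<string, unknown> {
  // When "Use Custom Body" is enabled, only the custom body is used.
  if (props.useCustomBody) {
    return props.customBody ?? {};
  }
  if (!props.url) {
    throw new Error('"Url" is required');
  }
  const limit = props.limit ?? 50; // documented default
  if (limit < 1) {
    throw new Error('"Limit" must be at least 1');
  }
  const body: Record<string, unknown> = {
    url: props.url,
    limit,
    allowBackwardLinks: props.allowBackwardLinks ?? true, // documented default
  };
  // Optional properties are only included when they are set.
  if (props.excludePaths && props.excludePaths.length > 0) {
    body["excludePaths"] = props.excludePaths;
  }
  if (props.webhook) {
    body["webhook"] = props.webhook;
  }
  if (props.scrapeOptions) {
    body["scrapeOptions"] = props.scrapeOptions;
  }
  return body;
}

// Usage: crawl a blog, skip two paths, and request Markdown output.
const exampleBody = buildCrawlBody({
  url: "https://example.com/blog",
  excludePaths: ["/about", "/contact"],
  scrapeOptions: { formats: ["markdown"] },
});
console.log(JSON.stringify(exampleBody, null, 2));
```

Returning the custom body early mirrors the documented rule that all other properties are ignored once "Use Custom Body" is enabled.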

Output

The node returns a JSON object containing the response from the FireCrawl API after submitting the crawl job. The structure of the output depends on the API's response, but typically includes:

{
  "jobId": "string",
  "status": "submitted",
  "message": "Job successfully created",
  // ...other fields as returned by the API
}
  • This operation primarily outputs JSON. If binary data were ever returned, it would represent crawled files or extracted assets.
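
Because the output structure depends on the API's response, downstream logic (for example in an n8n Code node) is safer when it reads the job identifier defensively. The sketch below assumes the identifier arrives as `jobId`, as in the sample above, and falls back to an `id` field; that fallback and the `extractJobId` helper are hypothetical and not confirmed by this page.

```typescript
// Returns the crawl job identifier from a response of unknown shape,
// or undefined when no usable identifier is present.
function extractJobId(response: unknown): string | undefined {
  if (typeof response !== "object" || response === null) {
    return undefined;
  }
  const record = response as Record<string, unknown>;
  // "jobId" matches the sample output; "id" is an assumed fallback.
  for (const key of ["jobId", "id"]) {
    const value = record[key];
    if (typeof value === "string" && value.length > 0) {
      return value;
    }
  }
  return undefined;
}

// Usage with a response shaped like the sample output.
const sampleResponse = { jobId: "abc123", status: "submitted" };
console.log(extractJobId(sampleResponse));
```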

Dependencies

  • External Service: Requires access to the FireCrawl API.
  • API Key/Credentials: You must configure the "FireCrawl API" credentials in n8n, including the baseUrl and authentication details.
  • n8n Configuration: Ensure the node has network access to the FireCrawl API endpoint.

Troubleshooting

Common Issues:

  • Missing or Invalid Credentials:
    Error: 401 Unauthorized or similar.
    Resolution: Check your FireCrawl API credentials in n8n.

  • Invalid URL or Parameters:
    Error: 400 Bad Request or validation errors.
    Resolution: Ensure the "Url" field is correctly filled and all required parameters are valid.

  • API Endpoint Unreachable:
    Error: ENOTFOUND, ECONNREFUSED, or timeout errors.
    Resolution: Verify the baseUrl in your credentials and ensure network connectivity.

  • Malformed Custom Body:
    Error: JSON parsing error or unexpected API response.
    Resolution: Double-check the syntax and structure of your custom JSON body.
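
The checks above can be sketched in code. This is an illustration under stated assumptions: the `NodeFailure` shape and both helper functions are hypothetical, and the actual error object produced by n8n may expose the HTTP status and error code under different property names.

```typescript
// Hypothetical, simplified view of a failed request.
interface NodeFailure {
  statusCode?: number; // HTTP status, if the API responded
  code?: string;       // Node.js network error code, if the request never arrived
}

const NETWORK_CODES = new Set(["ENOTFOUND", "ECONNREFUSED", "ETIMEDOUT"]);

// Maps a failure to the matching resolution from the list above.
function troubleshootingHint(failure: NodeFailure): string {
  if (failure.statusCode === 401) {
    return "Check your FireCrawl API credentials in n8n.";
  }
  if (failure.statusCode === 400) {
    return 'Ensure the "Url" field is correctly filled and all required parameters are valid.';
  }
  if (failure.code !== undefined && NETWORK_CODES.has(failure.code)) {
    return "Verify the baseUrl in your credentials and ensure network connectivity.";
  }
  return "No matching hint; inspect the raw error message.";
}

// Validates a custom body before it is sent, so syntax errors surface early.
function parseCustomBody(raw: string): Record<string, unknown> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    throw new Error(`Custom Body is not valid JSON: ${(err as Error).message}`);
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Custom Body must be a JSON object");
  }
  return parsed as Record<string, unknown>;
}

// Usage
console.log(troubleshootingHint({ code: "ECONNREFUSED" }));
console.log(parseCustomBody('{"url": "https://example.com"}'));
```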


Links and References

  • n8n Documentation
  • FireCrawl API documentation (refer to your service provider for the exact link)
