Overview
The FireCrawl node's "Submit A Crawl Job" operation allows you to initiate a web crawling job by specifying a target URL and various crawl/scraping options. This is useful for scenarios where you need to programmatically extract content or structured data from websites, such as:
- Collecting articles or product information from a set of web pages.
- Monitoring website changes.
- Extracting structured data using custom prompts or schemas.
Practical Example:
You could use this node to crawl a blog, exclude certain paths (like /about or /contact), and receive the results in Markdown or HTML format, optionally triggering a webhook when the crawl completes.
Properties
Below are the supported input properties for this operation:
| Display Name | Type | Description |
|---|---|---|
| Url | String (required) | The URL to crawl. |
| Limit | Number | Max number of results to return. Minimum value: 1. Default: 50. |
| Exclude Paths | Collection | List of paths to exclude from the crawl (e.g., /about, /login). |
| Allow Backward Links | Boolean | Allow crawling pages that are not direct descendants of the initial URL. Default: true. |
| Webhook | String | URL to send webhook events during the crawl process. |
| Scrape Options | Collection | Scraping options, including output formats and extraction settings (sub-options listed below). |
| Scrape Options > Formats | | Output format(s) for scraped data: Markdown, Html, Extract. |
| Scrape Options > Extract | | Structured extraction options (sub-options listed below). |
| Scrape Options > Extract > Schema | | Schema for structured data extraction. |
| Scrape Options > Extract > Systemprompt | | System prompt used for extraction. |
| Scrape Options > Extract > Prompt | | Extraction prompt without schema. |
| Use Custom Body | Boolean | Whether to use a custom body for the request. If enabled, only the "Custom Body" property is used. |
| Custom Body | JSON | Custom body to send. Allows full control over the request payload. |
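The properties above map fairly directly onto a JSON request payload. The following is a minimal sketch of how that payload could be assembled, including the "Use Custom Body" override and the documented defaults (Limit 50, Allow Backward Links true). It is not the node's actual source: the `CrawlParams` interface, the `buildCrawlBody` function, and the camelCase field names (`excludePaths`, `allowBackwardLinks`, `scrapeOptions`, `systemPrompt`) are hypothetical and should be checked against the FireCrawl API documentation.

```typescript
// Hypothetical input shape: property names mirror the table above,
// but the exact API field names are assumptions.
interface ExtractOptions {
  schema?: Record<string, unknown>;
  systemPrompt?: string;
  prompt?: string;
}

interface ScrapeOptions {
  formats?: Array<"markdown" | "html" | "extract">;
  extract?: ExtractOptions;
}

interface CrawlParams {
  url: string;
  limit?: number;
  excludePaths?: string[];
  allowBackwardLinks?: boolean;
  webhook?: string;
  scrapeOptions?: ScrapeOptions;
  useCustomBody?: boolean;
  customBody?: string; // raw JSON text from the "Custom Body" field
}

function buildCrawlBody(p: CrawlParams): Record<string, unknown> {
  // "Use Custom Body" enabled: send only the custom body, ignoring other properties.
  if (p.useCustomBody) {
    return JSON.parse(p.customBody ?? "{}") as Record<string, unknown>;
  }
  if (!p.url) throw new Error("Url is required");
  const limit = p.limit ?? 50; // documented default
  if (!(limit >= 1)) throw new Error("Limit must be at least 1"); // also rejects NaN
  const body: Record<string, unknown> = {
    url: p.url,
    limit,
    allowBackwardLinks: p.allowBackwardLinks ?? true, // documented default
  };
  // Optional properties are only sent when set.
  if (p.excludePaths && p.excludePaths.length > 0) body.excludePaths = p.excludePaths;
  if (p.webhook) body.webhook = p.webhook;
  if (p.scrapeOptions) body.scrapeOptions = p.scrapeOptions;
  return body;
}
```

For the blog scenario from the Overview, a call might look like `buildCrawlBody({ url: "https://example.com/blog", excludePaths: ["/about", "/contact"], scrapeOptions: { formats: ["markdown", "html"] } })`, where example.com is a placeholder. Leaving unset options out of the payload lets the API apply its own defaults.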
Output
The node returns a JSON object containing the response from the FireCrawl API after submitting the crawl job. The structure of the output depends on the API's response, but typically includes:
{
"jobId": "string",
"status": "submitted",
"message": "Job successfully created",
// ...other fields as returned by the API
}
- This operation primarily outputs JSON. Any binary data, if returned, would represent crawled files or extracted assets.
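Later workflow steps usually need only the job identifier from this output. A minimal sketch of validating it, assuming the response carries string `jobId` and `status` fields as in the example above; the field names in your FireCrawl API version may differ, and `parseCrawlSubmitResponse` is a hypothetical helper.

```typescript
// Shape based on the example output on this page; your API version may return other fields.
interface CrawlSubmitResponse {
  jobId: string;
  status: string;
  message?: string;
  [key: string]: unknown; // any other fields returned by the API
}

// Hypothetical helper: validates the node's JSON output before later steps use it.
function parseCrawlSubmitResponse(raw: unknown): CrawlSubmitResponse {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("Expected a JSON object from the crawl submission");
  }
  const r = raw as Record<string, unknown>;
  if (typeof r.jobId !== "string" || typeof r.status !== "string") {
    throw new Error("Response is missing a string jobId or status");
  }
  return r as CrawlSubmitResponse;
}
```

Failing early on a missing identifier gives a clearer error than an undefined value surfacing in a later node.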
Dependencies
- External Service: Requires access to the FireCrawl API.
- API Key/Credentials: You must configure the "FireCrawl API" credentials in n8n, including the `baseUrl` and authentication details.
- n8n Configuration: Ensure the node has network access to the FireCrawl API endpoint.
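The credentials supply the `baseUrl` and the API key used for each request. A minimal sketch of how a request could be built from them, assuming the crawl endpoint is the path `/crawl` under `baseUrl` (for example a `baseUrl` ending in `/v1`) and that the key is sent as a Bearer token. Both are assumptions to verify against the FireCrawl API documentation, and `buildCrawlRequest` is a hypothetical helper.

```typescript
interface HttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Assumptions: endpoint path "/crawl" relative to baseUrl, Bearer-token authentication.
function buildCrawlRequest(baseUrl: string, apiKey: string, payload: unknown): HttpRequest {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes so the path joins cleanly
  return {
    url: `${base}/crawl`,
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify(payload),
  };
}
```

The result can be passed to any HTTP client, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })` in Node.js 18 or later. Keeping request construction separate from sending makes a wrong `baseUrl` easy to spot before any network call is made.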
Troubleshooting
Common Issues:
- Missing or Invalid Credentials
  - Error: `401 Unauthorized` or similar.
  - Resolution: Check your FireCrawl API credentials in n8n.
- Invalid URL or Parameters
  - Error: `400 Bad Request` or validation errors.
  - Resolution: Ensure the "Url" field is filled in correctly and all required parameters are valid.
- API Endpoint Unreachable
  - Error: `ENOTFOUND`, `ECONNREFUSED`, or timeout errors.
  - Resolution: Verify the `baseUrl` in your credentials and ensure network connectivity.
- Malformed Custom Body
  - Error: JSON parsing error or unexpected API response.
  - Resolution: Double-check the syntax and structure of your custom JSON body.
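If a workflow needs to react to these failures automatically (for example, by routing them to an error branch), a small classifier can map an error to the categories above, and the Custom Body can be checked before it is sent. A minimal sketch; `diagnose`, `checkCustomBody`, and the `ErrorLike` shape are hypothetical, and how n8n exposes the HTTP status or network error code on its error objects is an assumption not covered by this page.

```typescript
type Diagnosis =
  | "Missing or Invalid Credentials"
  | "Invalid URL or Parameters"
  | "API Endpoint Unreachable"
  | "Malformed Custom Body"
  | "Unknown";

// Hypothetical error shape; adapt it to what your error objects actually carry.
interface ErrorLike {
  status?: number; // HTTP status, if a response was received
  code?: string; // Node.js system error code, e.g. "ECONNREFUSED"
  name?: string; // e.g. "SyntaxError" thrown by JSON.parse
}

const NETWORK_CODES = ["ENOTFOUND", "ECONNREFUSED", "ETIMEDOUT"];

function diagnose(err: ErrorLike): Diagnosis {
  if (err.status === 401) return "Missing or Invalid Credentials";
  if (err.status === 400) return "Invalid URL or Parameters";
  if (err.code !== undefined && NETWORK_CODES.indexOf(err.code) !== -1) {
    return "API Endpoint Unreachable";
  }
  if (err.name === "SyntaxError") return "Malformed Custom Body";
  return "Unknown";
}

// Returns null if the Custom Body is a JSON object, otherwise a description of the problem.
function checkCustomBody(text: string): string | null {
  try {
    const parsed: unknown = JSON.parse(text);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return "Custom Body must be a JSON object";
    }
    return null;
  } catch (e) {
    return e instanceof Error ? e.message : String(e);
  }
}
```

Running `checkCustomBody` before submission turns a "Malformed Custom Body" failure into a local validation message instead of an unexpected API response.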
Links and References
- n8n Documentation
- FireCrawl API documentation (refer to your service provider for the exact link)