Actions
Overview
The Crawl Url With Websocket Monitoring operation of the FireCrawl node allows you to crawl a specified URL, monitor the crawling process via WebSocket, and retrieve structured or formatted data from the crawled pages. This is particularly useful for scenarios where you need to extract content or metadata from websites in bulk, automate web scraping tasks, or gather data for analysis and reporting.
Practical examples:
- Automatically scrape product information from e-commerce sites.
- Collect articles or blog posts for content aggregation.
- Extract structured data (e.g., using a schema) from multiple web pages for research or business intelligence.
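As an illustrative sketch of the last scenario, a JSON Schema such as the following could be supplied in the Schema property to pull product details from each crawled page (the field names here are hypothetical examples, not part of the node):

```json
{
  "type": "object",
  "properties": {
    "productName": { "type": "string" },
    "price": { "type": "number" },
    "inStock": { "type": "boolean" }
  },
  "required": ["productName", "price"]
}
```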
Properties
Below are the supported input properties for this operation:
| Display Name | Type | Description |
|---|---|---|
| Url | String | The URL to crawl. |
| Exclude Paths | Collection | List of paths to exclude from the crawl. Useful for skipping irrelevant or sensitive areas. |
| Limit | Number | Max number of results to return. Controls the breadth of the crawl. |
| Scrape Options | Collection | Scraping options, including output formats and extraction settings. |
| Formats | Multi-Options | Output format(s) for the scraped data: Markdown, Html, Extract. |
| Extract | Collection | Structured extraction options; contains the three fields below. |
| Schema | String | The schema for structured data extraction. |
| Systemprompt | String | The system prompt used for extraction. |
| Prompt | String | Extraction prompt without schema. |
| Use Custom Body | Boolean | Whether to use a custom body for the request. |
| Custom Body | JSON | Custom body to send. Allows advanced users to specify the entire request payload as JSON. |
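When Use Custom Body is enabled, the Custom Body property replaces the payload the node would otherwise generate. A minimal sketch of such a payload, assuming the request fields mirror the properties above (`url`, `excludePaths`, `limit`, `scrapeOptions` are assumptions about the FireCrawl crawl endpoint, so verify them against your API version):

```json
{
  "url": "https://example.com",
  "excludePaths": ["/admin", "/login"],
  "limit": 10,
  "scrapeOptions": {
    "formats": ["markdown", "extract"]
  }
}
```

Note that malformed JSON here will cause a parsing error before the request is sent.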
Output
The node outputs a `json` field containing the results of the crawl. The structure of this output depends on the selected scrape options and formats, but typically includes:
- The crawled data in the requested format(s) (Markdown, HTML, or extracted fields).
- If extraction is configured, the output may include structured data according to the provided schema or prompts.
Note: If binary data is returned (e.g., files or images), it will be included in the binary output section, though this operation primarily focuses on JSON/textual data.
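For illustration only, a crawl configured with the Markdown format might produce items shaped roughly like the following (the exact keys depend on the FireCrawl API version and the formats you selected, so treat this as a sketch rather than a guaranteed schema):

```json
{
  "json": {
    "markdown": "# Page Title\n\nPage content…",
    "metadata": {
      "sourceURL": "https://example.com/page",
      "statusCode": 200
    }
  }
}
```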
Dependencies
- External Service: Requires access to the FireCrawl API.
- API Key/Credentials: You must configure the "FireCrawl API" credentials in n8n, including the base URL (`baseUrl`) and any required authentication tokens.
- n8n Configuration: Ensure that your n8n instance can reach the FireCrawl API endpoint.
Troubleshooting
Common issues:
- Invalid Credentials: If the API key or base URL is incorrect, you may receive authentication errors. Double-check your FireCrawl API credentials in n8n.
- Malformed Custom Body: If you enable "Use Custom Body" and provide invalid JSON, the node may throw a parsing error. Ensure your JSON is well-formed.
- URL Not Reachable: If the target URL cannot be accessed by the FireCrawl service, the crawl will fail. Verify the URL and network accessibility.
- Exceeded Limit: If you set an excessively high limit, the API may reject the request or throttle responses.
Error messages and resolutions:
- "401 Unauthorized": Check your API credentials.
- "400 Bad Request": Review your input parameters, especially the custom body if used.
- "Connection Timeout": Ensure the target URL is accessible and not blocking requests.