Actions
Overview
The Crawl Url With Websocket Monitoring operation of the FireCrawl node allows you to crawl a specified URL, monitor the crawling process via WebSocket, and retrieve structured or formatted data from the crawled pages. This is particularly useful for scenarios where you need to extract content or metadata from websites in bulk, automate web scraping tasks, or gather data for analysis and reporting.
Practical examples:
- Automatically scrape product information from e-commerce sites.
- Collect articles or blog posts for content aggregation.
- Extract structured data (e.g., using a schema) from multiple web pages for research or business intelligence.
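As an illustrative sketch of the last scenario, a JSON Schema such as the following could be supplied in the Schema property to pull product details from each crawled page (the field names here are hypothetical examples, not part of the node):

```json
{
  "type": "object",
  "properties": {
    "productName": { "type": "string" },
    "price": { "type": "number" },
    "inStock": { "type": "boolean" }
  },
  "required": ["productName", "price"]
}
```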
Properties
Below are the supported input properties for this operation:
| Display Name | Type | Description |
|---|---|---|
| Url | String | The URL to crawl. |
| Exclude Paths | Collection | List of paths to exclude from the crawl. Useful for skipping irrelevant or sensitive areas. |
| Limit | Number | Max number of results to return. Controls the breadth of the crawl. |
| Scrape Options | Collection | Scraping options, including output formats and extraction settings. |
| Formats | Multi-Options | Output format(s) for the scraped data: Markdown, Html, Extract. |
| Extract | Collection | Structured extraction options; contains the three fields below. |
| Schema | String | The schema for structured data extraction. |
| Systemprompt | String | The system prompt used for extraction. |
| Prompt | String | Extraction prompt without schema. |
| Use Custom Body | Boolean | Whether to use a custom body for the request. |
| Custom Body | JSON | Custom body to send. Allows advanced users to specify the entire request payload as JSON. |
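When Use Custom Body is enabled, the Custom Body property replaces the payload the node would otherwise generate. A minimal sketch of such a payload, assuming the request fields mirror the properties above (`url`, `excludePaths`, `limit`, `scrapeOptions` are assumptions about the FireCrawl crawl endpoint, so verify them against your API version):

```json
{
  "url": "https://example.com",
  "excludePaths": ["/admin", "/login"],
  "limit": 10,
  "scrapeOptions": {
    "formats": ["markdown", "extract"]
  }
}
```

Note that malformed JSON here will cause a parsing error before the request is sent.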
Output
The node outputs a `json` field containing the results of the crawl. The structure of this output depends on the selected scrape options and formats, but typically includes:
- The crawled data in the requested format(s) (Markdown, HTML, or extracted fields).
- If extraction is configured, the output may include structured data according to the provided schema or prompts.
Note: If binary data is returned (e.g., files or images), it will be included in the binary output section, though this operation primarily focuses on JSON/textual data.
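For illustration only, a crawl configured with the Markdown format might produce items shaped roughly like the following (the exact keys depend on the FireCrawl API version and the formats you selected, so treat this as a sketch rather than a guaranteed schema):

```json
{
  "json": {
    "markdown": "# Page Title\n\nPage content…",
    "metadata": {
      "sourceURL": "https://example.com/page",
      "statusCode": 200
    }
  }
}
```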
Dependencies
- External Service: Requires access to the FireCrawl API.
- API Key/Credentials: You must configure the "FireCrawl API" credentials in n8n, including the base URL (`baseUrl`) and any required authentication tokens.
- n8n Configuration: Ensure that your n8n instance can reach the FireCrawl API endpoint.
Troubleshooting
Common issues:
- Invalid Credentials: If the API key or base URL is incorrect, you may receive authentication errors. Double-check your FireCrawl API credentials in n8n.
- Malformed Custom Body: If you enable "Use Custom Body" and provide invalid JSON, the node may throw a parsing error. Ensure your JSON is well-formed.
- URL Not Reachable: If the target URL cannot be accessed by the FireCrawl service, the crawl will fail. Verify the URL and network accessibility.
- Exceeded Limit: If you set an excessively high limit, the API may reject the request or throttle responses.
Error messages and resolutions:
- "401 Unauthorized": Check your API credentials.
- "400 Bad Request": Review your input parameters, especially the custom body if used.
- "Connection Timeout": Ensure the target URL is accessible and not blocking requests.