Overview
The node integrates with the Firecrawl API to scrape web pages and extract their content in various formats. It is designed to fetch and process data from URLs, supporting advanced options like PDF parsing, content filtering by HTML tags, dynamic page interactions (clicks, scrolls, waits), and screenshot capture. This node is useful for scenarios such as web data extraction, monitoring website changes, archiving content, or automating data collection workflows.
Practical examples:
- Scraping a news article URL to extract the main text in markdown format.
- Downloading and converting PDF content from a URL into markdown.
- Capturing a full-page screenshot of a product page.
- Tracking changes on a webpage using change tracking formats.
- Interacting with dynamic content by clicking buttons or scrolling before scraping.
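A minimal scrape request can be sketched as follows. This is a hedged example: the field names (`url`, `formats`, `onlyMainContent`, `timeout`) and the `POST /v2/scrape` endpoint are assumptions based on the properties described in this document, not a verified API contract.

```python
import json

# Hypothetical request body for a Firecrawl scrape call, assembled
# from the node properties described below. camelCase field names
# are an assumption here.
payload = {
    "url": "https://example.com/article",
    "formats": ["markdown"],   # request only markdown output
    "onlyMainContent": True,   # strip headers, nav bars, footers
    "timeout": 30000,          # abort if the page takes >30 s to respond
}

# The node would send this as JSON, e.g.:
#   POST https://api.firecrawl.dev/v2/scrape
#   Authorization: Bearer <FIRECRAWL_API_KEY>
body = json.dumps(payload)
print(body)
```

The node assembles an equivalent body from the properties listed in the next section.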
Properties
| Name | Meaning |
|---|---|
| Url | The URL of the webpage to scrape. |
| Parsers | Controls PDF processing during scraping. When the PDF parser is enabled, PDF content is extracted as markdown and billed per page; when disabled, PDFs are returned as base64 with flat billing. |
| Scrape Options | Options controlling how content is scraped and which output formats are returned. Includes: Formats (markdown, html, json, links, rawHtml, screenshot, summary, change tracking); for the JSON and change-tracking formats you can specify prompts, JSON schemas, modes (e.g., git-diff, json), and tags; screenshot options include full-page capture, quality (1-100), and viewport width and height. These allow fine-tuning of the output data structure and presentation. |
| Only Main Content | Whether to return only the main content of the page, excluding headers, navigation bars, footers, etc. |
| Include Tags | Specifies a list of HTML tags to include in the output. For example, including "header" will keep header elements in the scraped content. |
| Exclude Tags | Specifies a list of HTML tags to exclude from the output. For example, excluding "footer" will remove footer elements from the scraped content. |
| Headers | Custom HTTP headers to send with the scraping request. Each header has a key and value. Useful for authentication or custom user-agent strings. |
| Wait For (Ms) | Number of milliseconds to wait after page load before fetching content. Useful for pages that load content dynamically. |
| Mobile | Whether to emulate a mobile device when scraping. This affects user-agent and possibly page layout. |
| Skip TLS Verification | Whether to skip TLS certificate verification when making requests. Useful for self-signed certificates or testing environments. |
| Timeout (Ms) | Request timeout in milliseconds. Defines how long to wait for the page to respond before aborting. |
| Actions | A list of actions to perform on the page before scraping, enabling interaction with dynamic content: Click (click an element specified by CSS selector), Press (press a keyboard key), Screenshot (capture full page or viewport), Scroll (scroll up or down), Wait (pause for a given number of milliseconds), Write (type text into an input field). This allows automation of complex scraping scenarios involving user interaction. |
| Location | Settings to specify the geographic context for the request: Country (ISO 3166-1 alpha-2 country code, e.g., US, AU, DE, JP) and Languages (preferred languages/locales in priority order, e.g., en, fr, de). This affects the Accept-Language header and possibly localized content. |
| Remove Base64 Images | Whether to remove base64 encoded images from the output. Helps reduce output size if embedded images are not needed. |
| Block Ads | Enables ad-blocking and cookie popup blocking during scraping to avoid unwanted content. |
| Store In Cache | Whether to store the scraped page in Firecrawl's index and cache. Disable this for privacy or sensitive data concerns. |
| Proxy | Specifies the proxy type to use for requests. Options: Basic, Stealth. |
| Additional Fields | Allows sending custom JSON properties in the request body for advanced or future features. |
| Use Custom Body | When enabled, allows providing a fully custom request body instead of using the standard parameters. |
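The properties above map onto a JSON request body. The sketch below combines Actions, Location, and screenshot options; the action type names, the nested `{"type": ...}` structure, and the selector `#load-more` are illustrative assumptions, not documented values.

```python
# Hypothetical request body combining several properties from the table
# above. Structure and field names are assumptions based on this doc.
payload = {
    "url": "https://example.com/products",
    "formats": [
        "markdown",
        {"type": "screenshot", "fullPage": True, "quality": 80},
    ],
    "actions": [
        {"type": "wait", "milliseconds": 2000},       # let dynamic content load
        {"type": "click", "selector": "#load-more"},  # expand the product list
        {"type": "scroll", "direction": "down"},      # trigger lazy loading
    ],
    "location": {
        "country": "DE",            # ISO 3166-1 alpha-2 code
        "languages": ["de", "en"],  # priority order for Accept-Language
    },
    "blockAds": True,
    "removeBase64Images": True,
}
print(payload["location"]["country"])
```

Actions run in list order before content is captured, so a wait placed before a click gives the target element time to render.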
Output
The node outputs JSON data representing the scraped content according to the requested formats. The structure depends on the selected output formats:
- Markdown/HTML/Raw HTML: Textual content extracted from the page.
- JSON: Structured data extracted based on provided JSON schema and prompt.
- Links: List of links found on the page.
- Summary: A summarized version of the page content.
- Change Tracking: Data showing differences between current and previous scrapes, supporting modes like git-diff or JSON diff.
- Screenshot: Binary image data representing a screenshot of the page, either full page or viewport sized.
If PDF parsing is enabled, PDF content is converted to markdown; otherwise, PDFs are returned as base64 encoded files.
Binary data (screenshots) is included as binary attachments in the output.
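Handling the output can be sketched as below. The response shape (a `data` object keyed by format name, with screenshots base64-encoded) is an assumption based on the output description above; the `response` dict is a stand-in, not real API output.

```python
import base64

# Stand-in for the parsed JSON a scrape might return; the shape is an
# assumption based on the output formats described above.
response = {
    "success": True,
    "data": {
        "markdown": "# Example Domain\n\nThis domain is for use in examples.",
        "links": ["https://www.iana.org/domains/example"],
        "screenshot": base64.b64encode(b"\x89PNG...").decode(),
    },
}

data = response["data"]
text = data.get("markdown", "")
# Screenshots arrive base64-encoded; decode before writing to disk.
image_bytes = base64.b64decode(data["screenshot"]) if "screenshot" in data else b""
print(len(text), len(image_bytes))
```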
Dependencies
- Requires an API key credential for authenticating with the Firecrawl API.
- Network access to the Firecrawl API endpoint (default https://api.firecrawl.dev/v2).
- Optional proxy configuration depending on network environment.
- No other external dependencies are required.
Troubleshooting
- Timeout errors: Increase the "Timeout (Ms)" property if pages take longer to load.
- Empty or incomplete content: Increase "Wait For (Ms)" to give dynamically loaded content time to render, or add appropriate "Actions" to interact with the page.
- TLS errors: Enable "Skip TLS Verification" if connecting to sites with invalid certificates.
- Incorrect content due to localization: Adjust "Location" settings to specify correct country and language preferences.
- PDF content not parsed as expected: Ensure "Parsers" includes PDF if you want markdown extraction; otherwise, PDFs will be base64 encoded.
- Ad or cookie popups interfering: Enable "Block Ads" to reduce unwanted overlays.
- Issues with dynamic content: Use "Actions" to simulate clicks, scrolls, or typing before scraping.
- Proxy connection failures: Verify proxy settings and credentials if used.
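For timeout and empty-content issues, the relevant knobs can be adjusted together. A small sketch, with illustrative values (not defaults) and assumed camelCase field names:

```python
# Sketch: raising wait and timeout settings for a slow, client-rendered
# page. Values are illustrative; field names are assumptions.
payload = {
    "url": "https://example.com/spa",
    "formats": ["markdown"],
    "waitFor": 5000,               # wait 5 s after load for client-side rendering
    "timeout": 60000,              # allow slow pages a full minute
    "skipTlsVerification": True,   # only for self-signed certs in testing
}
# The timeout must comfortably cover the post-load wait.
assert payload["timeout"] > payload["waitFor"]
print(payload["waitFor"], payload["timeout"])
```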
Links and References
- Firecrawl API documentation: https://firecrawl.dev/docs/api
- MDN Web Docs on Accept-Language header: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Accept-Language