Tavily

Tavily API

Actions (3)

Search Actions
- Query
Extract Actions
- URLs
Crawl Actions
- URL

Overview

This node extracts content from a list of provided URLs. Use it to retrieve and process web page data programmatically, for example when scraping articles, gathering metadata, or collecting images from multiple websites. Typical uses include extracting the main text of blog posts, fetching embedded tables or images from product pages, and obtaining favicons for branding purposes.

Properties

Name	Meaning
URLs	A list of URLs to extract content from. You can add multiple URLs to process in one execution.
Include Images	Whether to include a list of images extracted from the URLs. Default is false.
Extract Depth	The depth of the extraction process: "Basic" retrieves standard content; "Advanced" extracts more data including tables and embedded content but may increase latency.
Format	The format of the extracted content: "Markdown" returns markdown formatted content; "Text" returns plain text (may increase latency).
Include Favicon	Whether to include the favicon URL for each result. Default is false.
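
To make the property table concrete, the sketch below maps the five properties onto a single request body. The snake_case field names, the helper name `build_extract_payload`, and the defaults for depth ("basic") and format ("markdown") are assumptions chosen for illustration; the source only states that Include Images and Include Favicon default to false, and it does not show how the node actually builds its request.

```python
from typing import Any


def build_extract_payload(
    urls: list[str],
    include_images: bool = False,   # default stated in the table
    extract_depth: str = "basic",   # assumed default
    content_format: str = "markdown",  # assumed default
    include_favicon: bool = False,  # default stated in the table
) -> dict[str, Any]:
    """Map the node properties onto a request body (field names are assumed)."""
    if not urls:
        raise ValueError("at least one URL is required")
    if extract_depth not in ("basic", "advanced"):
        raise ValueError(f"unknown extract depth: {extract_depth!r}")
    if content_format not in ("markdown", "text"):
        raise ValueError(f"unknown format: {content_format!r}")
    return {
        "urls": list(urls),
        "include_images": include_images,
        "extract_depth": extract_depth,
        "format": content_format,
        "include_favicon": include_favicon,
    }


# Example: advanced extraction of a single page, other options left at defaults.
payload = build_extract_payload(
    ["https://example.com/post"], extract_depth="advanced"
)
print(payload)
```

Validating the two enumerated options up front keeps a typo such as "advance" from reaching the extraction service, where it would be harder to diagnose.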

Output

The output JSON contains the extracted content from each URL processed. Depending on the options selected, it may include:

The main content of the web page in either markdown or plain text format.
A list of images found on the page if "Include Images" is enabled.
The favicon URL of the website if "Include Favicon" is enabled.
Additional extracted elements such as tables or embedded content when using the advanced extraction depth.

Based on the properties, the node outputs structured JSON containing each URL and its extracted content. Images and favicons appear as URLs in that JSON, and nothing in the properties suggests binary attachments.
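
As a rough illustration of consuming that output, the sketch below summarizes a hypothetical result object. The key names (`results`, `url`, `raw_content`, `images`, `favicon`) and the sample data are assumptions, since the source does not show the exact JSON shape; adjust them to match what the node actually returns.

```python
from typing import Any

# Hypothetical output; the keys are assumptions, not confirmed by the source.
sample_output: dict[str, Any] = {
    "results": [
        {
            "url": "https://example.com/post",
            "raw_content": "# Title\n\nBody text.",
            "images": ["https://example.com/a.png"],
            "favicon": "https://example.com/favicon.ico",
        }
    ]
}


def summarize_results(output: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one compact summary per extracted URL, tolerating missing optional keys."""
    summaries = []
    for item in output.get("results", []):
        content = item.get("raw_content") or ""
        summaries.append(
            {
                "url": item.get("url"),
                "content_length": len(content),
                # "images" and "favicon" are only expected when the
                # corresponding options are enabled, so default to empty.
                "image_count": len(item.get("images") or []),
                "has_favicon": bool(item.get("favicon")),
            }
        )
    return summaries


print(summarize_results(sample_output))
```

Reading the optional fields with `.get()` means the same code works whether or not "Include Images" and "Include Favicon" were enabled.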

Dependencies

Requires internet access to fetch the content from the specified URLs.
Requires a Tavily API key, because the node calls the Tavily API. How the credential is configured is not shown in the source.
No additional environment variables appear to be required.

Troubleshooting

Common issues:
- Invalid or unreachable URLs will cause extraction failures.
- Selecting "Advanced" extraction depth may increase processing time and lead to timeouts on slow or large pages.
- With "Include Images" or "Include Favicon" enabled, extraction may fail if the target site restricts access or uses anti-scraping measures.
Error messages:
- For errors related to network connectivity or invalid URLs, verify the URLs and your network status.
- Extraction errors might indicate unsupported page structures or rate limiting by the target sites.
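
Since invalid URLs are listed above as a common cause of failure, one option is to screen the list before it reaches the node. The sketch below is an illustrative pre-check using only the Python standard library; the helper name `split_valid_urls` is hypothetical, and the check covers syntax only, so a URL that passes may still be unreachable.

```python
from urllib.parse import urlparse


def split_valid_urls(urls: list[str]) -> tuple[list[str], list[str]]:
    """Separate URLs with an http(s) scheme and a host from everything else."""
    valid: list[str] = []
    invalid: list[str] = []
    for raw in urls:
        candidate = raw.strip()
        try:
            parsed = urlparse(candidate)
        except ValueError:
            # urlparse raises on some malformed inputs, e.g. a broken IPv6 host.
            invalid.append(raw)
            continue
        if parsed.scheme in ("http", "https") and parsed.netloc:
            valid.append(candidate)
        else:
            invalid.append(raw)
    return valid, invalid


valid, invalid = split_valid_urls(
    ["https://example.com/a", "example.com/b", "ftp://example.com/c"]
)
print("send:", valid)
print("skip:", invalid)
```

Only the entries in `valid` would be passed to the URLs property; the `invalid` list can be logged or corrected separately.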

Links and References

Markdown Guide
General web scraping best practices and legal considerations: https://en.wikipedia.org/wiki/Web_scraping