Actions
- Actor Actions
- Actor Task Actions
- Actor Run Actions
- Dataset Actions
- Key-Value Store Actions
Overview
This node integrates with Apify, a platform for web scraping, data extraction, and automation. Specifically, the "Scrape Single URL" operation under the "Actor" resource allows users to scrape data from a single web page URL using different crawler technologies. This is useful for quickly extracting structured data from any publicly accessible webpage without building a full scraper from scratch.
Common scenarios include:
- Extracting product details from an e-commerce page.
- Collecting article content or metadata from news sites.
- Gathering social media profile information.
- Testing or prototyping web scraping workflows on individual URLs.
Practical example: You want to scrape the title, images, and price of a product from a competitor’s website by providing its URL and selecting a crawler type that best fits the page complexity.
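To make the flow concrete, below is a minimal sketch of what a "scrape single URL" call looks like against Apify's REST API directly. The actor ID, the input field names (`startUrls`, `crawlerType`), and the endpoint path are assumptions based on Apify's public API v2 conventions, not something this node's documentation guarantees; the n8n node handles all of this for you.

```python
import json
import urllib.request

APIFY_BASE = "https://api.apify.com/v2"

def build_run_request(actor_id: str, url: str, crawler_type: str, token: str):
    """Build the endpoint and JSON payload for a synchronous actor run.

    NOTE: field names here are illustrative assumptions, not a verified
    input schema for any specific actor.
    """
    endpoint = f"{APIFY_BASE}/acts/{actor_id}/run-sync-get-dataset-items?token={token}"
    payload = {
        "startUrls": [{"url": url}],   # must begin with http:// or https://
        "crawlerType": crawler_type,   # e.g. "cheerio" or "playwright:firefox"
    }
    return endpoint, payload

def scrape_single_url(actor_id, url, crawler_type, token):
    """Run the actor synchronously and return its dataset items as JSON."""
    endpoint, payload = build_run_request(actor_id, url, crawler_type, token)
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # list of scraped items
```

In n8n itself you would not write this code; you would set the URL and Crawler Type properties described below and supply credentials once.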
Properties
| Name | Meaning |
|---|---|
| URL | The web address to be scraped. Must start with http:// or https:// and be a valid URL. |
| Crawler Type | The technology used to scrape the page. Options are: Cheerio (fast HTML parsing), JSDOM (DOM emulation), Playwright Adaptive (headless browser with adaptive behavior), and Playwright Firefox (headless Firefox browser). |
| Authentication | Method to authenticate API requests to Apify. Options are: API Key or OAuth2. |
Output
The node outputs JSON data representing the scraped content from the specified URL. The exact structure depends on the actor's scraping logic but typically includes extracted fields such as text, links, images, or other relevant data points.
If the actor supports binary data (e.g., screenshots or downloaded files), the node can output this in the binary property of the item, allowing further processing or saving.
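Because the output structure depends on the actor, downstream steps should read fields defensively. The sketch below shows one way to do that; the key names (`url`, `title`, `text`) are purely illustrative, since the actual keys depend on the actor's scraping logic.

```python
def summarize_items(items):
    """Pull a few common fields from each scraped item, tolerating absences.

    The field names used here are illustrative assumptions, not a fixed
    output schema; replace them with the keys your chosen actor emits.
    """
    summaries = []
    for item in items:
        summaries.append({
            "url": item.get("url", ""),
            "title": item.get("title", ""),
            "has_text": bool(item.get("text")),
        })
    return summaries
```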
Dependencies
- Requires an active Apify account and either an API key or OAuth2 token configured in n8n credentials.
- Internet access to reach the target URL and Apify services.
- No additional external dependencies beyond the Apify platform.
Troubleshooting
- Invalid URL error: Ensure the URL starts with http:// or https:// and is correctly formatted.
- Authentication failures: Verify that the API key or OAuth2 token is valid and has sufficient permissions.
- Crawler type issues: Some pages may require a more advanced crawler like Playwright if Cheerio or JSDOM cannot handle JavaScript-rendered content.
- Timeouts or no data returned: The target page might be blocking scraping or requires additional headers/cookies not supported by default.
- Rate limiting: Apify enforces usage limits; exceeding these will cause errors. Check your plan and usage.
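When rate limits are hit, a common remedy is to retry with exponentially increasing delays rather than failing immediately. This is a generic pattern for rate-limited HTTP APIs, not an n8n- or Apify-specific feature; the helper below computes a capped backoff schedule (deterministic for clarity; in practice you would add random jitter to avoid synchronized retries).

```python
def backoff_delays(attempts, base=1.0, cap=60.0):
    """Exponential backoff schedule (seconds) for retrying rate-limited calls.

    Doubles the delay on each attempt, capped at `cap` seconds.
    """
    return [min(cap, base * (2 ** i)) for i in range(attempts)]
```

For example, `backoff_delays(4)` yields delays of 1, 2, 4, and 8 seconds before the caps takes over on later attempts.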