
BrightData

Interact with Bright Data to scrape websites, or use existing datasets from its marketplace to generate adapted snapshots.

Overview

The node integrates with Bright Data's web scraping and data marketplace services. Specifically, the Web Scraper - Scrape By URL operation allows users to scrape data from specified URLs using a predefined dataset configuration from Bright Data's marketplace. This is useful for extracting structured data from web pages without manually coding scrapers.

Common scenarios include:

  • Automatically collecting profile information from social media or professional networking sites.
  • Extracting product details or pricing from e-commerce websites.
  • Gathering the content of news articles or blog posts for analysis.

For example, a user can select a dataset tailored for LinkedIn profiles and provide a list of LinkedIn URLs to scrape detailed profile data in JSON or CSV format.

Properties

  • Dataset: Select a predefined dataset from Bright Data's marketplace to use as the scraping template.
  • URLs: A JSON array of objects, each with a "url" key naming a page to scrape, e.g., [{"url":"https://example.com"}].
  • Include Errors: A boolean flag that controls whether error details are included in the response when scraping fails.
  • Format: The output data format, either JSON or CSV.
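Because the URLs property must be a JSON array of objects with a "url" key, it can help to validate the value before it reaches Bright Data. The sketch below is a hypothetical helper (the name parseUrlsInput is illustrative and not part of the node) that rejects malformed JSON, non-array values, entries without a string "url", and strings that are not absolute URLs.

```typescript
// Hypothetical helper, not part of the node: validates the URLs property
// before it is sent to Bright Data.
interface UrlEntry {
  url: string;
}

function parseUrlsInput(raw: string): UrlEntry[] {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (err) {
    // Malformed JSON is the usual cause of URLs parsing errors.
    throw new Error(`URLs is not valid JSON: ${(err as Error).message}`);
  }
  if (!Array.isArray(parsed)) {
    throw new Error('URLs must be a JSON array, e.g. [{"url":"https://example.com"}]');
  }
  return parsed.map((item, i) => {
    // Each entry must be an object carrying a string "url" field.
    if (typeof item !== "object" || item === null || typeof item.url !== "string") {
      throw new Error(`Entry ${i} must be an object with a string "url" field`);
    }
    const url: string = item.url;
    // The URL constructor throws on strings that are not absolute URLs.
    try {
      new URL(url);
    } catch {
      throw new Error(`Entry ${i} has an invalid URL: ${url}`);
    }
    return { url };
  });
}
```

For example, parseUrlsInput('[{"url":"https://example.com"}]') returns a single entry, while a plain string array such as '["https://example.com"]' is rejected because its entries are not objects.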

Output

The node outputs the scraped data in the chosen format (JSON or CSV). The main output field is json, which contains an array of objects holding the data extracted from each URL.

If Include Errors is enabled, the output may also contain error information related to failed scraping attempts.
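When Include Errors is enabled, downstream steps may want to separate successful records from failed ones. The following is a minimal sketch that assumes each failed record carries a non-empty "error" field; that field name is an assumption, not something the node's documentation confirms, so inspect an actual response before relying on it.

```typescript
// Sketch only: ASSUMES failed records carry a non-empty "error" field.
type ScrapedRecord = Record<string, unknown>;

function splitResults(records: ScrapedRecord[]): { ok: ScrapedRecord[]; failed: ScrapedRecord[] } {
  const ok: ScrapedRecord[] = [];
  const failed: ScrapedRecord[] = [];
  for (const record of records) {
    // Any record whose "error" field is present and non-empty counts as failed.
    const error = record.error;
    if (error !== undefined && error !== null && error !== "") {
      failed.push(record);
    } else {
      ok.push(record);
    }
  }
  return { ok, failed };
}
```

Keeping failed records in their own list lets a workflow retry or log them without discarding the successful results.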

Neither the code nor the properties indicate any binary data output.

Dependencies

  • Requires an API key credential for authenticating with Bright Data's API.
  • Depends on Bright Data's web scraping and dataset marketplace services.
  • The node makes HTTP requests to https://api.brightdata.com.
  • No additional environment variables are explicitly required beyond the API credential.
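To illustrate how the properties and the API key might combine into a single HTTP request, here is a hypothetical request builder. The endpoint path /datasets/v3/trigger, the query parameter names dataset_id, include_errors and format, and the Bearer authorization header are assumptions modeled on Bright Data's public Datasets API, not details taken from the node's source. The function only constructs the request and does not send it.

```typescript
// Hypothetical request builder. The endpoint path, query parameter names and
// Bearer header are ASSUMPTIONS, not confirmed from the node's source.
interface ScrapeRequest {
  method: "POST";
  url: string;
  headers: Record<string, string>;
  body: string;
}

function buildScrapeRequest(
  apiKey: string,
  datasetId: string,
  urls: { url: string }[],
  includeErrors: boolean,
  format: "json" | "csv",
): ScrapeRequest {
  // Assumed endpoint on the documented host https://api.brightdata.com.
  const endpoint = new URL("https://api.brightdata.com/datasets/v3/trigger");
  endpoint.searchParams.set("dataset_id", datasetId);
  endpoint.searchParams.set("include_errors", String(includeErrors));
  endpoint.searchParams.set("format", format);
  return {
    method: "POST",
    url: endpoint.toString(),
    headers: {
      // The API key credential authenticates every request.
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    // The URLs property is sent as the JSON request body.
    body: JSON.stringify(urls),
  };
}
```

Separating request construction from sending keeps the credential handling and parameter mapping easy to inspect and test.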

Troubleshooting

  • Common issues:

    • Invalid or expired API credentials will cause authentication failures.
    • Providing malformed URLs or unsupported website structures may result in empty or error responses.
    • Selecting a dataset incompatible with the target URLs can lead to incomplete or incorrect data extraction.
  • Error messages:

    • Authentication errors typically indicate invalid API keys; verify and update credentials.
    • HTTP status errors are ignored internally rather than thrown, but their details may appear in the response when Include Errors is enabled.
    • A parsing error in the URLs input means the value is not valid JSON; correct the syntax (for example, unbalanced brackets or unquoted keys).
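Because HTTP status errors surface in the response rather than stopping the node, a workflow may need to interpret status codes itself. This hypothetical helper maps a few status codes to the troubleshooting hints above; the mapping follows generic HTTP semantics (401/403 for authentication, 400 for bad input, 5xx for server errors) and is not a list of Bright Data-specific error codes.

```typescript
// Hypothetical helper: maps generic HTTP status codes to the hints above.
// These are standard HTTP meanings, not Bright Data-specific codes.
function troubleshootingHint(status: number): string {
  if (status === 401 || status === 403) {
    return "Authentication failed: verify or update the Bright Data API key credential.";
  }
  if (status === 400) {
    return "Bad request: check that URLs is valid JSON and suits the selected dataset.";
  }
  if (status >= 500) {
    return "Bright Data service error: the problem is on the server side.";
  }
  return "No specific hint for this status.";
}
```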
