
BrightData

Interact with Bright Data to scrape websites, or use existing datasets from the Bright Data marketplace to generate adapted snapshots.

Overview

The "Trigger Collection By URL" operation of the Web Scraper resource starts a web scraping snapshot for the specified URLs, using a dataset selected from Bright Data's marketplace. Use it to automate data extraction: trigger collection jobs remotely, then receive the scraped data at an endpoint, or a notification, when the job completes.

Typical use cases include:

  • Automatically triggering data snapshots for monitoring competitor websites.
  • Collecting updated information from multiple URLs without manual intervention.
  • Integrating with other workflows that process or analyze scraped data once available.

For example, select a dataset configured for LinkedIn profiles and provide a list of LinkedIn profile URLs to scrape. The scraped data is sent to the specified endpoint, and a notification URL can optionally be called when the collection finishes.
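Your profile URLs may start out as a plain list. In that case, a small Python sketch like the following can produce the value for the URLs property. The function name build_urls_input is hypothetical. It wraps each URL in an object with a url field, matching the documented example.

```python
import json


def build_urls_input(urls):
    """Turn a plain list of page URLs into the JSON array the URLs property expects."""
    # One object per page, each with a single "url" key; compact separators
    # reproduce the style of the documented example.
    return json.dumps([{"url": u} for u in urls], separators=(",", ":"))


profiles = ["https://www.linkedin.com/in/bulentakar"]
print(build_urls_input(profiles))
# → [{"url":"https://www.linkedin.com/in/bulentakar"}]
```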

Properties

  • Dataset: Select the dataset from Bright Data's marketplace to use for scraping. You can search for and pick from the available datasets.
  • URLs: A JSON array of URLs to trigger snapshots on. Each item is an object whose url field holds one target webpage. Example: [{"url":"https://www.linkedin.com/in/bulentakar"}]
  • Endpoint: The HTTP endpoint URL to which the scraped data from the snapshot is sent.
  • Notify: The HTTP URL (for example, a webhook) to call when the collection job has finished.
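The four properties map onto a single HTTP request. The Python sketch below builds that request but does not send it. It rests on several assumptions that you should confirm against Bright Data's API reference, because the node may build its request differently:

  • The trigger endpoint is https://api.brightdata.com/datasets/v3/trigger.
  • Dataset, Endpoint, and Notify travel as the dataset_id, endpoint, and notify query parameters.
  • The API key is sent as a Bearer token.

The function name and the dataset ID gd_example are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: URL and query parameter names of Bright Data's trigger API.
TRIGGER_URL = "https://api.brightdata.com/datasets/v3/trigger"


def build_trigger_request(api_key, dataset_id, urls_json, endpoint=None, notify=None):
    """Build (but do not send) a POST request that triggers a collection."""
    items = json.loads(urls_json)  # raises json.JSONDecodeError on malformed input
    params = {"dataset_id": dataset_id}
    if endpoint:
        params["endpoint"] = endpoint  # where the scraped data is delivered
    if notify:
        params["notify"] = notify  # called when the job has finished
    return urllib.request.Request(
        TRIGGER_URL + "?" + urllib.parse.urlencode(params),
        data=json.dumps(items).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_trigger_request(
    api_key="YOUR_API_KEY",
    dataset_id="gd_example",  # hypothetical dataset ID
    urls_json='[{"url":"https://www.linkedin.com/in/bulentakar"}]',
    endpoint="https://example.com/results",
    notify="https://example.com/done",
)
# urllib.request.urlopen(req) would send it.
```

Keeping request construction separate from sending makes it easy to inspect the exact URL and headers before any network call is made.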

Output

The node outputs JSON containing the Bright Data API's response to the trigger request. This typically includes metadata about the triggered snapshot job, such as a job ID, a status, or a confirmation message.

When the scraped results are delivered asynchronously to the Endpoint URL, the node does not output the scraped content itself; it only confirms that the collection was triggered.
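To chain this node with a later step, such as polling or matching incoming deliveries, you usually need the identifier from the trigger response. The sketch below assumes the response is a JSON object with a snapshot_id key. That key name is an assumption, so check the output of an actual execution before relying on it. The helper name is hypothetical.

```python
import json


def snapshot_id_from_response(body):
    """Return the snapshot ID from a trigger response, or None if it is absent."""
    # Assumption: the response body is a JSON object carrying "snapshot_id".
    data = json.loads(body)
    if isinstance(data, dict):
        value = data.get("snapshot_id")
        if isinstance(value, str):
            return value
    return None


print(snapshot_id_from_response('{"snapshot_id": "s_hypothetical123"}'))
# → s_hypothetical123
```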

No binary data output is expected from this operation.

Dependencies

  • Requires an API key credential for authenticating with Bright Data's API.
  • Needs network access to Bright Data's API endpoints.
  • The user must have access to datasets in Bright Data's marketplace.
  • The specified Endpoint and Notify URLs must be reachable and able to accept HTTP requests.
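The Endpoint and Notify URLs need a service listening for HTTP requests. For local testing, a minimal receiver built on Python's standard http.server module can record whatever is posted to it. This is only a sketch, with the following limits:

  • It assumes requests arrive as POSTs with a Content-Length header.
  • It performs no authentication.
  • It must be exposed at a publicly reachable address (for example, through a reverse proxy or tunnel) before Bright Data can reach it.

The names CollectorHandler and start_receiver are hypothetical.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Request paths and raw bodies collected by the receiver, for inspection.
received = []


class CollectorHandler(BaseHTTPRequestHandler):
    """Accepts POST requests (data deliveries or notifications) and stores them."""

    def do_POST(self):
        # Assumes a Content-Length header; chunked uploads are not handled here.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        received.append({"path": self.path, "body": body})
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_receiver(host="0.0.0.0", port=8000):
    """Serve in a background thread; call .shutdown() on the result to stop."""
    server = HTTPServer((host, port), CollectorHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

If Endpoint and Notify point at different paths on the same host (for example, hypothetical /results and /done paths), both kinds of request land in received and can be told apart by their path.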

Troubleshooting

  • Invalid Dataset Selection: If the dataset ID is incorrect or inaccessible, the API call will fail. Ensure the dataset exists and your credentials have permission to access it.
  • Malformed URLs JSON: The URLs property must be a valid JSON array of objects, each with a url field. Invalid JSON or missing url fields will cause errors.
  • Endpoint/Notify URL Issues: If the provided endpoint or notify URLs are unreachable or reject requests, data delivery or notifications will fail. Verify these URLs are correct and accessible.
  • API Authentication Errors: Missing or invalid API credentials will prevent the node from triggering collections. Confirm the API key is correctly configured.
  • Network Errors: Connectivity issues to Bright Data's API or the specified endpoints can cause timeouts or failures.
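Many Malformed URLs JSON errors can be caught before the node runs. The hypothetical validate_urls_input helper below performs three checks:

  • The value parses as JSON.
  • The top level is a JSON array.
  • Every item has a url field holding an absolute http or https URL.

Rejecting an empty array is a choice made for this sketch, not a documented Bright Data rule.

```python
import json
from urllib.parse import urlsplit


def validate_urls_input(raw):
    """Return a list of problems in the URLs property; an empty list means it looks fine."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(items, list):
        return ["top level must be a JSON array"]
    if not items:
        return ["array is empty"]  # sketch's own choice: nothing to trigger
    problems = []
    for i, item in enumerate(items):
        if not isinstance(item, dict) or "url" not in item:
            problems.append(f"item {i}: missing 'url' field")
            continue
        parts = urlsplit(str(item["url"]))
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"item {i}: not an absolute http(s) URL")
    return problems
```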

Most error messages relate to authentication failures, invalid parameters, or network timeouts. Checking the node's execution logs and verifying all inputs and credentials usually resolves them.
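When reproducing a failing call outside the node with Python's urllib, a small helper can sort exceptions into the categories listed above. The mapping follows generic HTTP status semantics:

  • 401/403 for authentication.
  • 400/404/422 for bad input.
  • 429 for rate limiting.
  • 5xx for server errors.

It is an assumption, not Bright Data's documented error contract, and the response body usually carries the more specific message. The describe_failure name is hypothetical.

```python
import socket
import urllib.error


def describe_failure(exc):
    """Map an exception raised by urllib.request.urlopen to a likely cause."""
    # HTTPError subclasses URLError, so it must be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        if exc.code in (401, 403):
            return "authentication: check the API key and dataset permissions"
        if exc.code in (400, 404, 422):
            return "invalid parameters: check the dataset ID and URLs JSON"
        if exc.code == 429:
            return "rate limited: retry later"
        if exc.code >= 500:
            return "server error on the API side: retry later"
        return f"unexpected HTTP status {exc.code}"
    if isinstance(exc, (urllib.error.URLError, socket.timeout, TimeoutError)):
        return "network: check connectivity to the API"
    return f"unexpected error: {exc!r}"
```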
