BrightData

Interact with Bright Data to scrape websites or use existing datasets from the marketplace to generate adapted snapshots

Overview

The node interacts with Bright Data's Marketplace Dataset service to retrieve the content of a specific snapshot. It allows users to fetch data in batches, supporting different formats and optional gzip compression. This is useful when working with large datasets that are split into multiple parts (batches), enabling efficient incremental data retrieval.

Common scenarios include:

  • Downloading large marketplace dataset snapshots in manageable chunks.
  • Fetching data in preferred formats such as JSON, JSONL, or CSV for further processing.
  • Compressing responses to reduce bandwidth usage.

Practical example:
A user wants to process a large dataset snapshot but cannot load it all at once due to memory constraints. They can use this node to fetch the snapshot content batch by batch, specifying the batch size and part number, optionally compressing the response to save bandwidth.

Properties

| Name | Meaning |
|------|---------|
| Snapshot ID | The unique identifier of the snapshot to retrieve content from. |
| Compress | Whether to compress the response using gzip format (true or false). |
| Batch Size | Number of records to include in each response batch (e.g., 1000). |
| Part | The batch number to return, starting from 1 (e.g., 1 for the first batch). |
| Format | Format of the response data. Options: JSON, JSONL, CSV. |
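As a rough illustration of how these properties constrain a request, the sketch below validates them and turns them into a query string. The parameter names (`format`, `compress`, `batch_size`, `part`) are assumptions that mirror the property names above; they are not confirmed by this page.

```python
from typing import Dict
from urllib.parse import urlencode

ALLOWED_FORMATS = {"json", "jsonl", "csv"}


def build_snapshot_params(
    fmt: str = "json",
    compress: bool = False,
    batch_size: int = 1000,
    part: int = 1,
) -> Dict[str, str]:
    """Validate the node properties and return them as query parameters."""
    fmt = fmt.lower()
    if fmt not in ALLOWED_FORMATS:
        raise ValueError(f"format must be one of {sorted(ALLOWED_FORMATS)}")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    if part < 1:
        raise ValueError("part numbering starts at 1")
    # Parameter names are hypothetical; they mirror the property names.
    return {
        "format": fmt,
        "compress": "true" if compress else "false",
        "batch_size": str(batch_size),
        "part": str(part),
    }


params = build_snapshot_params(fmt="CSV", compress=True, batch_size=500, part=2)
query = urlencode(params)
```

Rejecting a part number below 1 or an unknown format before sending anything makes configuration mistakes visible early instead of surfacing as an API error.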

Output

The node writes the snapshot content, serialized in the selected format (JSON, JSONL, or CSV), to the json output field. If compression is enabled, the response is gzip-compressed and must be decompressed before it can be parsed; how the node exposes the compressed binary data is not documented here.

The output structure corresponds to the requested batch of records from the snapshot, allowing incremental processing of large datasets.
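For downstream processing, a batch in any of the three formats can be turned into a list of records with the Python standard library. The sketch below makes several assumptions that this page does not confirm: the payload arrives as UTF-8 bytes, a JSON response is an array of records, and a CSV response starts with a header row.

```python
import csv
import gzip
import io
import json
from typing import Any, Dict, List


def decode_batch(payload: bytes, fmt: str, compressed: bool = False) -> List[Dict[str, Any]]:
    """Decode one batch of snapshot records from raw response bytes."""
    if compressed:
        payload = gzip.decompress(payload)
    text = payload.decode("utf-8")  # assumption: UTF-8 encoded response
    if fmt == "json":
        data = json.loads(text)
        # assumption: a JSON batch is an array; wrap a lone object in a list
        return data if isinstance(data, list) else [data]
    if fmt == "jsonl":
        # one JSON object per non-empty line
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    if fmt == "csv":
        # assumption: the first row holds the column names
        return list(csv.DictReader(io.StringIO(text, newline="")))
    raise ValueError(f"unsupported format: {fmt}")
```

Note that CSV carries no type information, so every value in a CSV batch comes back as a string, while JSON and JSONL keep numbers and booleans as such.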

Dependencies

  • Requires an API key credential for authenticating with Bright Data's API.
  • Connects to the Bright Data API endpoint at https://api.brightdata.com.
  • No additional external dependencies beyond the API access and proper configuration of credentials within n8n.
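To make the two requirements above concrete, the following sketch builds (but does not send) an authenticated request against the base URL listed here. The `Bearer` authorization scheme and the request path are assumptions for illustration; the path is deliberately a placeholder, since this page does not state the real one.

```python
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.brightdata.com"  # endpoint named in the list above


def build_request(api_key: str, path: str, params: Dict[str, str]) -> Request:
    """Build a GET request carrying the API key; nothing is sent here."""
    url = f"{BASE_URL}{path}?{urlencode(params)}"
    # assumption: the API key is passed as a Bearer token
    return Request(url, headers={"Authorization": f"Bearer {api_key}"})


# "/hypothetical/snapshot/path" is a placeholder, not a documented route.
req = build_request("YOUR_API_KEY", "/hypothetical/snapshot/path", {"part": "1"})
```

Within n8n the credential is configured once and attached by the node, so a manual request like this is only relevant when reproducing a call outside the workflow.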

Troubleshooting

  • Invalid Snapshot ID: If the provided snapshot ID does not exist or is incorrect, the API may return an error or empty data. Verify the snapshot ID is correct.
  • Batch Size Too Large: Requesting very large batch sizes might cause timeouts or performance issues. If requests time out, reduce the batch size (for example, to 1000 records) and fetch more parts instead.
  • Part Number Out of Range: Requesting a part number beyond the available batches will likely return no data or an error. Ensure the part number is within valid range.
  • Compression Issues: If compression is enabled but downstream nodes do not handle gzip data properly, errors may occur. Disable compression if unsure.
  • API Authentication Errors: Ensure the API key credential is correctly configured and has necessary permissions.
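For the out-of-range case, the valid part numbers follow from simple arithmetic when the total record count is known. The sketch below assumes that count is available from elsewhere (this page does not say where), so `total_records` is a hypothetical input.

```python
def part_count(total_records: int, batch_size: int) -> int:
    """Number of parts needed to cover all records (ceiling division)."""
    if total_records < 0:
        raise ValueError("total_records cannot be negative")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return (total_records + batch_size - 1) // batch_size


def is_valid_part(part: int, total_records: int, batch_size: int) -> bool:
    """True if `part` falls inside the range 1..part_count."""
    return 1 <= part <= part_count(total_records, batch_size)


# 2500 records in batches of 1000 need 3 parts: 1000 + 1000 + 500.
parts_needed = part_count(2500, 1000)
```

Changing the batch size changes the number of parts, so a part number that was valid with one batch size can fall out of range with a larger one.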
