BrightData

Interact with Bright Data to scrape websites or use existing datasets from the marketplace to generate adapted snapshots

Overview

The "Filter Dataset" operation within the "Marketplace Dataset" resource allows users to query and filter records from a selected dataset available in the Bright Data marketplace. This node is useful when you want to extract specific subsets of data based on certain criteria without downloading or processing the entire dataset.

Typical use cases include:

  • Extracting records where a field matches a particular value (e.g., filtering companies by industry).
  • Applying multiple filters combined with logical AND to refine dataset results.
  • Limiting the number of returned records for performance or sampling purposes.

For example, you could filter a dataset of business listings to only include those in the "Advertising" industry or filter user data where age is greater than 30 and location equals "New York".
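The second example above (age greater than 30 and location equal to "New York") can be illustrated with a small local sketch. This only mimics the AND semantics and the records limit on in-memory data; the node itself sends the filter to Bright Data rather than evaluating it locally. The names `OPS`, `matches`, `filter_records`, and the sample `users` list are hypothetical and exist only for illustration.

```python
import operator

# Hypothetical mapping from operator symbols to Python comparisons.
# Only a few operators are covered; this is not the node's implementation.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def matches(record, conditions):
    """Return True when the record satisfies every condition (logical AND)."""
    return all(
        cond["name"] in record
        and OPS[cond["operator"]](record[cond["name"]], cond["value"])
        for cond in conditions
    )


def filter_records(records, conditions, limit=100):
    """Keep matching records, stopping once `limit` records are collected."""
    selected = []
    for record in records:
        if len(selected) >= limit:
            break
        if matches(record, conditions):
            selected.append(record)
    return selected


# Made-up sample data standing in for a marketplace dataset.
users = [
    {"name": "Ana", "age": 42, "location": "New York"},
    {"name": "Ben", "age": 25, "location": "New York"},
    {"name": "Cy", "age": 37, "location": "Boston"},
    {"name": "Dee", "age": 31, "location": "New York"},
]
conditions = [
    {"name": "age", "operator": ">", "value": 30},
    {"name": "location", "operator": "==", "value": "New York"},
]

result = filter_records(users, conditions, limit=100)
print([u["name"] for u in result])  # ['Ana', 'Dee']
```

Lowering `limit` plays the role of the "Records Limit" property: with `limit=1`, only the first matching record is kept.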

Properties

  • Dataset: Select the dataset from the Bright Data marketplace to filter. The selection is made from a searchable list of available datasets.
  • Records Limit: Maximum number of records to return in the filtered snapshot. Useful to limit output size. Default is 100.
  • Filter Type: Choose the type of filter to apply. CSV and JSON filter types are also supported but not detailed here.

    • Single Filter: One simple condition.
    • Group Filters: Multiple conditions combined with logical AND.
  • Field Name (Single Filter only): The name of the field in the dataset to filter on.
  • Operator (Single Filter only): The operator to apply between the field and the value. Options include Equals, Not Equals, Greater Than, Less Than, Includes, In, Array Includes, and their negations.
  • Field Value (Single Filter only): The value to compare the field against using the chosen operator.
  • Filters Group (Group Filters only): A JSON object defining complex filter logic. Supports nested filters combined with "and" operators. Example:

    {
      "operator": "and",
      "filters": [
        { "name": "age", "operator": ">", "value": "30" },
        { "name": "city", "operator": "==", "value": "London" }
      ]
    }
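Writing the Filters Group JSON by hand is error-prone, so it can help to generate it. The following is a minimal sketch that reproduces the example above; `single_filter` and `and_group` are hypothetical helper names, not part of the node or of any Bright Data library, and the key names are taken only from that example.

```python
import json


def single_filter(name, op, value):
    """Build one condition with the three keys used in the example above."""
    return {"name": name, "operator": op, "value": value}


def and_group(*filters):
    """Combine conditions (or nested groups) under a logical AND."""
    if not filters:
        raise ValueError("a filter group needs at least one condition")
    return {"operator": "and", "filters": list(filters)}


group = and_group(
    single_filter("age", ">", "30"),
    single_filter("city", "==", "London"),
)

# Text that could be pasted into the "Filters Group" property.
filters_group_json = json.dumps(group, indent=2)
print(filters_group_json)
```

Because `and_group` accepts other groups as arguments, nested filters can be built by passing one `and_group(...)` result into another.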

Output

The node outputs a JSON array of records matching the filter criteria from the selected dataset. Each record corresponds to an entry in the dataset and contains fields as defined by that dataset's schema.

This node primarily deals with JSON data representing the filtered dataset entries. Nothing here indicates that the operation returns binary data; if a dataset did contain any, it would be passed through alongside the JSON output.
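Because the fields in each record depend on the dataset's schema, downstream code should not assume that every field is present. As a rough sketch, the snippet below parses a JSON array shaped like the node's output and counts records per value of one field. The sample records and the `count_by_field` helper are made up for illustration and do not come from a real dataset.

```python
import json
from collections import Counter

# Made-up sample standing in for the node's JSON output.
sample_output = """[
  {"name": "Acme Ads", "industry": "Advertising"},
  {"name": "Bolt Media", "industry": "Advertising"},
  {"name": "Crate Co"}
]"""

records = json.loads(sample_output)


def count_by_field(records, field, missing="(missing)"):
    """Count records per value of `field`, tolerating records that lack it."""
    return Counter(record.get(field, missing) for record in records)


industry_counts = count_by_field(records, "industry")
for value, count in industry_counts.items():
    print(value, count)
```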

Dependencies

  • Requires an API key credential for authenticating with the Bright Data platform.
  • Access to the Bright Data Marketplace datasets via their API.
  • Network connectivity to https://api.brightdata.com.

Troubleshooting

  • Common Issues:

    • Invalid dataset ID or dataset not accessible: Ensure the selected dataset exists and your API key has permissions.
    • Malformed filter JSON in "Filters Group": Validate JSON syntax and structure before running.
    • Unsupported operators or field names: Confirm that each field name exists in the dataset schema and that the operator is one of the supported options.
    • Exceeding records limit: If too many records are requested, the API might throttle or reject the request; reduce the "Records Limit".
  • Error Messages:

    • Authentication errors: Check API key validity and permissions.
    • Syntax errors in filter JSON: Correct JSON formatting and ensure required keys (name, operator, value) are present.
    • No records found: Adjust filter criteria or verify dataset contents.
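Malformed filter JSON and missing keys can be caught before the node runs. The following is a minimal sketch of such a pre-check, assuming the structure shown in the Filters Group example (groups carry "operator" and "filters", conditions carry name, operator, and value). The function names are hypothetical, and the check reflects only what this page describes, not the full rules enforced by the Bright Data API.

```python
import json

REQUIRED_KEYS = ("name", "operator", "value")


def validate_filters_group(text):
    """Return a list of problems found in a Filters Group JSON string."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg} (line {exc.lineno}, column {exc.colno})"]
    problems = []
    _check_node(parsed, "root", problems)
    return problems


def _check_node(node, path, problems):
    """Recursively check a group or a single condition."""
    if not isinstance(node, dict):
        problems.append(f"{path}: expected an object")
        return
    if "filters" in node:
        # Group node: this page only describes groups combined with "and".
        if node.get("operator") != "and":
            problems.append(f'{path}: group operator should be "and"')
        children = node["filters"]
        if not isinstance(children, list) or not children:
            problems.append(f'{path}: "filters" must be a non-empty list')
            return
        for i, child in enumerate(children):
            _check_node(child, f"{path}.filters[{i}]", problems)
    else:
        # Condition node: all three keys must be present.
        missing = [key for key in REQUIRED_KEYS if key not in node]
        if missing:
            problems.append(f"{path}: missing key(s) {', '.join(missing)}")


candidate = '{"operator": "and", "filters": [{"name": "age", "operator": ">"}]}'
for problem in validate_filters_group(candidate):
    print(problem)
```

An empty list means no problems were detected by this check; the API may still reject a filter for reasons not covered here, such as a field name that does not exist in the dataset.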
