Overview
The "Deliver Snapshot" operation of the Web Scraper resource sends a previously generated snapshot (the saved result of a web scraping task) to an external storage or messaging service. It is useful for automating the distribution and storage of scraped data through cloud storage providers, messaging systems, or custom endpoints.
Common scenarios include:
- Automatically uploading scraped data snapshots to cloud storage like Amazon S3, Google Cloud Storage, Azure Blob Storage, or Aliyun OSS.
- Sending snapshot data as messages to Google Cloud PubSub topics for downstream processing.
- Delivering snapshots via SFTP to remote servers.
- Posting snapshot data to a webhook endpoint for real-time processing or notification.
- Loading snapshot data into Snowflake data warehouse tables.
Practical examples:
- After scraping product prices from an e-commerce site, automatically upload the snapshot JSON file to an S3 bucket for archival.
- Send snapshot data to a Google Cloud PubSub topic to trigger further data processing pipelines.
- Deliver snapshots to an SFTP server for integration with legacy systems.
- Post snapshot data to a webhook URL consumed by a monitoring system, and set Notify to receive a separate notification once delivery finishes.
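
As a rough illustration of the first example, the parameter set for an Amazon S3 delivery might look like the sketch below. The keys reuse the property names documented on this page; every value is a placeholder, and the internal field names the node sends to the API are not shown in this document, so treat the structure as an assumption rather than the node's actual payload.

```python
# Hypothetical parameter set for delivering a snapshot to Amazon S3.
# Keys reuse the property names from this page; all values are placeholders.
s3_delivery = {
    "Snapshot ID": "<snapshot-id>",
    "Deliver Type": "Amazon S3",
    "Bucket": "<bucket-name>",
    "AWS Access Key": "<aws-access-key-id>",
    "AWS Secret Key": "<aws-secret-access-key>",
    "Filename Template": "product-prices",  # plain name, no placeholders assumed
    "File Extension": "JSON",
    "Directory": "archive/prices",          # optional target path
    "Compress": True,                       # optional gzip compression
}

# Print the configuration without leaking credentials.
for name, value in s3_delivery.items():
    shown = "***" if "Key" in name else value
    print(f"{name}: {shown}")
```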
Properties
| Name | Meaning |
|---|---|
| Snapshot ID | The unique identifier of the snapshot to be delivered. |
| Notify | URL where a notification will be sent once the delivery is finished. |
| Deliver Type | The target delivery service or method. Options: Aliyun Object Storage Service, Amazon S3, Google Cloud PubSub, Google Cloud Storage, Microsoft Azure, SFTP, Snowflake, Webhook. |
| Webhook Endpoint | The URL of the webhook to which the snapshot will be delivered (required if Deliver Type is Webhook). |
| Filename Template | Template string, with optional placeholders, that defines the name of the delivered file. Required for file-based delivery types; it may not be needed for Webhook. |
| File Extension | File extension/type for the delivered file. Options: JSON, JSONL, CSV. Required for most delivery types. |
| Topic ID | ID of the target topic (required if Deliver Type is Google Cloud PubSub). |
| Client Email | Client email credential for Google Cloud services (required for Google Cloud PubSub and Google Cloud Storage). |
| Private Key | Private key credential for Google Cloud services (required for Google Cloud PubSub and Google Cloud Storage). |
| Attributes | JSON object of attributes to include in the PubSub message (optional, for Google Cloud PubSub). |
| Container | Azure container name (required if Deliver Type is Microsoft Azure). |
| Bucket | Bucket name for S3, Aliyun OSS, or Google Cloud Storage (required for these delivery types). |
| AWS Access Key | AWS access key ID (required for Amazon S3). |
| AWS Secret Key | AWS secret access key (required for Amazon S3). |
| Access Key | Access key for Aliyun OSS (required for Aliyun OSS). |
| Secret Key | Secret key for Aliyun OSS (required for Aliyun OSS). |
| Account | Azure storage account name (required for Microsoft Azure). |
| Key | Azure storage key (required for Microsoft Azure). |
| SAS Token | Azure SAS token for access (required for Microsoft Azure). |
| Role ARN | Optional AWS role ARN for Amazon S3. |
| External ID | Optional external ID for AWS role assumption in Amazon S3. |
| Directory | Target directory/path inside the storage or delivery location (optional for many delivery types). |
| Region | Region of the bucket for Amazon S3 or Aliyun OSS (optional). |
| Host | SFTP server hostname (required for SFTP). |
| Port | SFTP server port (default 22). |
| Path | Remote path on the SFTP server to store the file (optional). |
| Username | Username for SFTP authentication (required for SFTP). |
| Password | Password for SFTP authentication (required for SFTP unless an SSH key is provided). |
| SSH Key | SSH private key for SFTP authentication (required for SFTP unless a password is provided). |
| Passphrase | Passphrase for the SSH key if applicable (optional for SFTP). |
| Database | Snowflake database name (required for Snowflake). |
| Schema | Snowflake schema name (required for Snowflake). |
| Stage | Snowflake stage name (required for Snowflake). |
| Role | Snowflake role (required for Snowflake). |
| Warehouse | Snowflake warehouse (required for Snowflake). |
| Snowflake Account | Snowflake account credential (required for Snowflake). |
| Snowflake User | Snowflake user credential (required for Snowflake). |
| Snowflake Password | Snowflake password credential (required for Snowflake). |
| Compress | Boolean flag indicating whether to compress the delivered file in gzip format (optional). |
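
Because the required properties differ for each Deliver Type, it can help to check a parameter set before running the node. The sketch below transcribes the "required" notes from the table into a lookup and reports what is missing. The function name `missing_fields` and the dictionary layout are hypothetical, not part of the node. SFTP treats Password and SSH Key as alternatives, following the Dependencies section; Filename Template and File Extension are left out because the table does not say exactly which delivery types need them.

```python
# Required properties per Deliver Type, transcribed from the table above.
# Each tuple is a group of alternatives: at least one name in it must be set.
REQUIRED = {
    "Amazon S3": [("Bucket",), ("AWS Access Key",), ("AWS Secret Key",)],
    "Aliyun Object Storage Service": [("Bucket",), ("Access Key",), ("Secret Key",)],
    "Google Cloud Storage": [("Bucket",), ("Client Email",), ("Private Key",)],
    "Google Cloud PubSub": [("Topic ID",), ("Client Email",), ("Private Key",)],
    "Microsoft Azure": [("Container",), ("Account",), ("Key",), ("SAS Token",)],
    "SFTP": [("Host",), ("Username",), ("Password", "SSH Key")],
    "Snowflake": [
        ("Database",), ("Schema",), ("Stage",), ("Role",), ("Warehouse",),
        ("Snowflake Account",), ("Snowflake User",), ("Snowflake Password",),
    ],
    "Webhook": [("Webhook Endpoint",)],
}


def missing_fields(params):
    """Return the required properties that are absent or empty in params."""
    deliver_type = params.get("Deliver Type")
    if deliver_type not in REQUIRED:
        raise ValueError(f"Unknown Deliver Type: {deliver_type!r}")
    # Snapshot ID is needed regardless of the delivery target.
    groups = [("Snapshot ID",)] + REQUIRED[deliver_type]
    missing = []
    for group in groups:
        if not any(params.get(name) for name in group):
            missing.append(" or ".join(group))
    return missing


print(missing_fields({
    "Snapshot ID": "<snapshot-id>",
    "Deliver Type": "SFTP",
    "Host": "sftp.example.com",
    "Username": "deploy",
}))
# → ['Password or SSH Key']
```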
Output
The node outputs the response from the delivery API call in its json output field. This typically includes status information about the delivery request, such as success confirmation or error details.
If the delivery involves binary data (e.g., snapshot files), the node handles it internally but does not expose raw binary data directly in the output. Instead, the output focuses on metadata and delivery status.
Dependencies
- Requires an active Bright Data API connection authenticated via an API key credential.
- Depending on the chosen delivery type, requires credentials and configuration for the respective external service:
  - Cloud storage services (AWS S3, Aliyun OSS, Google Cloud Storage, Azure Blob Storage) require appropriate access keys, secrets, tokens, and region/account info.
  - Google Cloud PubSub requires client email and private key credentials.
  - SFTP requires host, port, username, and either password or SSH key credentials.
  - Snowflake requires database connection credentials including account, user, password, role, warehouse, schema, and stage.
- Network connectivity to the target delivery endpoints.
- Proper permissions configured on external services to allow writing/uploading data.
Troubleshooting
- Invalid Credentials: Delivery may fail if credentials are missing, incorrect, or lack sufficient permissions. Verify all required keys, tokens, and access rights.
- Incorrect Snapshot ID: If the snapshot ID does not exist or is invalid, the delivery will fail. Confirm the snapshot ID is correct and accessible.
- Network Issues: Connectivity problems to external services (S3, SFTP, etc.) can cause timeouts or failures. Check network access and firewall rules.
- Misconfigured Delivery Parameters: Missing required parameters for the selected delivery type (e.g., missing bucket name, container, or topic ID) will cause errors. Ensure all mandatory fields are provided.
- File Naming Errors: Invalid or empty filename templates may cause delivery failure. Use valid templates with supported placeholders.
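- Compression Issues: When Compress is enabled, the file is delivered in gzip format. If the target service or a downstream consumer expects an uncompressed file, delivery or later processing might fail. Disable Compress to rule this out.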
- Error Messages: The node returns API error responses in the output. Common messages include authentication failures, permission denied, resource not found, or invalid parameter errors. Review the error details and adjust configuration accordingly.
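
Several of the failures above can be caught locally before the delivery request is sent. The following is a minimal sketch of such a pre-flight check, assuming the parameters are held in a dictionary keyed by the property names on this page. The function name `preflight` is hypothetical, and the check covers only what this page states: an empty Snapshot ID, an empty Filename Template (skipped for Webhook), and an Attributes value that is not a JSON object.

```python
import json


def preflight(params):
    """Return a list of configuration problems found before delivery."""
    problems = []

    if not str(params.get("Snapshot ID", "")).strip():
        problems.append("Snapshot ID is empty")

    # Assumption: Webhook deliveries do not need a filename template.
    is_webhook = params.get("Deliver Type") == "Webhook"
    if not is_webhook and not str(params.get("Filename Template", "")).strip():
        problems.append("Filename Template is empty")

    # Attributes is optional, but when present it must be a JSON object.
    raw = params.get("Attributes")
    if raw:
        try:
            parsed = json.loads(raw) if isinstance(raw, str) else raw
        except json.JSONDecodeError as exc:
            problems.append(f"Attributes is not valid JSON: {exc.msg}")
        else:
            if not isinstance(parsed, dict):
                problems.append("Attributes must be a JSON object")

    return problems


print(preflight({
    "Snapshot ID": "<snapshot-id>",
    "Deliver Type": "Google Cloud PubSub",
    "Filename Template": "daily-export",
    "Attributes": "[1, 2, 3]",
}))
# → ['Attributes must be a JSON object']
```

An empty list means none of these local checks failed; credential, permission, and network problems still only surface in the API response.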