Overview

What is Zyte: Zyte is a web data extraction platform headquartered in Ballincollig, Cork, Ireland. Launched in 2010 as Scrapinghub and rebranded in February 2021, Zyte provides the Zyte API for AI-powered extraction through a single HTTP endpoint, with ban avoidance and headless browser rendering.

How to integrate Zyte with Datagrid

Use the Zyte integration when Datagrid agents need structured records from public URLs instead of page markup. Set up the integration by generating a Zyte API key, then configuring authentication and scheduled syncs.

Generate a Zyte API key

Generate your Zyte API key from the Zyte dashboard.
In Datagrid, open Settings > Integrations > Add New.
Select Zyte from the integration list.
Paste your Zyte API key into the credential field.
Define the target URLs and the extraction type for each request.
Save the connection and run a test extraction.

A Datagrid integration configuration should capture the credentials, target URL list, extraction type, and refresh pattern:

integration: Zyte
credential: Zyte API key
authentication: HTTP Basic Authentication
password: ""
targets:
  - target_url: "<public URL>"
    extraction_type: product
sync_mode: scheduled or polling-based

Authenticate Zyte requests

Zyte API uses HTTP Basic Authentication. Your API key is the username, with an empty string as the password. In curl, this means a trailing colon after the key.

Use this credential pattern when Datagrid stores the Zyte API key:

Username: <Zyte API key>
Password: <empty string>
curl credential form: "$ZYTE_API_KEY:"
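
As a minimal sketch of the credential pattern above, the following Python builds the Basic Authentication header by hand. The function name `zyte_basic_auth_header` and the placeholder key are illustrative assumptions, not part of Zyte's or Datagrid's tooling.

```python
import base64


def zyte_basic_auth_header(api_key: str) -> dict[str, str]:
    """Build an HTTP Basic Authentication header for a Zyte API key.

    The key is the username and the password is an empty string, so the
    credential string is the key followed by a single trailing colon.
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    credentials = f"{api_key}:".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Example with a placeholder key (not a real credential).
header = zyte_basic_auth_header("abc")
```

Most HTTP clients accept a username and password pair and do this encoding for you; the sketch only makes the trailing colon visible.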

Zyte issues three distinct credentials: the Zyte API key, the Scrapy Cloud key, and the Stats dashboard key. They are not interchangeable.

Configure scheduled data syncs

The Zyte integration pulls structured web data into agentic workflows. Datagrid initiates scheduled or polling-based extraction, and Zyte API returns one automatic extraction type per request, since only one AI extraction field may be enabled per call.

Supported types include product, productList, productNavigation, article, articleList, articleNavigation, forumThread, jobPosting, jobPostingNavigation, pageContent, and serp.
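
The one-type-per-request rule can be sketched as a small payload builder. This assumes the request body carries a `url` field plus a single boolean flag named after the extraction type; confirm the exact field names against the Zyte API reference before relying on it. The helper name `build_extraction_payload` and the example URL are hypothetical.

```python
import json

# Extraction types listed in this guide.
SUPPORTED_TYPES = frozenset({
    "product", "productList", "productNavigation",
    "article", "articleList", "articleNavigation",
    "forumThread", "jobPosting", "jobPostingNavigation",
    "pageContent", "serp",
})


def build_extraction_payload(url: str, extraction_type: str) -> dict:
    """Return a request body that enables exactly one extraction type.

    Assumption: the body is {"url": ..., "<type>": true}. Verify this
    shape against the Zyte API reference.
    """
    if extraction_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported extraction type: {extraction_type!r}")
    return {"url": url, extraction_type: True}


# Placeholder URL for illustration only.
payload = build_extraction_payload("https://example.com/item/1", "product")
body = json.dumps(payload)  # JSON string to send as the request body
```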

A product record ingested by Datagrid can be shaped around the structured fields already returned by Zyte extraction:

{
  "extractionType": "product",
  "record": {
    "name": "<product name>",
    "price": "<product price>",
    "sku": "<product SKU>",
    "availability": "<availability>"
  },
  "metadata": {
    "probability": "0 to 1"
  }
}

Datagrid agents can then filter low-confidence records and send accepted records to the next workflow.
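
A minimal sketch of that filtering step, assuming records follow the shape shown above with a numeric `metadata.probability`. The function name `filter_confident` and the 0.8 threshold are illustrative choices, not values recommended by Zyte or Datagrid.

```python
def filter_confident(records: list[dict], min_probability: float = 0.8) -> list[dict]:
    """Keep records whose metadata.probability meets the threshold.

    Records with a missing or non-numeric probability are dropped rather
    than guessed at.
    """
    accepted = []
    for record in records:
        probability = record.get("metadata", {}).get("probability")
        is_number = isinstance(probability, (int, float)) and not isinstance(probability, bool)
        if is_number and probability >= min_probability:
            accepted.append(record)
    return accepted


# Made-up sample records for illustration.
sample = [
    {"extractionType": "product", "record": {"name": "A"}, "metadata": {"probability": 0.95}},
    {"extractionType": "product", "record": {"name": "B"}, "metadata": {"probability": 0.40}},
    {"extractionType": "product", "record": {"name": "C"}, "metadata": {}},
]
accepted = filter_confident(sample)
```

Tune the threshold per workflow: a pricing alert may tolerate less uncertainty than a bulk enrichment job.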

Why use Zyte with Datagrid

Connecting Zyte to Datagrid gives project teams a repeatable way to convert public web pages into structured records that agents can act on. Instead of writing custom scrapers and cleaning HTML by hand, teams get typed objects that flow directly into agentic workflows.

AI-structured web data on demand: Zyte converts raw HTML into typed objects with dozens of structured fields on the product schema alone, and Datagrid agents process clean records directly.
Autonomous data retrieval: Datagrid agents determine what web data they need and fetch it through the authenticated Zyte API without manual prompting.
Confidence scoring built in: Each single-item extraction returns a metadata.probability score from 0 to 1, so agents can filter low-confidence records before acting.
JavaScript-heavy site coverage: Zyte's headless browser executes full JavaScript and handles ban avoidance, so agents reach pages that block standard requests.
Custom schema extraction: Zyte API extracts custom attributes through an LLM, so Datagrid agents pull fields specific to your workflow alongside the standard object.
Warehouse-ready output: Zyte Managed Data delivers structured JSON and CSV records. Datagrid orchestration writes agent-processed records into Snowflake for scheduled analysis and enrichment.

What you can build with the Zyte Datagrid integration

Use Zyte with Datagrid when mission-critical workflows depend on external web data that changes often. The combination supports pricing intelligence, news monitoring, training data enrichment, and search visibility tracking from a single extraction layer.

Competitor price intelligence: Procurement and project teams in the built world use Zyte to extract product and productList data from supplier and vendor sites. Each record carries name, price, SKU, and availability. Datagrid agents monitor competitor pages and run automated competitive pricing workflows.
Automated news monitoring: Zyte's article and articleList types feed daily extraction pipelines. Datagrid agents filter by keyword and industry, then trigger digests for recurring monitoring workflows.
AI training data enrichment: Zyte's pageContent extraction produces clean, slim-line content ready for LLMs with no HTML cleanup. A legal services firm used Zyte to extract new court cases for AI training.
SERP rank tracking: Zyte's SERP extraction parses search results, so Datagrid can write agent-processed records to a warehouse on a schedule for time-series rank analysis.

Resources and documentation

Get-started guide: First-request setup, authentication, and Zyte IDE basics.
Zyte API usage documentation: Extract structured data from pages or sites and enrich it with LLM prompts.
Zyte API reference documentation: Full HTTP API reference for request fields, extraction types, and authentication.
Zyte IDE: Write, debug, and deploy browser scripts inside your extraction requests.

Frequently asked questions

How do I authenticate with Zyte API in Datagrid?

Zyte API uses HTTP Basic Authentication. Your API key is the username with an empty string as the password, which means a trailing colon after the key in curl. Store the Zyte API key in Datagrid's integration credential field.

Does the Zyte integration sync data in both directions?

No. Datagrid initiates scheduled extraction requests, and Zyte API returns structured data in the response.

Can I extract multiple data types in a single request?

No. Only one automatic extraction field may be enabled per request, so you cannot mix product and article in one call. Mixed-content crawls require separate requests per type.
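
One way to plan a mixed-content crawl is to fan each URL out into one request per type. The sketch below assumes the same request shape as elsewhere in this guide (a `url` field plus one boolean type flag); the function name `plan_requests` and the example URLs are hypothetical.

```python
def plan_requests(targets: dict[str, list[str]]) -> list[dict]:
    """Expand {url: [types]} into one request body per (url, type) pair.

    Each body enables exactly one extraction type, so a page that needs
    both product and article data yields two separate requests.
    """
    payloads = []
    for url, extraction_types in targets.items():
        for extraction_type in extraction_types:
            payloads.append({"url": url, extraction_type: True})
    return payloads


# Placeholder URLs for illustration only.
planned = plan_requests({
    "https://example.com/page": ["product", "article"],
    "https://example.com/news": ["articleList"],
})
```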

Can Zyte extract custom fields beyond the standard schema?

Yes. Zyte API extracts custom attributes through an LLM in both Extract and Generate modes, so Datagrid agents pull fields specific to your workflow alongside the standard object.

Similar integrations

Amazon AWS S3: Store Zyte's delivered JSON/CSV exports or pipeline outputs in S3 for downstream agentic AI workflows and archival purposes.
Google Cloud Storage: Ingest Zyte extraction deliveries into Google Cloud Storage for BigQuery loading and agentic processing.
BigQuery: Load Zyte's structured JSON/CSV outputs into BigQuery for large-scale analysis, SERP time-series, and agent-driven reporting pipelines.
Snowflake: Write Zyte-extracted records into Snowflake to centralize web data for AI model training, pricing analytics, and scheduled enrichment.
Databricks: Process Zyte web datasets in Databricks for feature engineering, LLM training data cleaning, and scalable ETL before warehouse ingestion.
PostgreSQL: Load Zyte extraction outputs into PostgreSQL for application-level querying, enrichment, and low-latency operational lookups.

You've got more important things to do. Let Datagrid handle the rest.
