Overview
Azure Data Lake Storage Gen2 is Microsoft's enterprise-grade cloud data lake, built on top of Azure Blob Storage with a hierarchical namespace enabled. It stores data in any format, including CSV, JSON, Parquet, Avro, and binary, and handles petabyte-scale workloads. ADLS Gen2 provides true directory structures with atomic rename and delete operations, POSIX-compliant access control lists, and multi-protocol access via Blob REST API, DFS REST API, HDFS, NFS 3.0, and SFTP.
What is Azure Data Lake Storage: ADLS Gen2 is Microsoft's cloud data lake for structured, semi-structured, and unstructured data, combining large-scale object storage with hierarchical directories and data lake access patterns.
Datagrid connects to Azure Data Lake Storage Gen2 as both a source and a destination. Datagrid's AI agents read files from the lake, transform data into business-ready formats, and write results back on schedules or when source data changes. This article covers what the integration does inside Datagrid, including setup, authentication, sync behavior, and workflow examples; it does not cover Azure storage account provisioning beyond the linked Azure documentation.
The integration covers file, directory, and container access inside ADLS Gen2. Datagrid's agents ingest raw files from Bronze directories, apply transformations, and route curated output to downstream systems or write enriched data back into Gold directories for analytics and reporting. Teams can also blend ADLS Gen2 data with 50+ other sources in a single workflow.
How to integrate Azure Data Lake Storage with Datagrid
This setup is for operators who need Datagrid to read from and write to an Azure data lake without manual handoffs. The steps below walk through adding the integration, authenticating access, configuring sync behavior, and reviewing the resulting configuration.
Add the integration
Log in to Datagrid and go to Settings > Integrations > Add New
Select Azure Data Lake Storage from the integration list
Enter your Azure Storage account name and authenticate
Select the container or filesystem you want Datagrid to access
Configure read, write, or bidirectional access based on your workflow requirements
Test the connection and save
Authenticate access
The integration authenticates using your Azure credentials. Microsoft Entra ID (OAuth 2.0) is the recommended authorization method for ADLS Gen2. The connecting identity requires an appropriate Azure RBAC role: Storage Blob Data Reader for read-only access or Storage Blob Data Contributor for read-write access, assigned at the storage account or container scope.
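For a sense of what Entra ID (OAuth 2.0) access to the lake looks like outside Datagrid, the following Python sketch authenticates with DefaultAzureCredential and probes read access via the Azure SDK. It is a minimal illustration, not Datagrid's implementation; the storage account URL and filesystem name are placeholders, and the azure-identity and azure-storage-file-datalake packages are assumed.

# Sketch: verifying Entra ID (OAuth 2.0) data-plane access to ADLS Gen2.
# Placeholders: <storage-account>; assumes azure-identity and
# azure-storage-file-datalake are installed.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# DefaultAzureCredential resolves an Entra ID identity (CLI login,
# managed identity, environment variables, and so on).
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)

# Listing paths succeeds only if the identity holds a data-plane role
# such as Storage Blob Data Reader on the account or container scope.
filesystem = service.get_file_system_client("project-data")
for path in filesystem.get_paths(max_results=5):
    print(path.name)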
In Datagrid, your account must have the Azure Data Lake Storage Administrator permission to configure the integration.
Configure data sync
The integration supports the following sync settings and data objects.
Sync direction — Bidirectional (read from and write to ADLS Gen2)
Supported formats — CSV, JSON, Parquet, Avro, ORC, XML, Excel, binary
Data objects — Files, directories, containers/filesystems
Trigger types — Scheduled, source change
Access protocols — DFS REST API (dfs.core.windows.net), Blob REST API (blob.core.windows.net)
Review a sample configuration
The following sample shows how the integration settings can be structured based on the setup fields above.
{
  "integration": "Azure Data Lake Storage",
  "storage_account": "your-storage-account",
  "container_or_filesystem": "project-data",
  "access": "bidirectional",
  "triggers": ["scheduled", "source change"],
  "formats": ["CSV", "JSON", "Parquet", "Avro", "ORC", "XML", "Excel", "binary"],
  "protocols": ["dfs.core.windows.net", "blob.core.windows.net"]
}
Use these settings to define how Datagrid reads from and writes to your lake. For detailed setup requirements and permissions, refer to the Datagrid documentation.
Why use Azure Data Lake Storage with Datagrid
This integration fits operators running mission-critical programs who need answers and action, not admin. Datagrid executes file-based workflows directly inside the data lake so operators can keep source data, transformed output, and downstream routing in one operating model.
Bidirectional data lake access: Datagrid's AI agents read raw files from ADLS Gen2 and write transformed results back to the same repository without intermediate staging.
Format-agnostic processing: Agents handle CSV, JSON, Parquet, Avro, ORC, XML, Excel, and binary files stored in the lake, extracting and transforming data across different structures.
Event-driven automation: Datagrid can trigger workflows on source changes so pipelines run when new files land in ADLS Gen2.
Cross-source data blending: Combine ADLS Gen2 data with 50+ other sources in a single automated workflow.
Hierarchical namespace operations: Datagrid works with ADLS Gen2's true directory structure, including atomic rename and delete operations (illustrated in the sketch after this list).
Autonomous pipeline execution: Agents handle extraction, cleaning, enrichment, and routing across Bronze, Silver, and Gold layers with minimal manual intervention between steps.
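To make the hierarchical namespace point concrete, the sketch below renames a directory with the Azure SDK. With HNS enabled this is a single atomic metadata operation; on flat blob storage the same move would be a per-blob copy and delete. The directory paths are hypothetical, and the client setup mirrors the authentication sketch above.

# Sketch: atomic directory rename, an HNS-backed operation in ADLS Gen2.
# Directory paths are hypothetical placeholders.
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
filesystem = service.get_file_system_client("project-data")

# rename_directory takes the destination as "<filesystem>/<path>";
# the rename is atomic rather than a copy-then-delete per blob.
staging = filesystem.get_directory_client("bronze/staging")
staging.rename_directory("project-data/bronze/processed")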
These capabilities matter most when operators need consistent execution across large file volumes, multiple source systems, and changing downstream requirements.
What you can build with Azure Data Lake Storage and Datagrid
Datagrid executes a wide range of file-based workflows on top of Azure Data Lake Storage. The examples below show how operators and project teams can standardize recurring work across raw, curated, and downstream data flows.
Automated ETL across a medallion architecture: Datagrid's AI agents read raw project files from an ADLS Gen2 Bronze layer, apply validation and transformation logic, and write curated output into Silver and Gold directories. A construction firm could ingest daily schedule exports, spec sheets, and RFI logs into ADLS Gen2, then have agents cross-reference and produce consolidated project status datasets for dashboards without manual data wrangling.
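As a rough illustration of the Bronze-to-Silver step an agent performs, the Python sketch below reads a raw CSV from a Bronze directory, drops invalid rows, and writes the curated file to Silver. The file names and the validation rule are hypothetical; this shows the lake-side read, transform, write pattern, not Datagrid's internal logic.

# Sketch: a bronze -> silver promotion pass over one CSV file.
# File names and the validation rule are hypothetical.
import csv
import io

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
filesystem = service.get_file_system_client("project-data")

# Read the raw export from the Bronze layer.
raw = filesystem.get_file_client("bronze/daily_schedule.csv")
text = raw.download_file().readall().decode("utf-8")
rows = list(csv.DictReader(io.StringIO(text)))

# Minimal validation: keep only rows that carry a task identifier.
curated = [row for row in rows if row.get("task_id")]

# Write the curated dataset into the Silver layer.
if curated:
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=curated[0].keys())
    writer.writeheader()
    writer.writerows(curated)
    filesystem.get_file_client("silver/daily_schedule.csv").upload_data(
        out.getvalue(), overwrite=True
    )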
Agentic document extraction from lake storage: Store unstructured project files such as submittals, drawings, contracts, and inspection reports in ADLS Gen2 and let Datagrid agents extract structured data from them. An insurance operations team could drop claim files into a designated container and have agents parse key fields, validate them against policy data from a connected system, and write structured claim records back to the lake for downstream analytics.
Cross-platform data consolidation for project teams: Pull data from ADLS Gen2 alongside other databases, document systems, and field management tools into a single automated workflow. A manufacturing team could consolidate BOMs stored as Parquet files in the data lake with ERP records and supplier specs, producing a unified dataset that agents keep current as source files change.
Triggered data routing and distribution: Configure Datagrid to detect new file arrivals in specific ADLS Gen2 directories and route processed data to the right destination. When a project team uploads updated cost data to a designated container, agents can transform the data, push summaries to team channels, write aggregated records to connected analytics systems, and archive processed files in a separate ADLS Gen2 directory from the same trigger event.
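The Azure-side signal behind this kind of trigger can be approximated with a simple polling sketch: list paths under a watched directory and pick out files modified since the last check. The directory name and checkpoint handling below are hypothetical simplifications; Datagrid's own change detection may work differently.

# Sketch: detecting newly landed files under a watched directory by
# comparing last_modified against a checkpoint. The directory name and
# checkpoint handling are hypothetical simplifications.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
filesystem = service.get_file_system_client("project-data")

# In a real pipeline the checkpoint would be persisted between runs.
last_checked = datetime.now(timezone.utc) - timedelta(minutes=15)

new_files = [
    path.name
    for path in filesystem.get_paths(path="incoming", recursive=True)
    if not path.is_directory and path.last_modified > last_checked
]
for name in new_files:
    print(f"new file to route: {name}")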
These examples show the practical range of the integration: ingest, transform, validate, route, and write back without forcing teams to move core lake data into a separate workflow layer.
Resources and documentation
Use these resources when you need deeper product, API, or access-control detail.
Azure Data Lake Storage Gen2 introduction
ADLS Gen2 REST API reference: filesystem and path operations, authentication methods, current API version
Create a storage account for ADLS Gen2: quickstart guide for provisioning an HNS-enabled account
ADLS Gen2 access control model: ACL management across Azure Storage Explorer, Portal, .NET, Java, Python, and CLI
These references cover the Azure-side concepts that most often affect setup and permissions.
Frequently asked questions
What authentication method does the Azure Data Lake Storage integration use?
The integration uses Azure credentials, and Microsoft Entra ID (OAuth 2.0) is Microsoft's recommended authorization approach for ADLS Gen2. The connecting identity needs an Azure RBAC role such as Storage Blob Data Reader for read-only access or Storage Blob Data Contributor for read-write access, assigned at the storage account or container scope. In Datagrid, your account also requires the Azure Data Lake Storage Administrator permission.
Does the integration support both reading from and writing to ADLS Gen2?
Yes. Datagrid supports Azure Data Lake Storage as both a source and a destination. Datagrid's AI agents can ingest data from your lake and write processed or enriched data back.
What file formats can Datagrid process from Azure Data Lake Storage?
ADLS Gen2 stores data in any format, and Datagrid can process common types used in analytics workflows, including CSV, JSON, Parquet, Avro, ORC, XML, Excel, and binary files.
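As one example of what format handling involves on the lake side, the sketch below pulls a Parquet file into an Arrow table with pyarrow. The file path is hypothetical and pyarrow is an assumed dependency; it illustrates the read pattern, not Datagrid's processing.

# Sketch: reading a Parquet file from ADLS Gen2 into an Arrow table.
# The file path is hypothetical; assumes the pyarrow package.
import io

import pyarrow.parquet as pq
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
filesystem = service.get_file_system_client("project-data")

data = filesystem.get_file_client("gold/consolidated_status.parquet")
table = pq.read_table(io.BytesIO(data.download_file().readall()))
print(table.schema)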
Does my Azure Storage account need the hierarchical namespace enabled?
ADLS Gen2 capabilities, including true directory structures, atomic directory operations, and POSIX ACLs, are activated by enabling the hierarchical namespace on your Azure Storage account. Without HNS enabled, the account operates as standard Azure Blob Storage and does not provide data lake functionality. Confirm HNS is enabled on your account before configuring the Datagrid integration.
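One way to confirm the flag outside the Azure portal is the management SDK. The sketch below is illustrative only: the subscription, resource group, and account names are placeholders, and the azure-mgmt-storage package is an assumed dependency.

# Sketch: checking whether hierarchical namespace is enabled on an
# account. Subscription, resource group, and account names are
# placeholders; assumes the azure-mgmt-storage package.
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")
account = client.storage_accounts.get_properties("<resource-group>", "<storage-account>")
print("HNS enabled:", account.is_hns_enabled)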
Can Datagrid trigger workflows when new data lands in ADLS Gen2?
Yes. The Datagrid integration supports trigger-based execution, including source change detection.
Similar integrations
The following integrations are closely related to Azure Data Lake Storage workflows in Datagrid.
Azure Blob Storage: General-purpose Azure object storage and the foundational layer ADLS Gen2 is built on, without the hierarchical namespace.
Azure SQL Database: Managed relational database on Azure, commonly used alongside ADLS Gen2 for structured query workloads.
Azure PostgreSQL Database: Managed PostgreSQL on Azure, used for transactional and analytical workloads that complement lake storage.
These related integrations cover adjacent storage and database patterns commonly used with ADLS Gen2.