Microsoft Azure Storage

Updated by Daniel Odrinski

This feature is available as an add-on for the following plans:

  • Business
  • Enterprise

The Data Warehouse Microsoft Azure Storage integration is a secure way to have a dump of the Data Platform delivered to an Azure Storage Container in your own Azure AD domain, using either a built-in role (Quick setup) or a custom Service Principal role (Advanced setup).

The Data Platform gives you access to a raw dump of the database tables that Dreamdata uses to build all of its insights. Having access to the raw data enables you both to load it into any existing database platform that can load data from Azure Storage Containers and to create insights tailored to specific needs that your organisation might have.

Note: The Microsoft Azure Storage Data Warehouse only supports schema v3. You can find the documentation here.

Pre-requisites

In order to enable this integration, the following requirements must be met:

  1. Have a Microsoft Azure Account with an active subscription.
  2. Have Administrator permissions or equivalent.
    1. The person performing these steps must have the necessary permissions in your own Azure AD domain to:
      1. Provide administrator consent to our Dreamdata Data Warehouse application.
      2. Create an Azure Storage Container (and optionally to create a new Azure Storage Account).
      3. Assign an IAM role to an Azure Storage Container.
    2. Optionally, for the advanced setup:
      1. Create custom roles in the organisation's Azure AD domain.
      2. Edit trust and permission policies on roles.

Guide

The guide below will walk you through setting up an Azure Storage Container and describes how to give our Data Warehouse app access permissions to it, so that we can send data to your organisation.

This guide will cover the following 6 steps:

  1. Enable the Dreamdata Data Warehouse Microsoft Azure Storage integration in the Dreamdata App.
  2. [Optional] Create a new Azure Storage Account.
  3. Create or select an Azure Storage Container as the destination of the data.
  4. [Optional] Create a custom role (advanced setup) for minimal access to the Storage Container.
  5. Assign a role granting the Dreamdata Data Warehouse Azure App permission to access your Storage Container.
  6. Configure and validate the integration.

Step 1. Enable the Dreamdata Data Warehouse Microsoft Azure Storage integration

Navigate to Data Platform > Data Access > Microsoft Azure Storage in the Dreamdata App, then follow these sub-steps:

  1. Click on the 'Enable' button:
    After clicking, you will be automatically re-directed to Microsoft's Entra ID platform.
  2. Select your organisational administrator account (or an account with sufficient permissions; see the 'Pre-requisites' section above).
  3. Review requested permissions and provide administrator consent to the Dreamdata Data Warehouse Azure app:
  4. After selecting Accept, the Dreamdata Data Warehouse Azure App is added to your organisation and you should be automatically re-directed back to the Dreamdata App (dreamdata.io). You should now see a form similar to the following:
  5. We will come back to this form in a moment, in 'Step 6. Configure and Validate'.

Step 2. [Optional] Create a new Storage Account

You need a Storage Account in which to create a new Storage Container. If you already have a Storage Account you want to use for this integration, skip this step and go to Step 3.

To create a new Storage Account inside your Azure Subscription, follow the official guide. Keep in mind that Storage Account names are global, so your account name must be globally unique or creation will fail.

Step 3. Create a Storage Container

Navigate to your chosen Storage Account inside your Azure Subscription and copy its name. Data will be copied to a Storage Container inside this Account, which we will create below. If you need to create a new Storage Account, see Step 2. Throughout the rest of this guide, we will refer to this Storage Account using the <STORAGE_ACCOUNT_NAME> placeholder.

Next, create a new Azure Storage Container inside the Storage Account, by following the official guide. Give it a descriptive name and copy that name. Data will be copied to this Storage Container. Throughout the rest of this guide, we will refer to the Storage Container created in this step using the <STORAGE_CONTAINER_NAME> placeholder.
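
If you prefer to script this step rather than use the Azure Portal, below is a minimal sketch using the azure-storage-blob Python SDK. It assumes the azure-storage-blob and azure-identity packages are installed, that your local identity (picked up by DefaultAzureCredential, e.g. after az login) is allowed to create containers in this account, and it uses the placeholder names from this guide.

# Minimal sketch: create the destination Storage Container programmatically.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

STORAGE_ACCOUNT_NAME = "<STORAGE_ACCOUNT_NAME>"      # placeholder from this guide
STORAGE_CONTAINER_NAME = "<STORAGE_CONTAINER_NAME>"  # placeholder from this guide

service = BlobServiceClient(
    account_url=f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net",
    credential=DefaultAzureCredential(),
)

# create_container raises ResourceExistsError if the container already exists.
container = service.create_container(STORAGE_CONTAINER_NAME)
print(f"Created container: {container.container_name}")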

Dreamdata does not delete previous data dumps, so we recommend putting a storage lifecycle management policy in place to limit the amount of stored data, so that the container does not grow indefinitely. A retention of 7 days is a sensible default, but the right value can vary depending on your use-case; a sketch of such a policy is shown below.
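
As an illustration only, a lifecycle management policy along the following lines (in the JSON form shown in the Azure Portal's code view and accepted by the Azure CLI) deletes blobs 7 days after they were last modified. The rule name and prefix below are assumptions; adjust them to match your container and retention needs.

{
  "rules": [
    {
      "enabled": true,
      "name": "ExpireOldDreamdataDumps",
      "type": "Lifecycle",
      "definition": {
        "actions": {
          "baseBlob": {
            "delete": { "daysAfterModificationGreaterThan": 7 }
          }
        },
        "filters": {
          "blobTypes": [ "blockBlob" ],
          "prefixMatch": [ "<STORAGE_CONTAINER_NAME>/" ]
        }
      }
    }
  ]
}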

Step 4. [Optional, Advanced setup] Create custom role for Storage container access

If you would prefer to give the Dreamdata Data Warehouse Azure App the absolute minimal privileges required (as per the industry best-practice 'least privilege principle'), you need to create a custom role by following the official guide, providing only the following permissions:

  • Add Blob: Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action
  • Read Blob: Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read
  • Write Blob: Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write
  • Delete Blob: Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete
  • Move Blob: Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action

When creating a custom role, you may choose to use the following JSON file as a starting point:

{
  "properties": {
    "roleName": "Storage Blob Data Operator",
    "description": "Allows for read, write and delete access to Azure Storage blob containers and data.",
    "assignableScopes": [
      "/subscriptions/<SUBSCRIPTION_ID>"
    ],
    "permissions": [
      {
        "actions": [],
        "notActions": [],
        "dataActions": [
          "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/delete",
          "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/read",
          "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/write",
          "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/move/action",
          "Microsoft.Storage/storageAccounts/blobServices/containers/blobs/add/action"
        ],
        "notDataActions": []
      }
    ]
  }
}

Save it to your computer as a .json file, replace the <SUBSCRIPTION_ID> placeholder with your organisation's subscription ID, and select it when prompted for a JSON file in the custom role creation form:

Step 5. Give permissions to Dreamdata Data Warehouse App to access your Storage Container

In order for Dreamdata to be able to send data to your organisation's Storage Account, you need to grant the Dreamdata Data Warehouse App (Service Principal) permission to access your Storage Account. Follow these steps to do just that:

  1. Navigate to your Storage Account.
  2. Navigate to your Storage Container inside your Storage Account:
  3. Select the Access Control (IAM) pane (in the example below, the Storage Account is named dreamdatadatawarehouse):
  4. Click the 'Add' menu and select 'Add role assignment':
  5. From the list of roles, select either the custom role created in Step 4 or the built-in Storage Blob Data Contributor role.
    Here our custom role is called "Storage Blob Data Operator".
  6. Click on 'Select Members' and in the Search box that pops up, type in 'Dreamdata Data Warehouse'.
  7. Click on the application and press on the 'Select' button:
  8. Click on the 'Review + assign' button twice:
  9. You should now see the role assigned to the Dreamdata Data Warehouse App (Service Principal):

Step 6. Configure and Validate

  1. Fill out the form using the values copied previously where:
    1. Storage Account Name: <STORAGE_ACCOUNT_NAME>
    2. Storage Account Container Name: <STORAGE_CONTAINER_NAME>
    3. Path: This can be any path inside your Storage Container. All data copied to your Storage Container will appear under this path. Leave blank to write to the root of the Storage Container.
  2. Press 'Save' and wait for your configuration to be validated. The validation performs a write, list and delete operation on the configured container/path to verify that the roles have been correctly assigned and the fields have been filled in correctly (the sketch below this list shows how to replicate this check with your own credentials). Validation should usually complete within 10-15 seconds, though it may take longer.
  3. If your configuration is valid, you should see a message like the following:
  4. If the configuration is invalid, please review your setup in light of the provided error message. Ensure that all details match those in your Storage Account exactly.
    If you are presented with an 'insufficient permissions' error, please review your setup from sections 'Step 4. [Optional, Advanced setup] Create custom role for Storage container access' (if you are using a custom role) and 'Step 5. Give permissions to Dreamdata Data Warehouse App to access your Storage Container'.
Note: It may take up to a few minutes for the role assignment performed in 'Step 5. Give permissions to Dreamdata Data Warehouse App to access your Storage Container' to take effect, so you may come across this error in the meantime. Please leave a few minutes between making changes to the Storage Account's IAM settings and validating your configuration in the Dreamdata App.
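
If you want to sanity-check the container and path yourself, below is a minimal sketch that mirrors the write, list and delete operations performed by the validation. Note the assumptions: it runs under your own identity via DefaultAzureCredential rather than the Dreamdata Service Principal, so your identity needs a data-plane role such as Storage Blob Data Contributor on the container, and it uses the placeholder names from this guide.

# Minimal self-check mirroring the validation: write, list and delete a test
# blob under the configured container/path using your own credentials.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import ContainerClient

STORAGE_ACCOUNT_NAME = "<STORAGE_ACCOUNT_NAME>"      # placeholder from this guide
STORAGE_CONTAINER_NAME = "<STORAGE_CONTAINER_NAME>"  # placeholder from this guide
PATH = ""  # optional Path configured in the Dreamdata App, e.g. "dreamdata/"

container = ContainerClient(
    account_url=f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net",
    container_name=STORAGE_CONTAINER_NAME,
    credential=DefaultAzureCredential(),
)

blob_name = f"{PATH}validation-check.txt"

# Write a small test blob.
container.upload_blob(blob_name, b"validation check", overwrite=True)

# List blobs under the configured path and confirm the test blob is visible.
found = any(b.name == blob_name for b in container.list_blobs(name_starts_with=PATH))
print("blob visible after write:", found)

# Delete the test blob again.
container.delete_blob(blob_name)
print("self-check completed")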

How the Data looks

  • The different tables and their schemas are documented here.
  • Each folder contains a complete dump of the table in the ndjson (newline-delimited json) format.

Data will appear in the Storage Container using the structure shown below. If a Path is specified in the Dreamdata App, all files will be nested under it.

The following examples assume that no Path is configured, so the files are placed at the root level of the container:

receipt.json
schema/bigquery/attribution.json
schema/bigquery/companies.json
schema/bigquery/contacts.json
schema/bigquery/events.json
schema/bigquery/spend.json
schema/bigquery/stages.json
2025-09-15T08/attribution/attribution_*.ndjson.gz
2025-09-15T08/companies/companies_*.ndjson.gz
2025-09-15T08/contacts/contacts_*.ndjson.gz
2025-09-15T08/events/events_*.ndjson.gz
2025-09-15T08/spend/spend_*.ndjson.gz
2025-09-15T08/stages/stages_*.ndjson.gz

Inside each folder are one or more ndjson gzip files. Here, the files inside the companies folder are shown:

2025-09-15T08/companies/companies_000000000000.ndjson.gz
2025-09-15T08/companies/companies_000000000001.ndjson.gz
...

Note: One or more of the *.ndjson.gz archive files may contain an empty *.ndjson file. This is purely due to the way Google BigQuery exports tables to files. You are advised to ignore such files when reading and processing the files for a given table.
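
As an example of how the dumped files can be consumed, below is a minimal sketch that streams every file in one table's dump folder, decompresses it and yields one JSON record per line; files containing only an empty ndjson payload simply yield no records. The folder name is illustrative, the placeholders are the ones used in this guide, and your identity is assumed to have read access to the container.

# Minimal sketch: read all companies_*.ndjson.gz files in one dump folder and
# yield one parsed JSON record per non-empty line.
import gzip
import io
import json

from azure.identity import DefaultAzureCredential
from azure.storage.blob import ContainerClient

STORAGE_ACCOUNT_NAME = "<STORAGE_ACCOUNT_NAME>"      # placeholder from this guide
STORAGE_CONTAINER_NAME = "<STORAGE_CONTAINER_NAME>"  # placeholder from this guide
FOLDER = "2025-09-15T08/companies/"                  # illustrative dump folder

container = ContainerClient(
    account_url=f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net",
    container_name=STORAGE_CONTAINER_NAME,
    credential=DefaultAzureCredential(),
)

def read_records(folder):
    for blob in container.list_blobs(name_starts_with=folder):
        data = container.download_blob(blob.name).readall()
        with gzip.open(io.BytesIO(data), mode="rt", encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if line:  # empty exports produce no usable lines
                    yield json.loads(line)

for record in read_records(FOLDER):
    print(record)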

A receipt.json file is created/updated upon every successful data dump, containing a description of all dumped data, including the following fields:

  1. timestamp: RFC3339 timestamp of when the data upload to your Azure Storage Container was started.
  2. exported_at: RFC3339 timestamp of when the data was exported from BigQuery before starting upload to your Azure Storage Container.
  3. tables: an object with an entry for each exported table; each entry is an object with the following fields:
    1. folder: the path to the folder with the latest dump for that table.
    2. total_file_count: the number of files in the dump for this table.
    3. schemas: contains the path to the BigQuery schema dump for this table. This is a good starting point when translating the data into the data structures supported by your own data warehouse.
{
  "timestamp": "2025-09-15T09:15:04.569782868Z",
  "exported_at": "2025-09-15T09:13:59.318684081Z",
  "tables": {
    "attribution": {
      "folder": "2025-09-15T09/attribution/",
      "total_file_count": 1,
      "schemas": {
        "bigquery": "schema/bigquery/attribution.json"
      }
    }
    ...
  }
}
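
To locate the latest dump programmatically, a consumer can read receipt.json first and then process the folders it lists. Below is a minimal sketch under the same assumptions as the previous example (your identity has read access; the placeholders are the ones used in this guide).

# Minimal sketch: download receipt.json and print where the latest dump of each
# table lives, together with its file count and BigQuery schema path.
import json

from azure.identity import DefaultAzureCredential
from azure.storage.blob import ContainerClient

STORAGE_ACCOUNT_NAME = "<STORAGE_ACCOUNT_NAME>"      # placeholder from this guide
STORAGE_CONTAINER_NAME = "<STORAGE_CONTAINER_NAME>"  # placeholder from this guide
PATH = ""  # optional Path configured in the Dreamdata App

container = ContainerClient(
    account_url=f"https://{STORAGE_ACCOUNT_NAME}.blob.core.windows.net",
    container_name=STORAGE_CONTAINER_NAME,
    credential=DefaultAzureCredential(),
)

receipt = json.loads(container.download_blob(f"{PATH}receipt.json").readall())

print("dump exported at:", receipt["exported_at"])
for table, info in receipt["tables"].items():
    print(f"{table}: {info['total_file_count']} file(s) in {info['folder']}, "
          f"schema at {info['schemas']['bigquery']}")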

Schedule

A full dump of each data platform table is created after each successful Data Modelling run.

