Connect your Dreamdata data to Amazon Redshift using AWS Glue
# Integrate Dreamdata with AWS Redshift
This guide shows how to integrate Dreamdata with your AWS Redshift cluster. The goal is to be able to query Dreamdata's data directly from your AWS Redshift cluster.
- [Integration with AWS Glue](#integration-with-aws-glue)
## Pre-requisites
AWS:
- AWS account ID
- AWS redshift cluster
GCP:
- GCP service account in a project with Cloud Billing enabled
Dreamdata:
- Dreamdata [GCS](https://cloud.google.com/storage) enabled ([How to](https://docs.dreamdata.io/article/xyxuyo080a-google-cloud-storage))
- Give your GCP service account permissions to access Dreamdata's bucket
Reminder: don't forget to sign in to AWS and GCP from your terminal:
```bash
aws configure
gcloud auth login --update-adc
```
If you have all the pre-requisites in place, you should be able to fill in the following variables, which are used throughout the Redshift integration:
```bash
aws_account_id=
redshift_cluster_id=
service_account=
gcs_name=
s3_name= # name of the s3 bucket you will create below
```
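Before moving on, a quick sanity check can save a confusing failure later. A minimal sketch (`require_vars` is a hypothetical helper, not part of any CLI):

```shell
# Fail fast if any of the required variables is still empty.
require_vars() {
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name" >&2
      return 1
    fi
  done
}

require_vars aws_account_id redshift_cluster_id service_account gcs_name \
  || echo "some variables are still empty"
```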
### Integration with AWS Glue
This solution relies on [AWS Glue](https://docs.aws.amazon.com/glue/index.html). AWS Glue is a service that can act as a middle layer between an [AWS s3](https://docs.aws.amazon.com/s3/index.html) bucket and your AWS Redshift cluster.
#### Steps
- [Pre-requisites](#pre-requisites)
- [Transfer to s3 bucket](#transfer-to-s3-bucket)
- [Configure AWS Glue](#configure-aws-glue)
- [Run AWS Glue crawler](#run-aws-glue-crawler)
- [Configure AWS Redshift](#configure-aws-redshift)
- [Query from AWS Redshift](#query-from-aws-redshift)
#### Transfer to s3 bucket
Let's create an AWS s3 bucket that will store all the data currently hosted in Dreamdata's GCS bucket.
Create your s3 bucket (note that outside `us-east-1` you must also pass `--create-bucket-configuration LocationConstraint=<your-region>`):
```bash
aws s3api create-bucket --bucket ${s3_name}
```
Transfer all the data from Dreamdata's GCS bucket to your newly created s3 bucket:
```bash
gsutil -i ${service_account} -m rsync -rd "gs://${gcs_name}" "s3://${s3_name}"
```
For the transfer to work, the `service_account` in question must:
- Have access to dreamdata's bucket ([more info here](https://docs.dreamdata.io/article/xyxuyo080a-google-cloud-storage))
- Be attached to a GCP Project with Cloud Billing enabled
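In addition, `gsutil` needs AWS credentials to write to the `s3://` destination; it typically reads them from your boto configuration file (e.g. `~/.boto`). A minimal fragment (the key values are placeholders):
```
[Credentials]
aws_access_key_id = <your-access-key-id>
aws_secret_access_key = <your-secret-access-key>
```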
#### Configure AWS Glue
We will use [AWS Glue](https://docs.aws.amazon.com/glue/index.html) as a layer between the `s3 bucket` hosting the data and your `redshift cluster`.
Firstly, create a role that can be assumed by AWS Glue:
glue-role.json
```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "glue.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}
```
```bash
aws iam create-role \
--role-name dd-glue \
--assume-role-policy-document file://glue-role.json
```
Then, attach the policies:
```bash
aws iam attach-role-policy \
--role-name dd-glue \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
aws iam attach-role-policy \
--role-name dd-glue \
--policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
```
With the role in place, create an `AWS Glue database`:
```bash
aws glue create-database \
--database-input "{\"Name\":\"dd-glue-database\"}"
```
Finally, create the `AWS Glue crawler`, which will populate the database with tables inferred from the data in the s3 bucket.
Note that the crawler needs to know which tables you want to extract from the data in your s3 bucket. The example below creates a crawler that extracts the tables `contacts`, `companies`, and `events`.
```bash
aws glue create-crawler \
--name dd-crawler \
--role dd-glue \
--database-name dd-glue-database \
--targets "{\"S3Targets\": [{\"Path\": \"s3://${s3_name}/contacts\"}, {\"Path\": \"s3://${s3_name}/companies\"}, {\"Path\": \"s3://${s3_name}/events\"}]}"
```
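If the escaped inline JSON above is awkward to maintain, the same `--targets` payload can be built up in the shell first. A sketch, using a placeholder bucket name and the three example tables:

```shell
# Build the S3Targets JSON for the example tables.
s3_name="my-dd-bucket"   # placeholder; use your own bucket name

targets='{"S3Targets": ['
for table in contacts companies events; do
  targets="${targets}{\"Path\": \"s3://${s3_name}/${table}\"}, "
done
targets="${targets%, }]}"   # drop the trailing comma and close the JSON

echo "$targets"
```

The result can then be passed as `--targets "$targets"` to `aws glue create-crawler`.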
#### Run AWS Glue crawler
At this point, AWS Glue is ready to crawl the data hosted in your s3 bucket and catalog it in its database. That same database is then going to be queried by AWS Redshift.
To populate the newly created AWS Glue database with fresh data, do:
```bash
aws glue start-crawler \
--name dd-crawler
```
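A crawler run is asynchronous, so you may want to wait until it finishes before querying. A small polling sketch (`wait_for_crawler` is a hypothetical helper; the `aws glue get-crawler` call in the comment is the real CLI command that prints the crawler state):

```shell
# Poll until the crawler reports READY again (it is RUNNING while crawling).
# $1: any command that prints the crawler state on stdout.
wait_for_crawler() {
  while state=$($1); [ "$state" != "READY" ]; do
    echo "crawler state: ${state}, waiting..."
    sleep 10
  done
}

# Real usage would be something like:
# wait_for_crawler "aws glue get-crawler --name dd-crawler --query Crawler.State --output text"
```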
Remember that you'll need to run the command above every time you want fresh data in the AWS Glue database. To automate this, see the [scheduling AWS Glue crawlers](https://docs.aws.amazon.com/glue/latest/dg/monitor-data-warehouse-schedule.html) documentation.
#### Configure AWS Redshift
To query the data now located in the AWS Glue database from your AWS Redshift cluster, we will use [AWS Redshift Spectrum](https://docs.aws.amazon.com/redshift/latest/dg/c-getting-started-using-spectrum.html). Note that AWS Redshift Spectrum comes out of the box with AWS Redshift.
Let's add the right permissions.
Create a role that will be assumed by your AWS Redshift cluster:
redshift-role.json
```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {
      "Service": "redshift.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}
```
```bash
aws iam create-role \
--role-name dd-redshift \
--assume-role-policy-document file://redshift-role.json
```
Attach [minimum needed policies](https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-iam-policies.html#spectrum-iam-policies-s3),
spectrum-policy.json
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListMultipartUploadParts",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads"
      ],
      "Resource": [
        "arn:aws:s3:::${s3_name}",
        "arn:aws:s3:::${s3_name}/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "glue:CreateDatabase",
        "glue:DeleteDatabase",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:UpdateDatabase",
        "glue:CreateTable",
        "glue:DeleteTable",
        "glue:BatchDeleteTable",
        "glue:UpdateTable",
        "glue:GetTable",
        "glue:GetTables",
        "glue:BatchCreatePartition",
        "glue:CreatePartition",
        "glue:DeletePartition",
        "glue:BatchDeletePartition",
        "glue:UpdatePartition",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:BatchGetPartition"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```
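Note that the shell does not expand `${s3_name}` inside a hand-written JSON file, so the `Resource` entries above must contain your actual bucket name. One way is to write the file from a heredoc, which does expand variables; a minimal sketch showing only the `Resource` value (`my-dd-bucket` is a placeholder):

```shell
# Heredocs expand shell variables, unlike a plain .json file on disk.
s3_name="my-dd-bucket"   # placeholder; use your own bucket name

cat > s3-resource.json <<EOF
{"Resource": ["arn:aws:s3:::${s3_name}", "arn:aws:s3:::${s3_name}/*"]}
EOF
```

The same approach can wrap the full `spectrum-policy.json` above.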
```bash
aws iam create-policy \
--policy-name dd-spectrum-redshift \
--policy-document file://spectrum-policy.json
aws iam attach-role-policy \
--role-name dd-redshift \
--policy-arn "arn:aws:iam::${aws_account_id}:policy/dd-spectrum-redshift"
```
And, finally, add the new role to the list of IAM roles that can be used within your cluster:
```bash
aws redshift modify-cluster-iam-roles \
--cluster-identifier ${redshift_cluster_id} \
--add-iam-roles "arn:aws:iam::${aws_account_id}:role/dd-redshift"
```
#### Query from AWS Redshift
We are in a good place now. The only thing left is to actually query the data from your AWS Redshift cluster. To do so, open your preferred AWS Redshift query editor.
Create an [external schema for Amazon Redshift Spectrum](https://docs.aws.amazon.com/redshift/latest/dg/c-spectrum-external-schemas.html),
```sql
create external schema spectrum_schema from data catalog
database 'dd-glue-database'
iam_role 'arn:aws:iam::${aws_account_id}:role/dd-redshift'
create external database if not exists;
```
Finally, run your queries. As an example, let's count the `contacts` we have in the `dd-glue-database`:
```sql
SELECT count(*) FROM spectrum_schema.contacts;
```