Connect to AWS Redshift using AWS Glue
This document describes how to integrate your Dreamdata data with your AWS Redshift cluster. Once connected, you can run your own queries on our data models, as well as copy, manipulate, join and use the data within other tools connected to Redshift.
This solution relies on AWS Glue. AWS Glue is a service that can act as a middle layer between an AWS s3 bucket and your AWS Redshift cluster.
Steps
- Pre-requisites
- Transfer to s3 bucket
- Configure AWS Glue
- Run AWS Glue crawler
- Configure AWS Redshift
- Query from AWS Redshift
Pre-requisites
AWS
- AWS account ID
- AWS Redshift cluster
Google Cloud Platform
- GCP Service Account in a project with Cloud Billing enabled
Dreamdata
- Dreamdata Google Cloud Storage destination enabled
- Give your Google Cloud Storage service account permissions to access Dreamdata's Google Cloud Storage bucket. Learn more here.
With all pre-requisites in place, you should be able to fill in the following variables, which will be used throughout the integration:
aws_account_id=
redshift_cluster_id=
service_account=
gcs_name=
s3_name=
Then authenticate both the AWS and Google Cloud CLIs:
aws configure
gcloud auth login --update-adc
Transfer to s3 bucket
First, create an AWS s3 bucket to store the data, which currently resides in Dreamdata's Google Cloud Storage.
Create your s3 bucket
aws s3api create-bucket --bucket ${s3_name}
Transfer all the data from Dreamdata's Google Cloud Storage to your newly created s3 bucket
gsutil -i ${service_account} -m rsync -rd "gs://${gcs_name}" "s3://${s3_name}"
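As an optional sanity check, you can compare the object counts on both sides after the sync; this is a sketch using standard `gsutil` and AWS CLI listing commands (both counts should match):

```
gsutil ls -r "gs://${gcs_name}/**" | wc -l
aws s3 ls "s3://${s3_name}" --recursive | wc -l
```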
The service_account in question must have access to Dreamdata's bucket (see Pre-requisites) and be attached to a Google Cloud Platform project with Cloud Billing enabled.
Configure AWS Glue
AWS Glue will act as a layer between your AWS s3 bucket, which currently hosts the data, and your AWS Redshift cluster. We will define an AWS Glue database that can be queried from AWS Redshift. To move the data from the s3 bucket into the newly created AWS Glue database, we will use an AWS Glue crawler.
First, create a role that can be assumed by AWS Glue
glue-role.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "glue.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
aws iam create-role \
--role-name dd-glue \
--assume-role-policy-document file://glue-role.json
Then, attach the following policies
aws iam attach-role-policy \
--role-name dd-glue \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
aws iam attach-role-policy \
--role-name dd-glue \
--policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
Create an AWS Glue database
aws glue create-database \
--database-input "{\"Name\":\"dd-glue-database\"}"
Finally, create an AWS Glue crawler
aws glue create-crawler \
--name dd-crawler \
--role dd-glue \
--database-name dd-glue-database \
--targets "{\"S3Targets\": [{\"Path\": \"s3://${s3_name}/contacts\"}, {\"Path\": \"s3://${s3_name}/companies\"}, {\"Path\": \"s3://${s3_name}/events\"}]}"
The crawler targets the contacts, companies, and events paths in your s3 bucket; each becomes a table in the AWS Glue database.
Run AWS Glue crawler
At this point, AWS Glue is ready to catalog the data hosted in your s3 bucket into the AWS Glue database.
To populate the newly created AWS Glue database with fresh data, start the crawler
aws glue start-crawler \
--name dd-crawler
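The crawler runs asynchronously. If you want to wait for it to finish before moving on, one option (a sketch using the standard `get-crawler` and `get-tables` Glue CLI commands) is to poll its state until it returns to READY, then list the tables it created:

```
while [ "$(aws glue get-crawler --name dd-crawler \
            --query 'Crawler.State' --output text)" != "READY" ]; do
  sleep 15
done
aws glue get-tables --database-name dd-glue-database \
  --query 'TableList[].Name' --output text
```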
Configure AWS Redshift
In order to run queries from your AWS Redshift cluster against the data now located in the AWS Glue database, you will use AWS Redshift Spectrum, which is already part of your AWS Redshift cluster.
Create a role that will be assumed by your AWS Redshift cluster
redshift-role.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "redshift.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
aws iam create-role \
--role-name dd-redshift \
--assume-role-policy-document file://redshift-role.json
Attach the minimum required policies,
spectrum-policy.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListMultipartUploadParts",
"s3:ListBucket",
"s3:ListBucketMultipartUploads"
],
"Resource": [
"arn:aws:s3:::${s3_name}",
"arn:aws:s3:::${s3_name}/*"
]
},
{
"Effect": "Allow",
"Action": [
"glue:CreateDatabase",
"glue:DeleteDatabase",
"glue:GetDatabase",
"glue:GetDatabases",
"glue:UpdateDatabase",
"glue:CreateTable",
"glue:DeleteTable",
"glue:BatchDeleteTable",
"glue:UpdateTable",
"glue:GetTable",
"glue:GetTables",
"glue:BatchCreatePartition",
"glue:CreatePartition",
"glue:DeletePartition",
"glue:BatchDeletePartition",
"glue:UpdatePartition",
"glue:GetPartition",
"glue:GetPartitions",
"glue:BatchGetPartition"
],
"Resource": [
"*"
]
}
]
}
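Note that `${s3_name}` inside spectrum-policy.json is not expanded by the shell when the file is passed with file:// — the AWS CLI sends the file contents verbatim. Render the template into a concrete policy document first; a minimal sketch with sed (the bucket name and file paths here are placeholders):

```shell
# ${s3_name} inside a JSON policy file is NOT expanded by the AWS CLI;
# substitute it yourself before passing the file with file://.
s3_name=my-dd-bucket   # placeholder: use your actual bucket name

# A tiny stand-in template for spectrum-policy.json ...
cat > /tmp/spectrum-policy.tmpl.json <<'EOF'
{ "Resource": ["arn:aws:s3:::${s3_name}", "arn:aws:s3:::${s3_name}/*"] }
EOF

# ... rendered by replacing the literal ${s3_name} token with the real name.
sed "s/\${s3_name}/${s3_name}/g" /tmp/spectrum-policy.tmpl.json > /tmp/spectrum-policy.json
cat /tmp/spectrum-policy.json
```

Then pass the rendered file to `aws iam create-policy` instead of the raw template.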
aws iam create-policy \
--policy-name dd-spectrum-redshift \
--policy-document file://spectrum-policy.json
aws iam attach-role-policy \
--role-name dd-redshift \
--policy-arn "arn:aws:iam::${aws_account_id}:policy/dd-spectrum-redshift"
And, finally, add the new role to the list of roles that can be used within your cluster
aws redshift modify-cluster-iam-roles \
--cluster-identifier ${redshift_cluster_id} \
--add-iam-roles "arn:aws:iam::${aws_account_id}:role/dd-redshift"
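To confirm the role is now attached to the cluster, you can inspect the cluster's IAM roles; a sketch using the standard `describe-clusters` command:

```
aws redshift describe-clusters \
  --cluster-identifier ${redshift_cluster_id} \
  --query 'Clusters[0].IamRoles' --output json
```

The output should list the dd-redshift role ARN with an "in-sync" status before you proceed.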
Query from AWS Redshift
Almost done. The only thing left to query data from your AWS Redshift cluster is to create an external schema for AWS Redshift Spectrum. To do so, open your preferred AWS Redshift query editor.
Create an external schema for Amazon Redshift Spectrum
create external schema spectrum_schema from data catalog
database 'dd-glue-database'
iam_role 'arn:aws:iam::${aws_account_id}:role/dd-redshift'
create external database if not exists;
You are now ready to run queries. As an example, try counting the rows in the contacts table
SELECT count(*) FROM spectrum_schema.contacts;
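The data in your s3 bucket is a point-in-time copy, so re-run the sync and the crawler periodically to keep the tables fresh. A hypothetical cron sketch (the schedule is an assumption; adjust to your needs):

```
# Re-sync from Dreamdata's GCS at 02:00, then re-crawl at 03:00, daily.
0 2 * * * gsutil -i "${service_account}" -m rsync -rd "gs://${gcs_name}" "s3://${s3_name}"
0 3 * * * aws glue start-crawler --name dd-crawler
```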