Get started with Dreamdata
What is Dreamdata? [VIDEO]
Setting Up Dreamdata
How to set up Dreamdata Web tracking (analytics.js) manually
How to set up up Dreamdata web tracking (analytics.js) using Google Tag Manager
How to track forms adding the auto-identify script via Google Tag Manager.
The Onboarding Process
Onboarding process for free customers [VIDEO]
Onboarding for paying customers [VIDEO]
Single sign-on
How to invite your colleagues to Dreamdata
Dashboards
Home
Engagement
Performance
Content Performance
Analytics
Which channel performs best for different content?
Which content influenced the MQLs created in a time period?
Content Performance - Dashboard Options
Which content generates pipeline?
Measuring influenced pipeline for B2B content - the true conversion metric
What KPI to measure the effect of B2B content?
Setup Content Reporting
Content Analytics - Dashboard Options
Paid
Overview
Ad Spend
Return on Ads Spend
Google Search Ads
Google Display Ads
LinkedIn Ads
Bing Ads
Facebook Ads
YouTube Ads
Capterra Ads
G2 Crowd
Organic
Acquisition
Conversions
Web Traffic
Performance vs. Revenue attribution: A guide on when to use what
Journeys
Revenue Analytics
Data Platform
Sources
Intent Data
Paid
Setting up Facebook Ads
Setting up AdRoll
Setting up Twitter Ads
Setting up Bing Ads
Setting up LinkedIn Ads
Setting up Google Ads
Setting up G2
Setting up Capterra
CRM
Marketing Automation
Import Other
Import fixed cost data using Google Sheet
Importing data not in your CRM (using Google Sheets)
Import cost data using Google Sheet
Custom data
Setting up Outreach
Setting up Zapier integration & Zaps for Lead Ads
Setting up Google Search
Setting up Intercom
Destinations
Ad Network
Data Warehouse
Google BigQuery
Connect to AWS Redshift using AWS Glue
Connect your Dreamdata data to Snowflake
Connect your Dreamdata data to Amazon Redshift
Cloud Storage
Business Intelligence
Guides for Looker Studio Reporting
Getting Started with Looker Studio Templates
Google Connected Sheets
Setting up Data Export to BigQuery of CRM Properties
Overview
Company Data Enrichment
Table Schema
Intent data
General Settings
Tracking
Segment
Dreamdata Cookies
Form Tracking
Reduce impact from ad-blockers and Apple ITP 2.x
Pardot iframe form tracking
Anonymizing IP
Tracking using Sleeknote or Drift
How to track your emails?
Cookie Bar
How does Dreamdata track all relevant on-site customer data?
Cookie Retention
Tracking Hubspot Forms with auto-identify script
Server Side Analytics APIs
URL query parameters
Calendly
Tracking iframes with auto-identify script
Tracking SPAs (Single Page Applications)
Across domain & device tracking
Advanced Identification of users and companies
Stage Models
Setup Guide: All Salesforce Opportunities entering specific Stage
Setup Guide: Creation of Opportunities/Deals
Setup Guide: Tracked sign-up events
Setup Guide: All Hubspot Deals entering specific Stage
Stage Models - Customization
Menu: Settings
Allowed Domains
UTM Mapping
Branded Search filter
CRM-Based Channel and Source in the Absence of Tracking Activity
CRM filters
FAQ
Glossary
Categories
Some of my deals are flagged with "Unknown". What does it mean?
Data retention
Can I exclude content or websites from being tracked?
How are activity mapped to companies?
What does Visitors, Contacts and Companies mean?
What is a session?
Roles and Permissions
What is the reporting Time Zone?
Why are my dashboards empty?
Agency Partners
Welcome Partner!
Partner Tiers
Referral Guide and UTM tracking
Partner Material
Intro Template for your new clients
Ideal Customer Profile
Co-marketing and account mapping
Contact
Quick learning videos!
Find the content that generates most pipeline
How to set up content categories on Dreamdata
Do you know how your company is generating money?
How to see the value of B2B Google Ads in pipeline and revenue generated
How Content Analytics tracks the influence of content of pipeline and revenue
How to easily build a retargeting audience with Dreamdata
Are you using G2?
Performance vs. Revenue Analytics reports- when to apply them best!
What attribution really is and why you should care!
Helping BDRs break through to the hottest accounts
Dreamdata Content Analytics: Discover the real value of your content
Do you know which of your Marketing activities had the biggest impact on pipeline and revenue?
Which of your emails produce pipeline and revenue?
See the value of SEO in pipeline and revenue generated
How to cut the cost of your Google Search Ads
- All Categories
- Data Platform
- Destinations
- Data Warehouse
- Connect to AWS Redshift using AWS Glue
Connect to AWS Redshift using AWS Glue
Updated
by Miquel
This document describes how to integrate your Dreamdata data with your AWS Redshift cluster. Once connected, you can run your own queries on our data models, as well as copy, manipulate, join and use the data within other tools connected to Redshift.
This solution relies on AWS Glue. AWS Glue is a service that can act as a middle layer between an AWS s3 bucket and your AWS Redshift cluster.
Steps
- Pre-requisites
- Transfer to s3 bucket
- Configure AWS Glue
- Run AWS Glue crawler
- Configure AWS Redshift
- Query from AWS Redshift
Pre-requisites
AWS
- AWS account ID
- AWS redshift cluster
Google Cloud Platform
- GCP Service Account in a project with Cloud Billing enabled
Dreamdata
- Dreamdata Google Cloud Storage destination enabled
- Give your Google Cloud Storage service account permissions to access Dreamdata's Google Cloud Storage bucket. Learn more here.
With all pre-requisites done, you should be able to fill the following variables (which will be used throughout the integration)
aws_account_id=
redshift_cluster_id=
service_account=
gcs_name=
aws configure
gcloud auth login --update-adc
Transfer to s3 bucket
First create an AWS s3 bucket to store the data that is currently located in Dreamdata's premises on BigQuery.
Create your s3 bucket
aws s3api create-bucket --bucket ${s3_name}
Transfer all the data from Dreamdata's Google Cloud Storage to your newly created s3 bucket
gsutil -i ${service_account} -m rsync -rd "gs://${gcs_name}" "s3://${s3_name}"
service_account
in question musthave access to Dreamdata's bucket (how to) and be attached to a Google Cloud Platform project with Cloud Billing enabled
Configure AWS Glue
AWS Glue will act as a layer in between your AWS s3
bucket, currently hosting the data, and your AWS Redshift
cluster. We will define a AWS Glue database
that can be queried from AWS Redshift. Also, in order to move the data from the s3 bucket to the newly created AWS Glue database, we will use a AWS Glue crawler
.
First, create a role that can be assumed by AWS Glue
glue-role.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "glue.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
aws iam create-role \
--role-name dd-glue \
--assume-role-policy-document file://glue-role.json
Then, attach the following policies
aws iam attach-role-policy \
--role-name dd-glue \
--policy-arn arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole
aws iam attach-role-policy \
--role-name dd-glue \
--policy-arn arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess
Create a AWS Glue database
aws glue create-database \
--database-input "{\"Name\":\"dd-glue-database\"}"
Finally, create a AWS Glue crawler
aws glue create-crawler \
--name dd-crawler \
--role dd-glue \
--database-name dd-glue-database \
--targets "{\"S3Targets\": [{\"Path\": \"s3://${s3_name}/contacts\"}, {\"Path\": \"s3://${s3_name}/companies\"}, {\"Path\": \"s3://${s3_name}/events\"}]}"
contacts
, companies
, and events
Run AWS Glue crawler
At this point, AWS Glue is ready to deposit the data that is hosted in your s3 bucket in the AWS Glue database.
To populate the newly created AWS Glue database with fresh data
aws glue start-crawler \
--name dd-crawler
Configure AWS Redshift
In order to run queries from your AWS Redshift cluster to the data now located in AWS Glue database, you will use AWS Redshift Spectrum, which is already a part of your AWS Redshift.
Create a role that will be assumed by your AWS Redshift cluster
redshift-role.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "redshift.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}
aws iam create-role \
--role-name dd-redshift \
--assume-role-policy-document file://redshift-role.json
Attach minimum needed policies,
spectrum-policy.json
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetBucketLocation",
"s3:GetObject",
"s3:ListMultipartUploadParts",
"s3:ListBucket",
"s3:ListBucketMultipartUploads"
],
"Resource": [
"arn:aws:s3:::${s3_name}"
]
},
{
"Effect": "Allow",
"Action": [
"glue:CreateDatabase",
"glue:DeleteDatabase",
"glue:GetDatabase",
"glue:GetDatabases",
"glue:UpdateDatabase",
"glue:CreateTable",
"glue:DeleteTable",
"glue:BatchDeleteTable",
"glue:UpdateTable",
"glue:GetTable",
"glue:GetTables",
"glue:BatchCreatePartition",
"glue:CreatePartition",
"glue:DeletePartition",
"glue:BatchDeletePartition",
"glue:UpdatePartition",
"glue:GetPartition",
"glue:GetPartitions",
"glue:BatchGetPartition"
],
"Resource": [
"*"
]
}
]
}
aws iam create-policy \
--policy-name dd-spectrum-redshift \
--policy-document file://spectrum-policy.json
aws iam attach-role-policy \
--role-name dd-redshift \
--policy-arn "arn:aws:iam::${aws_account_id}:policy/dd-spectrum-redshift"
And, finally, add the new role to the list of roles that can be used within your cluster
aws redshift modify-cluster-iam-roles \
--cluster-identifier ${redshift_cluster_id} \
--add-iam-roles "arn:aws:iam::${aws_account_id}:role/dd-redshift"
Query from AWS Redshift
Almost done. The only thing left to query data from your AWS Redshift Cluster is to create an external schema for AWS Redshift Spectrum. To do so, open your preferred AWS Redshift query editor.
Create an external schema for Amazon Redshift Spectrum
create external schema spectrum_schema from data catalog
database 'dd-redshift-database'
iam_role 'arn:aws:iam::${aws_account_id}:role/dd-redshift'
create external database if not exists;
You are now ready to run queries. As an example, try counting the contacts
SELECT count(*) FROM spectrum_schema.contacts;