Connect your Dreamdata data to Amazon Redshift

Updated by Steen Voersaa

By default, Dreamdata builds your database on Google BigQuery. However, through our Google Cloud Storage integration you can export the raw dataset of your Dreamdata database, transfer it to Amazon S3, and then load it into Redshift.

Google Cloud Platform

To use Google Cloud Storage, you first need a Google Cloud Platform account.

Dreamdata is a Google Cloud Partner. If you have not already signed up for Google Cloud Platform, you can do it here and get $500 of free credits. If you are not familiar with Google Cloud Storage and want to learn more, take a look at these how-to guides.


After you have acquired a Google Cloud Platform account, go here to set up the access details for the integration. Once the integration is configured, we will begin generating your raw datasets on a 6-hour schedule.

Some key points:

  • This data will be hosted in the region of your choosing
  • Dreamdata will pay the data storage costs
  • You will pay the export costs

We will need the following:

  1. The region/zone of your Google Cloud Platform.
  2. Your Google Cloud Platform service account.
  3. A list of emails of users that should have access to the exported dataset.
  4. A list of Google group email addresses that should have access to the dataset.

Note: Either (3) or (4) is required.
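
Before submitting the details, it can help to check that nothing is missing. The following is a minimal sketch of such a check, not part of any Dreamdata API: the `IntegrationDetails` class, its field names, and the example values are all hypothetical and only mirror the list above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IntegrationDetails:
    """Hypothetical container for the details listed above (not a Dreamdata API)."""

    region: str
    service_account: str
    user_emails: List[str] = field(default_factory=list)
    group_emails: List[str] = field(default_factory=list)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the details look complete."""
        problems = []
        if not self.region.strip():
            problems.append("region/zone is missing")
        if not self.service_account.strip():
            problems.append("service account is missing")
        # Either (3) user emails or (4) group emails must be provided.
        if not self.user_emails and not self.group_emails:
            problems.append("provide at least one user email or group email")
        return problems


# Example values are placeholders, not real accounts.
details = IntegrationDetails(
    region="europe-west1",
    service_account="exporter@my-project.iam.gserviceaccount.com",
    user_emails=["analyst@example.com"],
)
print(details.validate())
```

An empty list means all required items are present; otherwise each entry names what still needs to be supplied.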

Exporting your data to AWS

After you have configured the Google Cloud Storage integration and gained access to your files, you can use any one of the following methods to transfer the data from Google Cloud Storage to Amazon S3, from where it can be loaded into Redshift.

  • AWS Glue: This article describes how to query data from Amazon Redshift using AWS Glue.
  • Amazon EMR: This article describes how to copy the raw files from Google Cloud Storage directly to Amazon S3 using Amazon EMR and how to run the task on a schedule.
  • Apache Airflow: This article describes how to programmatically author, schedule and monitor workflows.
  • CLI: This article describes how to connect data using the CLI (an approach that will work between any two cloud storage providers).
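
As an illustration of the CLI route, the sketch below builds a `gsutil rsync` command that copies objects from a Google Cloud Storage location to an S3 location. It assumes `gsutil` is the CLI in use and that AWS credentials have already been configured for it; the bucket names and the `build_sync_command` helper are hypothetical. The function only assembles the command so that it can be reviewed before anything is run.

```python
import shlex
from typing import List


def build_sync_command(gcs_uri: str, s3_uri: str, dry_run: bool = False) -> List[str]:
    """Build a gsutil rsync command that copies from GCS to S3 (assumed setup)."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("source must be a gs:// URI")
    if not s3_uri.startswith("s3://"):
        raise ValueError("destination must be an s3:// URI")
    # -m runs transfers in parallel, -r recurses into sub-directories.
    cmd = ["gsutil", "-m", "rsync", "-r"]
    if dry_run:
        # -n lists what would be copied without transferring anything.
        cmd.append("-n")
    cmd += [gcs_uri, s3_uri]
    return cmd


# Placeholder bucket names; replace them with your own.
command = build_sync_command(
    "gs://my-dreamdata-export/raw",
    "s3://my-redshift-staging/dreamdata/raw",
    dry_run=True,
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the dry run shows the expected files.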

Once connected, you can run your own queries on our data models, as well as copy, manipulate, join and use the data within other tools connected to Redshift.
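
Loading the files from S3 into Redshift is typically done with a `COPY` statement. The sketch below only builds the SQL text. It assumes the exported files are in Parquet format, which this article does not confirm, and the table name, S3 path, IAM role ARN, and `build_copy_statement` helper are hypothetical placeholders.

```python
import re


def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement for files staged in S3 (Parquet assumed)."""
    # Allow only simple identifiers such as "schema.table" to avoid malformed SQL.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?", table):
        raise ValueError("table must be a plain identifier, optionally schema-qualified")
    if not s3_uri.startswith("s3://"):
        raise ValueError("source must be an s3:// URI")
    if "'" in s3_uri or "'" in iam_role_arn:
        raise ValueError("quotes are not allowed in the S3 URI or role ARN")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS PARQUET;"  # assumption: adjust if your export uses another format
    )


# All values below are placeholders.
statement = build_copy_statement(
    "dreamdata.events",
    "s3://my-redshift-staging/dreamdata/raw/events/",
    "arn:aws:iam::123456789012:role/redshift-copy-role",
)
print(statement)
```

The statement can then be run from any SQL client connected to your Redshift cluster, provided the target table already exists with matching columns.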

To connect Dreamdata to Amazon Redshift you need to be on Dreamdata's Business plan.
