You might not have access to this feature!

This feature is only available on our premium and enterprise plans. [Talk to our team](mailto:premium@customer.io) about upgrading your plan.

# Storage data warehouse integrations

[PremiumThis feature is available for Premium plans.](/accounts-and-workspaces/plan-features/) [EnterpriseThis feature is available for Enterprise plans.](/accounts-and-workspaces/plan-features/)

Our data warehouse integrations let you send Customer.io data about messages, people, metrics, etc to your data warehouse by way of an Amazon S3 or Google Cloud Project (GCP) storage bucket. From there, you can ingest the data to your warehouse.

You can find our data warehouse integrations by going to **Data & Integrations** > **Integrations** and selecting the **Databases** option. From here, you can select your data warehouse or storage bucket. If your data warehouse appears twice in this list, pick the **Data out** integration. Check out the specific documentation for your data warehouse for help setting up your integration.

[![On the left hand side, the icon for Journeys, a paper plane, is selected. On the panel to the right of this, the icon for Data & Integrations, three stacked cylinders, is selected. The page Integrations is selected. On the right hand side, the filters All types and Databases are selected. In the middle of the screenshot is a page titled All integrations. There is a 3 x 3 grid view of database integrations like Amazon Redshift. If you were to scroll down, you'd see more options.](https://docs.customer.io/images/database-integrations-page.png)](#1f1bd36a455857143928405b314a8093-lightbox)

## How it works[](#how-it-works)

This integration exports individual [parquet](https://parquet.apache.org/) files for **Deliveries**, **Metrics**, **Subjects**, **Outputs**, **Content**, **People**, and **Attributes** to your storage bucket. Each parquet file contains data that changed since the last export.

Once the parquet files are in your storage bucket, you can import them into data platforms like Fivetran or data warehouses like Redshift, BigQuery, and Snowflake.

Note that this integration only publishes parquet files to your storage bucket. You must set your data warehouse to ingest this data. There are many approaches to ingesting data, but it typically requires a COPY command to load the parquet files from your bucket. After you load parquet files, you should set them to expire to delete them automatically.

We attempt to export parquet files every 15 minutes, though actual sync intervals and processing times may vary. When syncing large data sets, or Customer.io experiences a high volume of concurrent sync operations, it can take up to several hours to process and export data. This feature is not intended to sync data in real time.

 Your initial sync includes historical data

During the first sync, you’ll receive a history of your Deliveries, Metrics, Subjects, and Outputs data. However, People who have been deleted or suppressed before the first sync are **not** included in the People file export and the historical data in the other export files is anonymized for the deleted and suppressed People.

### The initial export vs incremental exports[](#initial-sync)

Your initial sync is a set of files containing historical data to represent your workspace’s current state. Subsequent sync files contain changesets.

*   **Metrics**: The initial metrics sync is broken up into files with two sequence numbers, as follows. `<name>_v5_<workspace_id>_<sequence1>_<sequence2>`.
*   **Attributes**: The initial Attributes sync includes a list of profiles and their current attributes. Subsequent files will only contain *attribute changes*, with one change per row.
*   **Events**: The initial events sync includes up to 30 days of past events. Subsequent files contain events since the previous sync interval. We cannot export events older than 30 days.

When you set up your sync, you can choose the parquet files that you want to export to your storage bucket. If you temporarily disable a file, and then turn it back on, the next parquet file will contain the changeset between when you disabled the file and when you enabled it.

For example, let’s say you’ve enabled the [Attributes export](#schemas). We will attempt to sync your data to your storage bucket every 15 minutes:

1.  12:00pm We sync your Attributes Schema for the first time. This includes a list of profiles and their current attributes.
2.  12:05pm User1’s email is updated to [company-email@example.com](mailto:company-email@example.com).
3.  12:10pm User1’s email is updated to [personal-email@example.com](mailto:personal-email@example.com).
4.  12:15 We sync your data again. In this export, you would only see attribute changes, with one change per row. User1 would have one row dedicated to his email changing.

## How do I get data into my data warehouse?[](#example-setup)

There are many approaches to ingesting data from your storage bucket, but here’s an example moving data from a Google Cloud Storage bucket to Google BigQuery.

*   Implement a Cloud Function to automatically import the parquet files from your GCS bucket to a BigQuery table.
*   Set an expiration on the parquet files so they’re automatically deleted.

Below is a screenshot of an example Cloud Function. Download sample code for this cloud function—[based on our v5](#schemas) schema—by clicking the link below:

[main.go](/files/main.go)

[![BigQuery Cloud Function set up](https://docs.customer.io/images/datawarehouse-out-bigquery-cloud-function.png)](#79f13ce13cefb007e90a9c557276d52a-lightbox)

Make sure you review the code and make appropriate modifications to fit your use case.

## Exported parquet files[](#schemas)

This section describes the different kinds of files you can export from our Database-out integrations. Many schemas include an `internal_customer_id`—this is the `[cio_idAn identifier for a person that is automatically generated by Customer.io and cannot be changed. This identifier provides a complete, unbroken record of a person across changes to their other identifiers (id, email, etc).](/identifying-people/#cio_id)`. You can use it to resolve a person associated with a subject, delivery, etc.

Object-related schemas include an `internal_object_id`—an internal ID that we assign to all objects. You’ll use this value with the *Objects* export to resolve the names and IDs of your objects.

### Deliveries schema[](#deliveries-schema)

Deliveries are individual email, in-app, push, SMS, slack, and webhook records sent from your workspace. The first deliveries export file includes baseline historical data. Subsequent files contain rows for data that changed since the last export.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the delivery record.

delivery\_id

✅

STRING (Required). The ID of the delivery record.

internal\_customer\_id

People

STRING (Nullable). The `cio_id` of the person in question. Use the `people` parquet file to resolve this ID to an external `customer_id` or email address.

subject\_id

Subjects

STRING (Nullable). If the delivery was created as part of a Campaign or API Triggered Broadcast workflow, this is the ID for the path the person went through in the workflow. **Note: This value refers to, and is the same as, the `subject_name` in the subjects table.**

event\_id

Subjects

STRING (Nullable). If the delivery was created as part of an event-triggered Campaign, this is the ID for the unique event that triggered the workflow. Note that this is a foreign key for the subjects table, and not the metrics table.

delivery\_type

STRING (Required). The type of delivery: email, push, in-app, sms, slack, or webhook.

campaign\_id

INTEGER (Nullable). If the delivery was created as part of a Campaign or API Triggered Broadcast workflow, this is the ID for the Campaign or API Triggered Broadcast.

action\_id

INTEGER (Nullable). If the delivery was created as part of a Campaign or API Triggered Broadcast workflow, this is the ID for the unique workflow item that caused the delivery to be created.

newsletter\_id

INTEGER (Nullable). If the delivery was created as part of a Newsletter, this is the unique ID of that Newsletter.

content\_id

INTEGER (Nullable). If the delivery was created as part of a Newsletter split test, this is the unique ID of the Newsletter variant.

trigger\_id

INTEGER (Nullable). If the delivery was created as part of an API Triggered Broadcast, this is the unique trigger ID associated with the API call that triggered the broadcast.

created\_at

TIMESTAMP (Required). The timestamp the delivery was created at.

transactional\_message\_id

INTEGER (Nullable). If the delivery occurred as a part of a transactional message, this is the unique identifier for the API call that triggered the message.

seq\_num

INTEGER (Required) A monotonically increasing number indicating relative recency for each record: the larger the number, the more recent the record.

### Delivery Content schema[](#delivery-content-schema)

The `delivery_content` schema represents message contents; each row corresponds to an individual delivery. Use the `delivery_id` to find more information about the contents of a message, or the `recipient` to find information about the person who received the message.

If your delivery was produced from a campaign, it’ll include campaign and action IDs, and the newsletter and content IDs will be null. If your delivery came from a newsletter, the row will include newsletter and content IDs, and the campaign and action IDs will be null.

**Delivery content might lag behind other tables by 15-30 minutes (or roughly 1 sync operation).** We package delivery contents on a 15 minute interval, and can export to your data warehouse up to every 15 minutes. If these operations don’t line up, we might occasionally export `delivery_content` after other tables.

 Delivery content can be a very large data set

Workspaces that have sent many messages may have hundreds or thousands of GB of data.

 Delivery content is available in v4 or later

The `delivery_content` schema was introduced in our v4 release. You need to [update your data warehouse schemas](/integrations/data-out/data-warehouses/data-warehouses-intro/#upgrade-to-v4) or later to take advantage of the update and see Delivery Content, Subjects, and Outputs.

Field Name

Primary Key

Foreign Key

Description

delivery\_id

✅

Deliveries

STRING (Required). The ID of that delivery associated with the message content.

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the output record.

type

STRING (Required). The delivery type—one of email, sms, push, in-app, or webhook.

campaign\_id

INTEGER (Nullable). The ID for the campaign that produced the content (if applicable).

action\_id

INTEGER (Nullable). The ID for the campaign workflow item that produced the content.

newsletter\_id

INTEGER (Nullable). The ID for the newsletter that produced the content.

content\_id

INTEGER (Nullable). The ID for the newsletter content, 0 indexed. If your newsletter did not include an A/B test or multiple languages, this value is 0.

from

STRING (Nullable). The from address for an email, if the content represents an email.

reply\_to

STRING (Nullable). The Reply To address for an email, if the content is related to an email.

bcc

STRING (Nullable). The Blind Carbon Copy (BCC) address for an email, if the content is related to an email.

recipient

STRING (Required). The person who received the message, dependent on the type. For an email, this is an email address; for an SMS, it's a phone number; for a push notification, it's a device ID.

subject

STRING (Nullable). The subject line of the message, if applicable; required if the message is an email

body

STRING (Required). The body of the message, including all HTML markup for an email.

body\_amp

STRING (Nullable). The HTML body of an email including any AMP-enabled JavaScript included in the message.

body\_plain

STRING (nullable). The plain text of an email message, without HTML tags or AMP content. This field is typically `null` *unless* you manually set or change the plain-text version of an email (the `body_plain` field when you use our APIs).

preheader

STRING (Nullable). "Also known as "preview text", this is the block block of text that users see next to, or underneath, the subject line in their inbox.

url

STRING (Nullable). If the delivery is an outgoing webhook, this is the URL of the webhook.

method

STRING (Nullable). If the delivery is an outgoing webhook, this is the HTTP method used—POST, PUT, GET, etc.

headers

STRING (Nullable). If the delivery is an outgoing webhook, these are the headers included with the webhook.

### Metrics schema[](#metrics-schema)

Metrics exports detail events relating to deliveries (e.g. messages sent, opened, etc). Your initial metrics export contains baseline historical data, broken up into files with two sequence numbers, as follows: `<name>_v5_<workspace_id>_<sequence1>_sequence2>`.

Subsequent files contain rows for data that changed since the last export.

 You might have multiple entries per delivery\_id

For example, person can click a link in a message multiple times, creating multiple “clicked” metrics. We might attempt a message delivery multiple times before it’s successfully sent, creating multiple “attempted” metrics. Depending on the metrics you care about, you might need to deduplicate or aggregate metrics based on the `delivery_id` to get correct counts.

Field Name

Primary Key

Foreign Key

Description

event\_id

✅

STRING (Required). The unique ID of the metric event. This can be useful for deduplicating purposes.

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the metric record.

delivery\_id

Deliveries

STRING (Required). The ID of the delivery record.

metric

STRING (Required). The type of metric (e.g. sent, delivered, opened, clicked).

reason

STRING (Nullable). For certain metrics (e.g. attempted), the reason behind the action.

link\_id

INTEGER (Nullable). For "clicked" metrics, the unique ID of the link being clicked.

link\_url

STRING (Nullable). For "clicked" metrics, the URL of the clicked link. (Truncated to 1000 bytes.)

created\_at

TIMESTAMP (Required). The timestamp the metric was created at.

seq\_num

INTEGER (Required) A monotonically increasing number indicating relative recency for each record: the larger the number, the more recent the record.

proxied

Boolean. For email `opened` metrics, this indicates that the open event originated from a proxy server. For example, a proxy server may record an open independently of a message reaching the user’s inbox. For other metrics, this is `false`.

prefetched

Boolean. For email `opened` metrics, this indicates that the metric was the result of prefetching and not necessarily a user action. For example, Gmail prefetches images to speed up rendering in the inbox, which may result in an opened metric—but the user didn’t actually open the email. For other metrics, this this value is false.

machine

Boolean. For email `clicked` metrics, it means that the click event originated a non-human, e.g. a security service or email-protection application clicked a link. For other metrics, this is `false`.

user\_agent

STRING (Nullable). The user agent string of the person (or machine) who performed the action, where available. If we don't have a user agent string, this value is null.

email\_client

STRING (Nullable). For email metrics, the email client related to the action; applies to metrics like opened, clicked, etc. For non email channels, this value is null.

inbox\_domain

STRING (Nullable). For email metrics, the inbox domain of the person who performed the action. If this value isn't discernable, or the metric is not email related, this value is null.

inbox\_provider

STRING (Nullable). For email metrics, the inbox provider of the person who performed the action. If this value isn't discernable, or the metric is not email related, this value is null.

mx\_host

STRING (Nullable). For email metrics, this is the MX host of the inbox (e.g. mailhost1.example.com). If this value isn't discernable, or the metric is not email related, this value is null.

### Subjects schema[](#subjects-schema)

Subjects are the unique workflow journeys that people take through Campaigns and API Triggered Broadcasts. The first subjects export file includes baseline historical data. Subsequent files contain rows for data that changed since the last export.

 Upgrade to v4 to use subjects and outputs

We’ve made some minor changes to subjects and outputs a part of our v4 release. If you’re using a previous schema version, we disabled your subjects and outputs on October 31st, 2022. You need to [upgrade](/integrations/data-out/data-warehouses/data-warehouses-intro/#upgrade-to-v4) to schema version 4 or later, to continue syncing outputs and subjects data.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the subject record.

subject\_name

✅

STRING (Required). A unique ID for the path a person took through a campaign or broadcast workflow.

internal\_customer\_id

People

STRING (Nullable). The `cio_id` of the person in question. Use the `people` parquet file to resolve this ID to an external `customer_id` or email address.

campaign\_type

STRING (Required). The type of Campaign (segment, event, or triggered\_broadcast)

campaign\_id

INTEGER (Required). The ID of the Campaign or API Triggered Broadcast.

event\_id

Metrics

STRING (Nullable). The ID for the unique event that triggered the workflow.

trigger\_id

INTEGER (Optional). If the delivery was created as part of an API Triggered Broadcast, this is the unique trigger ID associated with the API call that triggered the broadcast.

started\_campaign\_at

TIMESTAMP (Required). The timestamp when the person first matched the campaign trigger. For event-triggered campaigns, this is the timestamp of the trigger event. For segment-triggered campaigns, this is the time the user entered the segment.

created\_at

TIMESTAMP (Required). The timestamp the subject was created at.

seq\_num

INTEGER (Required) A monotonically increasing number indicating relative recency for each record: the larger the number, the more recent the record.

### Outputs schema[](#outputs-schema)

Outputs are the unique steps within each workflow journey. The first outputs file includes historical data. Subsequent files contain rows for data that changed since the last export.

 Upgrade to v4 to use subjects and outputs

We’ve made some minor changes to subjects and outputs a part of our v4 release. If you’re using a previous schema version, we disabled your subjects and outputs on October 31st, 2022. You need to [upgrade](/integrations/data-out/data-warehouses/data-warehouses-intro/#upgrade-to-v4) to schema version 4 or later, to continue syncing outputs and subjects data.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the output record.

output\_id

✅

STRING (Required). The ID for the step of the unique path a person went through in a Campaign or API Triggered Broadcast workflow.

subject\_name

Subjects

STRING (Required). A secondary unique ID for the path a person took through a campaign or broadcast workflow.

output\_type

STRING (Required). The type of step a person went through in a Campaign or API Triggered Broadcast workflow. Note that the “delay” output\_type covers many use cases: a Time Delay or Time Window workflow item, a “[grace period](/journeys/grace-periods/)”, or a date-based campaign trigger.

action\_id

INTEGER (Required). The ID for the unique workflow item associated with the output.

explanation

STRING (Required). The explanation for the output.

delivery\_id

Deliveries

STRING (Nullable). If a delivery resulted from this step of the workflow, this is the ID of that delivery.

draft

BOOLEAN (Nullable). If a delivery resulted from this step of the workflow, this indicates whether the delivery was created as a draft.

link\_tracked

BOOLEAN (Nullable). If a delivery resulted from this step of the workflow, this indicates whether links within the delivery are configured for tracking.

split\_test\_index

INTEGER (Nullable). If the step of the workflow was a Split Test, this indicates the variant of the Split Test.

delay\_ends\_at

TIMESTAMP (Nullable). If the step of the workflow involves a delay, this is the timestamp for when the delay will end.

branch\_index

INTEGER (Nullable). If the step of the workflow was a T/F Branch, a Multi-Split Branch, or a Random Cohort Branch, this indicates the branch that was followed.

manual\_segment\_id

INTEGER (Nullable). If the step of the workflow was a Manual Segment Update, this is the ID of the Manual Segment involved.

add\_to\_manual\_segment

BOOLEAN (Nullable). If the step of the workflow was a Manual Segment Update, this indicates whether a person was added or removed from the Manual Segment involved.

created\_at

TIMESTAMP (Required). The timestamp the output was created at.

seq\_num

INTEGER (Required) A monotonically increasing number indicating relative recency for each record: the larger the number, the more recent the record.

### People schema[](#people-schema)

The first People export file includes a list of current people at the time of your first sync (deleted or suppressed people are not included in the first file). Subsequent exports include people who were created, deleted, or suppressed since the last export.

People exports come in two different files:

*   `people_v5_<env>_<seq>.parquet`: Contains new people.
*   `people_v5_chngs_<env>_<seq>.parquet`: Contains changes to people since the previous sync.

These files have an identical structure and a part of the same data set. You should import them to the same table.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the person.

customer\_id

STRING (Required). The ID of the person in question. This will match the ID you see in the Customer.io UI.

internal\_customer\_id

✅

STRING (Required). The `cio_id` of the person in question. Use the `people` parquet file to resolve this ID to an external `customer_id` or email address.

deleted

BOOLEAN (Nullable). This indicates whether the person has been deleted.

suppressed

BOOLEAN (Nullable). This indicates whether the person has been suppressed.

created\_at

TIMESTAMP (Required). The date/time when the person was added to Customer.io (using the `_created_in_customerio_at` attribute). Note that this is [not necessarily the same as a person's `created_at` value](/journeys/segmentation-and-timestamp-rules/#dual-created-at)! If you import people from an external system, a CSV, or backdate the `created_at` value, this value is likely to be different from a person's `created_at` attribute.

Note that this value is 0 for deleted or suppressed people

updated\_at

TIMESTAMP (Required) The date-time when a person was updated. Use the most recent updated\_at value for a customer\_id to disambiguate between multiple records.

email\_addr

STRING (Optional) The email address of the person. For workspaces using email as a unique identifier, this value may be the same as the `customer_id`.

### Attributes schema[](#attributes-schema)

Attribute exports represent changes to people (by way of their attribute values) over time. The initial Attributes export includes a list of profiles and their current attributes. Subsequent files contain *attribute changes*, with one change per row.

For changes to nested attributes, like the [subscription preferences attribute](/journeys/manage-subscription-preferences/#backfill-preferences), the `attribute_name` will be the top-level attribute and the `attribute_value` returns the stringified JSON representing the nested changes. Using our subscription preferences example, the `attribute_name` would be `cio_subscription_preferences` and the `attribute_value` would be something like `"{\"topics\":{\"topic_7\":false,\"topic_8\":false}}"`.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the person.

internal\_customer\_id

✅

STRING (Required). The `cio_id` of the person in question. Use the `people` parquet file to resolve this ID to an external `customer_id` or email address.

attribute\_name

STRING (Required). The attribute that was updated.

attribute\_value

STRING (Required). The new value of the attribute.

timestamp

TIMESTAMP (Required). The timestamp of the attribute update.

### Campaigns schema[](#campaigns-schema)

When you enable the **Campaign Metadata** schema, we actually return two different tables: *Campaigns* and *Actions*. The *Campaigns* table returns the names and versions of your campaigns and API-triggered broadcasts. Some other tables—like *Deliveries* and *Subjects*—return campaign *ID* values. You can use this table to get campaign names based on those IDs so you can better understand exports related to campaigns.

Note that this table includes both **Campaigns** and **API-triggered broadcasts**; both have `campaign_id` values. **Newsletters** appear in the Broadcasts table with a `broadcast_id`.

With each sync, we’ll return the rows where the `version` changed. The `version` is a number that increments each time a campaign or API-triggered broadcast is updated. This way, you can keep your campaign names and versions up-to-date.

 Each row is an update

You’ll see a row for each update to each campaign or API-triggered broadcast. If joining to this table, you may want to include a condition so that you only get the MAX `updated_at` value for each `campaign_id` to get the most recent version.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace containing the campaign.

campaign\_id

✅

INTEGER (Required). The ID of the campaign or API-triggered broadcast. Note that newsletters appear in the Broadcasts schema with a \`broadcast\_id\`, not here.

name

STRING (Required). The name of a campaign. You set this in Customer.io when you create your campaign.

created\_at

TIMESTAMP (Required). The date-time (in milliseconds) when you created the campaign. You can create campaigns without activating them!

updated\_at

TIMESTAMP (Required) The date-time (in milliseconds) when a campaign was last updated.

version

INTEGER (Required) An incrementing number starting at 1 representing the “version” of the campaign. The largest version number represents the latest version of the campaign. Versions increment when you change the name, trigger, or goal of a campaign. See the *Actions* table for changes to messages and other items in your campaign workflow.

topic\_names

LIST (Nullable). If you use our subscription center feature, this value is a comma-separated list of subscription topics that users must be subscribed to in order to receive the campaign. If the campaign is not associated with any topics, this value is null.

### Broadcasts schema[](#broadcasts-schema)

The **Broadcasts** schema returns information about your newsletters. Note that API-triggered broadcasts appear in the **Campaigns** schema, not the Broadcasts schema. The initial sync returns all your newsletters. Subsequent syncs return only the newsletters that have changed since the last sync.

 Each row is an update

You’ll see a row for each update to each broadcast. For example, if you edit the content, audience, and settings for a broadcast, you’ll see three rows. If joining to this table, you may want to include a condition so that you only get the MAX `updated_at` value for each `broadcast_id` to get the most recent version.

 Broadcasts vs Campaigns

In the data warehouse schemas:

*   **Newsletters** appear in the Broadcasts schema with a `broadcast_id`
*   **API-triggered broadcasts** appear in the Campaigns schema with a `campaign_id`

This is why newsletters and API-triggered broadcasts can share the same ID value—they exist in different schemas.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace containing the broadcast.

broadcast\_id

✅

INTEGER (Required). The ID of the newsletter. Note that API-triggered broadcasts appear in the Campaigns schema with a \`campaign\_id\`, not here.

name

STRING (Required). The name of a broadcast. You set this in Customer.io when you create your broadcast.

created\_at

TIMESTAMP (Required). The date-time (in milliseconds) when you created the broadcast. You can create broadcasts without activating them!

updated\_at

TIMESTAMP (Required) The date-time (in milliseconds) when a broadcast was last updated.

topic\_names

LIST (Nullable). If you use our subscription center feature, this value is a comma-separated list of subscription topics that users must be subscribed to in order to receive the campaign. If the campaign is not associated with any topics, this value is null.

### Actions schema[](#actions-schema)

When you enable the **Campaign Metadata** schema, we actually return two different tables: *Campaigns* and *Actions*. The *Actions* table returns the names and versions of workflow steps in your campaigns, which we call actionsA block in a campaign workflow—like a message, delay, or attribute change.. Some other tables—like *Deliveries* and *Subjects*—return action *ID* values. You can use this table to get the names of actions in your campaigns, so it’s easier for you to understand your campaign and action-related data.

With each sync, we’ll return the rows where the `version` changed. The `version` is a number that increments each time a campaign is updated. This way, you can keep your understanding of campaign actions up-to-date.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace containing the workflow action.

campaign\_id

Campaigns

INTEGER (Required). The ID of the campaign containing the action.

action\_id

INTEGER (Required). The ID of the action.

name

STRING (Optional). The name of a workflow action. You set this in Customer.io when you create or edit your action. If you didn't set a name for the action, this field is empty.

created\_at

TIMESTAMP (Required). The date-time (in milliseconds) when you created the workflow action.

updated\_at

TIMESTAMP (Required) The date-time (in milliseconds) when a workflow action was last updated.

version

INTEGER (Required) An incrementing number starting at 1 representing the "version" of the workflow action. The largest number for any action represents the latest version. The version changes whenever you update the name, content, or settings of your workflow action.

topic\_names

LIST (Nullable). If you use our subscription center feature, this value is a comma-separated list of subscription topics that users must be subscribed to in order to receive the campaign. If the campaign is not associated with any topics, this value is null.

### Objects schema[](#objects-schema)

The first Object export file includes a list of current objects at the time of your first sync (deleted objects are not included in the first file). Subsequent exports include objects who were created, deleted, or suppressed since the last export.

**When you enable the *Objects* export, we also export *Object Types*.**

object exports come in two different files:

*   `object_v5_<env>_<seq>.parquet`: Contains new objects.
*   `object_v5_chngs_<env>_<seq>.parquet`: Contains changes to objects since the previous sync.

These files have an identical structure and a part of the same data set. You should import them to the same table.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the object.

object\_type\_id

Object Types

INTEGER (Required). Object type IDs begin at 1 and increase sequentially. For example, if you created objects call Accounts and Companies, in that order, they’d have object types `1` and `2` respectively.

object\_id

STRING (Required). The ID of the object in question. This will match the ID you see in the Customer.io UI.

internal\_object\_id

✅

STRING (Required). A unique, immutable ID that Customer.io assigns to the object. Other exports use this value in to reference your object; you can use this export to resolve internal IDs to your object IDs.

deleted

BOOLEAN (Nullable). This indicates whether the object has been deleted.

created\_at

TIMESTAMP (Required). The date/time when the object was added to your workspace.

updated\_at

TIMESTAMP (Required) The date-time when a object was updated. Use the most recent updated\_at value for an object\_id to disambiguate between multiple records.

### Object Types schema[](#object-types-schema)

**We export object types when you enable the *Objects* export.** All objects have a *type* indicating what kind of entity they are—like an *account* or *company*. The `object_type` value is an integer starting at 1. For example, if you create two types of objects in your system, accounts and companies, in that order, accounts have an `object_type` of `1` and companies have an `object_type` of `2`.

The first export includes a list of object types at the time of your first sync (we don’t include deleted types in the first file). Subsequent exports include types you created, updated, or deleted since the last sync.

object exports come in two different files:

*   `object_types_v5_<env>_<seq>.parquet`: Contains new object types.
*   `object_types_v5_chngs_<env>_<seq>.parquet`: Contains changes to object types since the previous sync.

These files have an identical structure and a part of the same data set. You should import them to the same table.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the object.

object\_type\_id

✅

INTEGER (Required). Object type IDs begin at 1 and increase sequentially. For example, if you created objects call Accounts and Companies, in that order, they’d have object types `1` and `2` respectively.

name

STRING (Required). The name of the object type, like "Accounts" or "Companies."

slug

STRING (Required). The value you use to reference objects of this type with [liquidA syntax that supports variables, letting you personalize messages for your audience. For example, if you want to reference a person’s first name, you might use the variable `{{customer.first_name}}`.](/using-liquid). For example, if your object type is *Accounts*, you’ll typically reference objects using `{{objects.accounts}}`.

deleted

BOOLEAN (Required). If true, the object type has been deleted.

enabled

BOOLEAN (Required). If true, the object type is enabled. You can’t use disabled object types in segments, messages, and so on. [Learn more](/journeys/object-types/#enabledisable-an-object-type)

updated\_at

TIMESTAMP (Required). The date and time the object type was last updated.

### Object Attributes schema[](#object-attributes-schema)

Object attribute exports contain changes to object [attributeA key-value pair that you associate with a person or an object—like a person’s name, the date they were created in your workspace, or a company’s billing date etc. Use attributes to target people and personalize messages.](/journeys/attributes/). The initial export includes a list of your current objects and their attributes. Subsequent files contain changes to object attributes, with one change per row.

If your object attributes contain nested JSON, the `attribute_name` is the top-level attribute and the `attribute_value` returns the stringified JSON for that attribute.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the person.

object\_type\_id

Object Types

INTEGER (Required). The type of the object represented by the `internal_object_id`. Object type IDs begin at 1 and increase sequentially. For example, if you created objects call Accounts and Companies, in that order, they’d have object types `1` and `2` respectively.

internal\_object\_id

✅

Objects

STRING (Required). A unique, immutable ID that Customer.io assigns to the object. You can resolve this value to the object name or ID you’re familiar with from the associated *Objects* export.

attribute\_name

STRING (Required). The attribute that changed.

attribute\_value

STRING (Required). The new value of the attribute.

timestamp

TIMESTAMP (Required). The timestamp of the attribute update.

### Events schema[](#events-schema)

Events are the things people do in your app, on your website, etc. The Events export includes a list of events that people have triggered, with one event per row. Each event includes an `internal_customer_id` that you can use in conjunction with the People table to resolve a person’s `customer_id` or email address.

**The initial sync includes up to 30-days of past events.** Subsequent files contain events since the previous sync interval. We cannot backfill events older than 30 days.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the person.

event\_id

✅

STRING (Required). The ID of the event, which may be useful if you need to dedupe events.

internal\_customer\_id

People

STRING (Required). The `cio_id` of the person who performed the event. Use the `people` parquet file to resolve this ID to an external `customer_id` or email address.

name

STRING (Required). The event name.

type

STRING (Required). One of `event`, `page`, or `screen`; `page` and `screen` represent page and screenviews respectively. The `event` value represents any other kind of event.

data

STRING (Required). A stringified object containing the event `properties`—the event payload aside from the name, timestamps, and ID.

timestamp

TIMESTAMP (Required). The Unix timestamp associated with the event. If you don't set this value yourself, this is the date-time when Customer.io received the event.

processed\_at

TIMESTAMP (Required). The Unix time when Customer.io processed the event.

sources

ARRAY of STRINGS (Required). The source(s) of the event, e.g. `Customer.io Data Pipelines via JavaScript`.

source\_uas

ARRAY of STRINGS (Required). The user agent source(s) of the event, e.g. `Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:132.0) Gecko/20100101 Firefox/132.0`.

### Inbound (SMS) schema[](#inbound-sms-schema)

You’ll only see the option to enable this schema if you send SMS through Customer.io.

When someone replies to an SMS message you sent, we record an `inbound` event. The “inbound” export contains one row for each inbound SMS message you receive between syncs. Each event includes an `internal_customer_id` that you can use in conjunction with the People table to resolve a person’s `customer_id` or email address.

**The initial sync includes up to 30-days of past inbound events.** Subsequent files contain events since the previous sync interval. We cannot backfill events older than 30 days.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the inbound message.

event\_id

✅

STRING (Required). The unique event identifier, which may be useful if you need to dedupe events.

internal\_customer\_id

People

STRING (Required). The `cio_id` of the person who sent the message. Use the `people` parquet file to resolve this ID to an external `customer_id` or email address.

timestamp

TIMESTAMP (Required). The Unix timestamp when the person sent the inbound message.

processed\_at

TIMESTAMP (Required). The Unix timestamp when Customer.io processed the event.

channel

STRING (Required). The messaging channel (e.g., "sms").

from

STRING (Required). The phone number the person sent the inbound message from.

to

STRING (Required). The phone number the person replied to.

body

STRING (Required). The content of the inbound message.

keyword

STRING (Required). The keyword detected in the message, if any.

optout

BOOLEAN (Required). If true, the message was an opt-out request; if false, it was not.

messaging\_service\_sid

STRING (Required). The messaging service identifier from the SMS provider.

message\_sid

STRING (Required). The unique message identifier from the SMS provider.

in\_reply\_to\_delivery\_id

Deliveries

STRING (Required). The delivery ID of the message this inbound message is replying to, if available. We match inbound messages to deliveries within 72 hours of the original delivery. If the inbound message occurs outside the 72 hour window, or we can't attribute the inbound message to a delivery, this field is \`null\`.

### Changelog[](#changelog)

### v5[](#v5)

This version of our schema is backwards compatible with v4. It simply adds fields to a few schemas.

**Campaigns, Broadcasts, and Campaign Action schemas**:

*   Added a `topic_names` field. If you use our subscription center feature, this lists the topics a person must be subscribed to to receive your campaign, broadcast, or message action.

**Metrics**

*   Added fields that help you determine whether a metric was performed by a person or by a machine. These fields typically apply to email metrics. For example, an email service may prefectch images or scan your message to ensure that it is safe. In these cases, you may see that `prefetched` is `true` for an email `opened` metric. The new fields are:
*   `proxied`
*   `prefetched`
*   `machine`
*   `user_agent`
*   `email_client`
*   `inbox_domain`
*   `inbox_provider`
*   `mx_host`

#### Upgrade to v5[](#upgrade-to-v5)

If you’re upgrading from a version before v4, you should perform upgrade steps from the previous versions as well. The steps below represent an upgrade from the v4 schema.

1.  In your Customer.io workspace, go to **Data & Integrations** > **Integrations**, and click your data warehouse integration.
2.  Click **Upgrade Sync**. We’ll automatically disable the affected schemas so you can update your database.
3.  In your database, add a string-type `topic_names` column to the Campaigns, Broadcasts, and Campaign Action tables.
4.  In the Metrics table, add columns to handle the new fields: `proxied`, `prefetched`, `machine`, `user_agent`, `email_client`, `inbox_domain`, `inbox_provider`, and `mx_host`.
5.  Query the Campaigns, Broadcasts, and Campaign Action tables to populate the new `topic_names` column.
6.  Query the Metrics table to populate the new fields.
7.  Return to Customer.io and re-enable the Campaigns, Broadcasts, Campaign Action, and Metrics schemas.

### v4[](#v4)

If you’re on a prior schema, [your Objects and Subjects tables](#upgrade-to-v4) were disabled on **October 31, 2022**. You’ll need to upgrade to continue receiving subjects and outputs.

Version 4 includes the following changes from v3:

*   The *Subjects* and *Outputs* tables have been disabled for previous versions. You must update to v4 to use Subjects and Outputs.
*   Added a *Delivery Content* table, similar to the *Deliveries* table but this new table includes the actual message content for each delivery. You can associate content with a delivery by `delivery_id`.
*   The *Outputs* schema now includes a new `subject_name` column. This is the same column from the *Subjects* table.
*   Removed `subject_id` from the *Outputs* table.

#### Upgrade to v4[](#upgrade-to-v4)

1.  In your Customer.io workspace, go to **Data & Integrations** > **Integrations**, and click your data warehouse integration.
2.  Click **Upgrade Sync**. We’ll automatically disable the Subjects and Outputs schemas, so you can update your database.
3.  In your database, add a string-type `subject_name` column to the Outputs table.
4.  Query the Subjects table to populate the new `outputs.subject_name` column.
5.  Drop the `subject_id` column from the Subjects and Outputs tables in your database.
6.  (Optional) Create a Delivery Content table based on the new [`delivery_content` schema](#delivery-content-schema)
7.  Return to Customer.io and re-enable the Subjects, Outputs, and/or Delivery Content schemas.

### v3[](#v3)

Version 3 includes the following changes from v2:

*   A `seq_num` column in the Deliveries, Subjects, Outputs, and Metrics tables. This is a constantly increasing value, where a larger value indicates a more current record.
*   People tables now contain record updates, rather than only the record as first created. For example, we produce a new record if you change a person’s email, delete them, or suppress them.
*   The People table now has an `updated_at` column. Because many data warehouses don’t replace rows when adding duplicate primary keys, you can select the most recent `updated_at` value for each profile.
*   The People table now contains an `email_addr` column.

### v2[](#v2)

Data warehouse sync v2 includes the following changes from v1:

*   Support for transactional messages (as `transactional_message_id`) in the Deliveries schema.
*   A fix for an issue that caused missing rows in Subjects and Outputs data. As a result of this bug, data warehouse v1 no longer supports Subjects or Outputs data.

If you used our initial data warehouse release, we recommend updating to the v2 implementation. However, you can continue using our original data warehouse sync feature if you don’t use:

*   Subjects and/or Outputs data
*   Transactional messages

## Frequently asked questions[](#faq)

### How are exported parquet files organized?[](#how-are-exported-parquet-files-organized)

Each parquet file is named `<name>_v<x>_<workspace_id>_<sequence>`.

*   `<name>` is either deliveries, metrics, subjects, outputs, or people.
*   `v<x>` indicates the schema version. The current version is v4; v1 schemas do not have a version indicator and are deprecated.
*   `<workspace_id>` refers to the Customer.io [workspace](/journeys/workspaces) whose data is included.
*   `<sequence>` is an ever-increasing value over time.

 Initial metrics sync file names

Your initial metrics sync is broken up into files that indicate a starting and ending sequence, to help you order things appropriately, for example: `<name>_v<x>_<workspace_id>_<sequence_pt1>_<sequence_pt2>`

### How do you handle export failures?[](#how-do-you-handle-export-failures)

If we experience an internal failure, we monitor and repair it promptly. If there’s an external failure such as a networking problem, we retry the export; the next successful export will contain all data since the last successful export. You won’t lose data.

If there’s a failure preventing us from connecting to your storage bucket, we’ll reach out to you with details. When you fix the issue, the next export will contain all data since the last successful export. If your export continues to fail after we’ve sent follow-ups, we’ll delete it automatically so that it doesn’t consume unnecessary resources.

### How should I import data from my bucket to my data warehouse?[](#how-should-i-import-data-from-my-bucket-to-my-data-warehouse)

There are many approaches to this, but here’s one example we’ve seen work for moving data from a GCS bucket to BigQuery.

*   Implement a Cloud Function to automatically import the parquet files from your GCS bucket to a BigQuery table.
*   Set an expiration on the parquet files so they get automatically deleted.

Below is a screenshot of an example Cloud Function. Download sample code for this cloud function—[based on our v5](#schemas) schema—by clicking the link below:

[main.go](/files/main.go)

[![BigQuery Cloud Function](https://docs.customer.io/images/BigQuery%20Cloud%20Function.png)](#77ff098197f9bee9a356c3a969557a61-lightbox)

### How can I get information about a campaign, workflow action, or message?[](#how-can-i-get-information-about-a-campaign-workflow-action-or-message)

We now have the *Campaign Metadata* schemas, which return campaign and actionA block in a campaign workflow—like a message, delay, or attribute change. names. Beyond that, you can [our App API](/integrations/api/app/) to pull any extra information you need.

For example, calling `https://api.customer.io/v1/campaigns/:id` with the id of the campaign from a subject record will give you details about the campaign.

### How do I get all of the attributes for a profile?[](#how-do-i-get-all-of-the-attributes-for-a-profile)

On setup, the initial Attributes export will include a list of profiles and their current attributes. Subsequent files will only contain *attribute changes*, with one change per row. In order to get the most recent attributes for a particular profile, you’ll need to use the timestamp value to query for the latest.

An example query for Snowflake would be:

```sql
select internal_customer_id, attribute_name, attribute_value 
from (select internal_customer_id, attribute_name, attribute_value, row_number() over (partition by internal_customer_id,attribute_name order by timestamp desc) RNO from attributes) 
where rno=1 and internal_customer_id='xxxxxxxxxxx';
```