You might not have access to this feature!

This feature is only available on our premium and enterprise plans. [Talk to our team](mailto:premium@customer.io) about upgrading your plan.

# Amazon S3

[PremiumThis feature is available for Premium plans.](/accounts-and-workspaces/plan-features/) [EnterpriseThis feature is available for Enterprise plans.](/accounts-and-workspaces/plan-features/)

Send Customer.io data about messages, people, metrics, etc to Amazon S3 storage. From here, you can ingest your data into the data warehouse of your choosing. This integration syncs up to every 15 minutes, helping you keep up to date on your audience’s message activities.

 We have two integrations!

This integration uses Customer.io as a source, and syncs data from your Customer.io workspace to your data warehouse, including campaign information. Our other integration sends data from multiple sources to your data warehouse. While the other integration captures data from multiple sources, even if those sources don’t send data to your workspace, it *cannot* capture some data from within Customer.io like campaign information.

## How it works[](#how-it-works)

This integration exports individual [parquet](https://parquet.apache.org/) files for **Deliveries**, **Metrics**, **Subjects**, **Outputs**, **Content**, **People**, and **Attributes** to your storage bucket. Each parquet file contains data that changed since the last export.

Once the parquet files are in your storage bucket, you can import them into data platforms like Fivetran or data warehouses like Redshift, BigQuery, and Snowflake.

Note that this integration only publishes parquet files to your storage bucket. You must set your data warehouse to ingest this data. There are many approaches to ingesting data, but it typically requires a COPY command to load the parquet files from your bucket. After you load parquet files, you should set them to expire to delete them automatically.

We attempt to export parquet files every 15 minutes, though actual sync intervals and processing times may vary. When syncing large data sets, or Customer.io experiences a high volume of concurrent sync operations, it can take up to several hours to process and export data. This feature is not intended to sync data in real time.

 Your initial sync includes historical data

During the first sync, you’ll receive a history of your Deliveries, Metrics, Subjects, and Outputs data. However, People who have been deleted or suppressed before the first sync are **not** included in the People file export and the historical data in the other export files is anonymized for the deleted and suppressed People.

### The initial export vs incremental exports[](#the-initial-export-vs-incremental-exports)

Your initial sync is a set of files containing historical data to represent your workspace’s current state. Subsequent sync files contain changesets.

*   **Metrics**: The initial metrics sync is broken up into files with two sequence numbers, as follows. `<name>_v5_<workspace_id>_<sequence1>_<sequence2>`.
*   **Attributes**: The initial Attributes sync includes a list of profiles and their current attributes. Subsequent files will only contain *attribute changes*, with one change per row.
*   **Events**: The initial events sync includes up to 30 days of past events. Subsequent files contain events since the previous sync interval. We cannot export events older than 30 days.

For example, let’s say you’ve enabled the [Attributes export](#schemas). We will attempt to sync your data to your storage bucket every 15 minutes:

1.  12:00pm We sync your Attributes Schema for the first time. This includes a list of profiles and their current attributes.
2.  12:05pm User1’s email is updated to [company-email@example.com](mailto:company-email@example.com).
3.  12:10pm User1’s email is updated to [personal-email@example.com](mailto:personal-email@example.com).
4.  12:15 We sync your data again. In this export, you would only see attribute changes, with one change per row. User1 would have one row dedicated to his email changing.

### Requirements[](#requirements)

If you use a firewall or an allowlist, you must allow the following IP addresses to support traffic from Customer.io. Make sure you use the correct IP addresses for your account region.

Data Warehouse IP Addresses (data-out)

US Region

EU Region

34.71.192.245

34.118.255.179

35.188.196.183

34.76.143.229

104.198.177.219

34.78.91.47

35.184.88.76

35.187.55.80

34.72.101.57

104.199.99.65

34.123.199.33

34.76.81.2

35.222.137.61

34.77.146.181

34.68.113.63

34.140.234.108

35.240.84.170

35.195.54.15

34.38.105.52

104.155.66.230

34.76.119.61

34.140.67.73

34.78.74.81

 Do you use other Customer.io features?

These IP addresses are specific to outgoing Data Warehouse integrations. If you use your own SMTP server or receive webhooks, you may also need to allow additional addresses. See our [complete IP allowlist](/accounts-and-workspaces/ip-addresses).

## Set up an Amazon S3 data-out integration[](#s3-bucket)

Before you begin, make sure that you’re prepared to ingest relevant parquet files from Customer.io. For S3, you’ll need to set up your bucket with `ListBucketVersions`, `ListBucket`, `GetObject`, and `PutObject` before you can sync data from Customer.io.

1.  Create an Access Key and a Service Key with read/write permissions to your S3 or Yandex bucket.
2.  Go to **Integrations** and select Amazon S3 and then click **Sync Bucket**.
3.  Enter information about your bucket and click **Select data**.
    *   Enter the *Name* of your bucket.
    *   Enter the *path* to your bucket.
    *   Paste your *Access* and *Secret* keys in the appropriate fields.
    *   Select the *Region* your bucket is in.

[![dws-s3-setup.png](https://docs.customer.io/images/dws-s3-setup.png)](#4e871607235c56c98ea975c80371dfd7-lightbox)

4.  Select the data types that you want to export from Customer.io to your bucket. By default, we export all data types, but you can disable the types that you aren’t interested in.
5.  Click **Create and sync data**.

### Each sync includes a `cio-validate` file[](#each-sync-includes-a-cio-validate-file)

If you sync data to an Amazon S3 bucket, Customer.io writes a file called `cio-validate` to your bucket before every sync. This is an empty file that we use to verify that we have write permissions to your bucket before each sync.

**You can safely delete this file**. It does not affect data sync operations, and it’s not part of your exported data.

If you have automated processes that import parquet files from your bucket, you may want to configure them to ignore the `cio-validate` file, since it’s not a parquet file and doesn’t contain any data.

## Pausing and resuming your sync[](#pause-and-resume)

You can turn off files you no longer want to receive, or pause them momentarily as you update your integration, and turn them back on. When you turn a file schema on, we send files to catch you up from the last export.If you haven’t exported a particular file before—the file was never “on”—the initial sync contains your historical data.

You can also disable your entire sync, in which case we’ll quit sending files all together. When you enable your sync again, we send all of your historical data as if you’re starting a new integration. Before you disable a sync, consider if you simply want to [disable individual files](#turn-data-types-on-or-off) and resume them later.

 Delete old sync files before you re-enable a sync

Before you resume a sync that you previously disabled, you should clear any old files from your storage bucket so that there’s no confusion between your old files and the files we send with the re-enabled sync.

### Disabling and enabling individual export files[](#turn-data-types-on-or-off)

1.  Go to *Data & Integrations* > *Integrations* and select **Amazon S3**.

2.  Select the files you want to turn on or off.

When you enable a *file*, the next sync will contain baseline historical data catching up from your previous sync or the complete history if you haven’t synced a file before; subsequent syncs will contain changesets.

 Turning the People file off

If you turn the People file off for more than 7 days, you will not be able to re-enable it. You’ll need to delete your sync configuration, purge all sync files from your destination storage bucket, and [create a new sync](#set-up-a-database-integration) to resume syncing people data.

[![Turn files on or off](https://docs.customer.io/images/dws-turn-on-sync.png)](#120b5fa46817ef6ae62b41162684f839-lightbox)

### Disabling your sync[](#disabling-your-sync)

If your sync is already disabled, you can enable it again with these instructions. But, before you re-enable your sync, you should clear the previous sync files from your data warehouse bucket first. See [Pausing and resuming your sync](#pause-and-resume) for more information.

1.  Go to *Data & Integrations* > *Integrations* and select **Amazon S3**.
2.  Click **Disable Sync**.

## Manage your configuration[](#manage-your-configuration)

You can change settings for a bucket, if your path changes or you need to swap keys for security purposes.

1.  Go to *Data & Integrations* > *Integrations* and select **Amazon S3**.
2.  Click **Manage Configuration** for your bucket.
3.  Make your changes. No matter your changes, you must input your *Service Account Key* (GCS) or *Secret Key* (S3, Yandex) again.
4.  Click **Update Configuration**. Subsequent syncs will use your new configuration.

[![Edit your data warehouse bucket configuration](https://docs.customer.io/images/dws-edit.png)](#77792451b12b948b3bba853ae4ee3e9c-lightbox)

### Update sync schema version[](#update-sync-schema-version)

Before you prepare to update your data warehouse sync version, see the [changelog](/integrations/data-out/data-warehouses/data-warehouses-intro/#changelog). [You’ll need to update schemas](/integrations/data-out/data-warehouses/data-warehouses-intro/#upgrade-to-v5) to upgrade to the latest version (v5).

 When updating from v1 to a later version, you must:

*   Update ingestion logic to accept the new file name format: `<name>_v<x>_<workspace_id>_<sequence>.parquet`
*   Delete existing rows in your Subjects and Outputs tables. When you update, we send all of your Subjects and Outputs data from the beginning of your history using the new file schema.

[![dws-upgrade-version.png](https://docs.customer.io/images/dws-upgrade-version.png)](#dc5f7e9a0ad99fa9134c363c41c4e350-lightbox)

1.  Go to *Data & Integrations* > *Integrations* and select **Amazon S3**.
2.  Click **Upgrade Schema Version**.
3.  Follow the instructions to make sure that your ingestion logic is updated accordingly.
4.  Confirm that you’ve made the appropriate pages and click **Upgrade sync**. The next sync uses the updated schema version.

## Parquet file schemas[](#schemas)

This section describes the different kinds of files you can export from our Database-out integrations. Many schemas include an `internal_customer_id`—this is the `[cio_idAn identifier for a person that is automatically generated by Customer.io and cannot be changed. This identifier provides a complete, unbroken record of a person across changes to their other identifiers (id, email, etc).](/identifying-people/#cio_id)`. You can use it to resolve a person associated with a subject, delivery, etc.

These schemas represent the latest versions available. Check out our [changelog](/integrations/data-out/data-warehouses/data-warehouses-intro/#changelog) for information about earlier versions.

#### Deliveries[](#Deliveries)

Deliveries are individual email, in-app, push, SMS, slack, and webhook records sent from your workspace. The first deliveries export file includes baseline historical data. Subsequent files contain rows for data that changed since the last export.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the delivery record.

delivery\_id

✅

STRING (Required). The ID of the delivery record.

internal\_customer\_id

People

STRING (Nullable). The `cio_id` of the person in question. Use the `people` parquet file to resolve this ID to an external `customer_id` or email address.

subject\_id

Subjects

STRING (Nullable). If the delivery was created as part of a Campaign or API Triggered Broadcast workflow, this is the ID for the path the person went through in the workflow. **Note: This value refers to, and is the same as, the `subject_name` in the subjects table.**

event\_id

Subjects

STRING (Nullable). If the delivery was created as part of an event-triggered Campaign, this is the ID for the unique event that triggered the workflow. Note that this is a foreign key for the subjects table, and not the metrics table.

delivery\_type

STRING (Required). The type of delivery: email, push, in-app, sms, slack, or webhook.

campaign\_id

INTEGER (Nullable). If the delivery was created as part of a Campaign or API Triggered Broadcast workflow, this is the ID for the Campaign or API Triggered Broadcast.

action\_id

INTEGER (Nullable). If the delivery was created as part of a Campaign or API Triggered Broadcast workflow, this is the ID for the unique workflow item that caused the delivery to be created.

newsletter\_id

INTEGER (Nullable). If the delivery was created as part of a Newsletter, this is the unique ID of that Newsletter.

content\_id

INTEGER (Nullable). If the delivery was created as part of a Newsletter split test, this is the unique ID of the Newsletter variant.

trigger\_id

INTEGER (Nullable). If the delivery was created as part of an API Triggered Broadcast, this is the unique trigger ID associated with the API call that triggered the broadcast.

created\_at

TIMESTAMP (Required). The timestamp the delivery was created at.

transactional\_message\_id

INTEGER (Nullable). If the delivery occurred as a part of a transactional message, this is the unique identifier for the API call that triggered the message.

seq\_num

INTEGER (Required) A monotonically increasing number indicating relative recency for each record: the larger the number, the more recent the record.

#### Delivery Content[](#Delivery Content)

The `delivery_content` schema represents message contents; each row corresponds to an individual delivery. Use the `delivery_id` to find more information about the contents of a message, or the `recipient` to find information about the person who received the message.

If your delivery was produced from a campaign, it’ll include campaign and action IDs, and the newsletter and content IDs will be null. If your delivery came from a newsletter, the row will include newsletter and content IDs, and the campaign and action IDs will be null.

**Delivery content might lag behind other tables by 15-30 minutes (or roughly 1 sync operation).** We package delivery contents on a 15 minute interval, and can export to your data warehouse up to every 15 minutes. If these operations don’t line up, we might occasionally export `delivery_content` after other tables.

 Delivery content can be a very large data set

Workspaces that have sent many messages may have hundreds or thousands of GB of data.

 Delivery content is available in v4 or later

The `delivery_content` schema was introduced in our v4 release. You need to [update your data warehouse schemas](/integrations/data-out/data-warehouses/data-warehouses-intro/#upgrade-to-v4) or later to take advantage of the update and see Delivery Content, Subjects, and Outputs.

Field Name

Primary Key

Foreign Key

Description

delivery\_id

✅

Deliveries

STRING (Required). The ID of that delivery associated with the message content.

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the output record.

type

STRING (Required). The delivery type—one of email, sms, push, in-app, or webhook.

campaign\_id

INTEGER (Nullable). The ID for the campaign that produced the content (if applicable).

action\_id

INTEGER (Nullable). The ID for the campaign workflow item that produced the content.

newsletter\_id

INTEGER (Nullable). The ID for the newsletter that produced the content.

content\_id

INTEGER (Nullable). The ID for the newsletter content, 0 indexed. If your newsletter did not include an A/B test or multiple languages, this value is 0.

from

STRING (Nullable). The from address for an email, if the content represents an email.

reply\_to

STRING (Nullable). The Reply To address for an email, if the content is related to an email.

bcc

STRING (Nullable). The Blind Carbon Copy (BCC) address for an email, if the content is related to an email.

recipient

STRING (Required). The person who received the message, dependent on the type. For an email, this is an email address; for an SMS, it's a phone number; for a push notification, it's a device ID.

subject

STRING (Nullable). The subject line of the message, if applicable; required if the message is an email

body

STRING (Required). The body of the message, including all HTML markup for an email.

body\_amp

STRING (Nullable). The HTML body of an email including any AMP-enabled JavaScript included in the message.

body\_plain

STRING (nullable). The plain text of an email message, without HTML tags or AMP content. This field is typically `null` *unless* you manually set or change the plain-text version of an email (the `body_plain` field when you use our APIs).

preheader

STRING (Nullable). "Also known as "preview text", this is the block block of text that users see next to, or underneath, the subject line in their inbox.

url

STRING (Nullable). If the delivery is an outgoing webhook, this is the URL of the webhook.

method

STRING (Nullable). If the delivery is an outgoing webhook, this is the HTTP method used—POST, PUT, GET, etc.

headers

STRING (Nullable). If the delivery is an outgoing webhook, these are the headers included with the webhook.

#### Metrics[](#Metrics)

Metrics exports detail events relating to deliveries (e.g. messages sent, opened, etc). Your initial metrics export contains baseline historical data, broken up into files with two sequence numbers, as follows: `<name>_v5_<workspace_id>_<sequence1>_sequence2>`.

Subsequent files contain rows for data that changed since the last export.

 You might have multiple entries per delivery\_id

For example, person can click a link in a message multiple times, creating multiple “clicked” metrics. We might attempt a message delivery multiple times before it’s successfully sent, creating multiple “attempted” metrics. Depending on the metrics you care about, you might need to deduplicate or aggregate metrics based on the `delivery_id` to get correct counts.

Field Name

Primary Key

Foreign Key

Description

event\_id

✅

STRING (Required). The unique ID of the metric event. This can be useful for deduplicating purposes.

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the metric record.

delivery\_id

Deliveries

STRING (Required). The ID of the delivery record.

metric

STRING (Required). The type of metric (e.g. sent, delivered, opened, clicked).

reason

STRING (Nullable). For certain metrics (e.g. attempted), the reason behind the action.

link\_id

INTEGER (Nullable). For "clicked" metrics, the unique ID of the link being clicked.

link\_url

STRING (Nullable). For "clicked" metrics, the URL of the clicked link. (Truncated to 1000 bytes.)

created\_at

TIMESTAMP (Required). The timestamp the metric was created at.

seq\_num

INTEGER (Required) A monotonically increasing number indicating relative recency for each record: the larger the number, the more recent the record.

proxied

Boolean. For email `opened` metrics, this indicates that the open event originated from a proxy server. For example, a proxy server may record an open independently of a message reaching the user’s inbox. For other metrics, this is `false`.

prefetched

Boolean. For email `opened` metrics, this indicates that the metric was the result of prefetching and not necessarily a user action. For example, Gmail prefetches images to speed up rendering in the inbox, which may result in an opened metric—but the user didn’t actually open the email. For other metrics, this this value is false.

machine

Boolean. For email `clicked` metrics, it means that the click event originated a non-human, e.g. a security service or email-protection application clicked a link. For other metrics, this is `false`.

user\_agent

STRING (Nullable). The user agent string of the person (or machine) who performed the action, where available. If we don't have a user agent string, this value is null.

email\_client

STRING (Nullable). For email metrics, the email client related to the action; applies to metrics like opened, clicked, etc. For non email channels, this value is null.

inbox\_domain

STRING (Nullable). For email metrics, the inbox domain of the person who performed the action. If this value isn't discernable, or the metric is not email related, this value is null.

inbox\_provider

STRING (Nullable). For email metrics, the inbox provider of the person who performed the action. If this value isn't discernable, or the metric is not email related, this value is null.

mx\_host

STRING (Nullable). For email metrics, this is the MX host of the inbox (e.g. mailhost1.example.com). If this value isn't discernable, or the metric is not email related, this value is null.

#### Outputs[](#Outputs)

Outputs are the unique steps within each workflow journey. The first outputs file includes historical data. Subsequent files contain rows for data that changed since the last export.

 Upgrade to v4 to use subjects and outputs

We’ve made some minor changes to subjects and outputs a part of our v4 release. If you’re using a previous schema version, we disabled your subjects and outputs on October 31st, 2022. You need to [upgrade](/integrations/data-out/data-warehouses/data-warehouses-intro/#upgrade-to-v4) to schema version 4 or later, to continue syncing outputs and subjects data.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the output record.

output\_id

✅

STRING (Required). The ID for the step of the unique path a person went through in a Campaign or API Triggered Broadcast workflow.

subject\_name

Subjects

STRING (Required). A secondary unique ID for the path a person took through a campaign or broadcast workflow.

output\_type

STRING (Required). The type of step a person went through in a Campaign or API Triggered Broadcast workflow. Note that the “delay” output\_type covers many use cases: a Time Delay or Time Window workflow item, a “[grace period](/journeys/grace-periods/)”, or a date-based campaign trigger.

action\_id

INTEGER (Required). The ID for the unique workflow item associated with the output.

explanation

STRING (Required). The explanation for the output.

delivery\_id

Deliveries

STRING (Nullable). If a delivery resulted from this step of the workflow, this is the ID of that delivery.

draft

BOOLEAN (Nullable). If a delivery resulted from this step of the workflow, this indicates whether the delivery was created as a draft.

link\_tracked

BOOLEAN (Nullable). If a delivery resulted from this step of the workflow, this indicates whether links within the delivery are configured for tracking.

split\_test\_index

INTEGER (Nullable). If the step of the workflow was a Split Test, this indicates the variant of the Split Test.

delay\_ends\_at

TIMESTAMP (Nullable). If the step of the workflow involves a delay, this is the timestamp for when the delay will end.

branch\_index

INTEGER (Nullable). If the step of the workflow was a T/F Branch, a Multi-Split Branch, or a Random Cohort Branch, this indicates the branch that was followed.

manual\_segment\_id

INTEGER (Nullable). If the step of the workflow was a Manual Segment Update, this is the ID of the Manual Segment involved.

add\_to\_manual\_segment

BOOLEAN (Nullable). If the step of the workflow was a Manual Segment Update, this indicates whether a person was added or removed from the Manual Segment involved.

created\_at

TIMESTAMP (Required). The timestamp the output was created at.

seq\_num

INTEGER (Required) A monotonically increasing number indicating relative recency for each record: the larger the number, the more recent the record.

#### People[](#People)

The first People export file includes a list of current people at the time of your first sync (deleted or suppressed people are not included in the first file). Subsequent exports include people who were created, deleted, or suppressed since the last export.

People exports come in two different files:

*   `people_v5_<env>_<seq>.parquet`: Contains new people.
*   `people_v5_chngs_<env>_<seq>.parquet`: Contains changes to people since the previous sync.

These files have an identical structure and a part of the same data set. You should import them to the same table.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the person.

customer\_id

STRING (Required). The ID of the person in question. This will match the ID you see in the Customer.io UI.

internal\_customer\_id

✅

STRING (Required). The `cio_id` of the person in question. Use the `people` parquet file to resolve this ID to an external `customer_id` or email address.

deleted

BOOLEAN (Nullable). This indicates whether the person has been deleted.

suppressed

BOOLEAN (Nullable). This indicates whether the person has been suppressed.

created\_at

TIMESTAMP (Required). The date/time when the person was added to Customer.io (using the `_created_in_customerio_at` attribute). Note that this is [not necessarily the same as a person's `created_at` value](/journeys/segmentation-and-timestamp-rules/#dual-created-at)! If you import people from an external system, a CSV, or backdate the `created_at` value, this value is likely to be different from a person's `created_at` attribute.

Note that this value is 0 for deleted or suppressed people

updated\_at

TIMESTAMP (Required) The date-time when a person was updated. Use the most recent updated\_at value for a customer\_id to disambiguate between multiple records.

email\_addr

STRING (Optional) The email address of the person. For workspaces using email as a unique identifier, this value may be the same as the `customer_id`.

#### Subjects[](#Subjects)

Subjects are the unique workflow journeys that people take through Campaigns and API Triggered Broadcasts. The first subjects export file includes baseline historical data. Subsequent files contain rows for data that changed since the last export.

 Upgrade to v4 to use subjects and outputs

We’ve made some minor changes to subjects and outputs a part of our v4 release. If you’re using a previous schema version, we disabled your subjects and outputs on October 31st, 2022. You need to [upgrade](/integrations/data-out/data-warehouses/data-warehouses-intro/#upgrade-to-v4) to schema version 4 or later, to continue syncing outputs and subjects data.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the subject record.

subject\_name

✅

STRING (Required). A unique ID for the path a person took through a campaign or broadcast workflow.

internal\_customer\_id

People

STRING (Nullable). The `cio_id` of the person in question. Use the `people` parquet file to resolve this ID to an external `customer_id` or email address.

campaign\_type

STRING (Required). The type of Campaign (segment, event, or triggered\_broadcast)

campaign\_id

INTEGER (Required). The ID of the Campaign or API Triggered Broadcast.

event\_id

Metrics

STRING (Nullable). The ID for the unique event that triggered the workflow.

trigger\_id

INTEGER (Optional). If the delivery was created as part of an API Triggered Broadcast, this is the unique trigger ID associated with the API call that triggered the broadcast.

started\_campaign\_at

TIMESTAMP (Required). The timestamp when the person first matched the campaign trigger. For event-triggered campaigns, this is the timestamp of the trigger event. For segment-triggered campaigns, this is the time the user entered the segment.

created\_at

TIMESTAMP (Required). The timestamp the subject was created at.

seq\_num

INTEGER (Required) A monotonically increasing number indicating relative recency for each record: the larger the number, the more recent the record.

#### Attributes[](#Attributes)

Attribute exports represent changes to people (by way of their attribute values) over time. The initial Attributes export includes a list of profiles and their current attributes. Subsequent files contain *attribute changes*, with one change per row.

For changes to nested attributes, like the [subscription preferences attribute](/journeys/manage-subscription-preferences/#backfill-preferences), the `attribute_name` will be the top-level attribute and the `attribute_value` returns the stringified JSON representing the nested changes. Using our subscription preferences example, the `attribute_name` would be `cio_subscription_preferences` and the `attribute_value` would be something like `"{\"topics\":{\"topic_7\":false,\"topic_8\":false}}"`.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the person.

internal\_customer\_id

✅

STRING (Required). The `cio_id` of the person in question. Use the `people` parquet file to resolve this ID to an external `customer_id` or email address.

attribute\_name

STRING (Required). The attribute that was updated.

attribute\_value

STRING (Required). The new value of the attribute.

timestamp

TIMESTAMP (Required). The timestamp of the attribute update.

#### Campaigns[](#Campaigns)

When you enable the **Campaign Metadata** schema, we actually return two different tables: *Campaigns* and *Actions*. The *Campaigns* table returns the names and versions of your campaigns and API-triggered broadcasts. Some other tables—like *Deliveries* and *Subjects*—return campaign *ID* values. You can use this table to get campaign names based on those IDs so you can better understand exports related to campaigns.

Note that this table includes both **Campaigns** and **API-triggered broadcasts**; both have `campaign_id` values. **Newsletters** appear in the Broadcasts table with a `broadcast_id`.

With each sync, we’ll return the rows where the `version` changed. The `version` is a number that increments each time a campaign or API-triggered broadcast is updated. This way, you can keep your campaign names and versions up-to-date.

 Each row is an update

You’ll see a row for each update to each campaign or API-triggered broadcast. If joining to this table, you may want to include a condition so that you only get the MAX `updated_at` value for each `campaign_id` to get the most recent version.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace containing the campaign.

campaign\_id

✅

INTEGER (Required). The ID of the campaign or API-triggered broadcast. Note that newsletters appear in the Broadcasts schema with a \`broadcast\_id\`, not here.

name

STRING (Required). The name of a campaign. You set this in Customer.io when you create your campaign.

created\_at

TIMESTAMP (Required). The date-time (in milliseconds) when you created the campaign. You can create campaigns without activating them!

updated\_at

TIMESTAMP (Required) The date-time (in milliseconds) when a campaign was last updated.

version

INTEGER (Required) An incrementing number starting at 1 representing the “version” of the campaign. The largest version number represents the latest version of the campaign. Versions increment when you change the name, trigger, or goal of a campaign. See the *Actions* table for changes to messages and other items in your campaign workflow.

topic\_names

LIST (Nullable). If you use our subscription center feature, this value is a comma-separated list of subscription topics that users must be subscribed to in order to receive the campaign. If the campaign is not associated with any topics, this value is null.

#### Broadcasts[](#Broadcasts)

The **Broadcasts** schema returns information about your newsletters. Note that API-triggered broadcasts appear in the **Campaigns** schema, not the Broadcasts schema. The initial sync returns all your newsletters. Subsequent syncs return only the newsletters that have changed since the last sync.

 Each row is an update

You’ll see a row for each update to each broadcast. For example, if you edit the content, audience, and settings for a broadcast, you’ll see three rows. If joining to this table, you may want to include a condition so that you only get the MAX `updated_at` value for each `broadcast_id` to get the most recent version.

 Broadcasts vs Campaigns

In the data warehouse schemas:

*   **Newsletters** appear in the Broadcasts schema with a `broadcast_id`
*   **API-triggered broadcasts** appear in the Campaigns schema with a `campaign_id`

This is why newsletters and API-triggered broadcasts can share the same ID value—they exist in different schemas.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace containing the broadcast.

broadcast\_id

✅

INTEGER (Required). The ID of the newsletter. Note that API-triggered broadcasts appear in the Campaigns schema with a \`campaign\_id\`, not here.

name

STRING (Required). The name of a broadcast. You set this in Customer.io when you create your broadcast.

created\_at

TIMESTAMP (Required). The date-time (in milliseconds) when you created the broadcast. You can create broadcasts without activating them!

updated\_at

TIMESTAMP (Required) The date-time (in milliseconds) when a broadcast was last updated.

topic\_names

LIST (Nullable). If you use our subscription center feature, this value is a comma-separated list of subscription topics that users must be subscribed to in order to receive the campaign. If the campaign is not associated with any topics, this value is null.

#### Actions[](#Actions)

When you enable the **Campaign Metadata** schema, we actually return two different tables: *Campaigns* and *Actions*. The *Actions* table returns the names and versions of workflow steps in your campaigns, which we call actionsA block in a campaign workflow—like a message, delay, or attribute change.. Some other tables—like *Deliveries* and *Subjects*—return action *ID* values. You can use this table to get the names of actions in your campaigns, so it’s easier for you to understand your campaign and action-related data.

With each sync, we’ll return the rows where the `version` changed. The `version` is a number that increments each time a campaign is updated. This way, you can keep your understanding of campaign actions up-to-date.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace containing the workflow action.

campaign\_id

Campaigns

INTEGER (Required). The ID of the campaign containing the action.

action\_id

INTEGER (Required). The ID of the action.

name

STRING (Optional). The name of a workflow action. You set this in Customer.io when you create or edit your action. If you didn't set a name for the action, this field is empty.

created\_at

TIMESTAMP (Required). The date-time (in milliseconds) when you created the workflow action.

updated\_at

TIMESTAMP (Required) The date-time (in milliseconds) when a workflow action was last updated.

version

INTEGER (Required) An incrementing number starting at 1 representing the "version" of the workflow action. The largest number for any action represents the latest version. The version changes whenever you update the name, content, or settings of your workflow action.

topic\_names

LIST (Nullable). If you use our subscription center feature, this value is a comma-separated list of subscription topics that users must be subscribed to in order to receive the campaign. If the campaign is not associated with any topics, this value is null.

#### Objects[](#Objects)

The first Object export file includes a list of current objects at the time of your first sync (deleted objects are not included in the first file). Subsequent exports include objects who were created, deleted, or suppressed since the last export.

**When you enable the *Objects* export, we also export *Object Types*.**

object exports come in two different files:

*   `object_v5_<env>_<seq>.parquet`: Contains new objects.
*   `object_v5_chngs_<env>_<seq>.parquet`: Contains changes to objects since the previous sync.

These files have an identical structure and a part of the same data set. You should import them to the same table.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the object.

object\_type\_id

Object Types

INTEGER (Required). Object type IDs begin at 1 and increase sequentially. For example, if you created objects call Accounts and Companies, in that order, they’d have object types `1` and `2` respectively.

object\_id

STRING (Required). The ID of the object in question. This will match the ID you see in the Customer.io UI.

internal\_object\_id

✅

STRING (Required). A unique, immutable ID that Customer.io assigns to the object. Other exports use this value in to reference your object; you can use this export to resolve internal IDs to your object IDs.

deleted

BOOLEAN (Nullable). This indicates whether the object has been deleted.

created\_at

TIMESTAMP (Required). The date/time when the object was added to your workspace.

updated\_at

TIMESTAMP (Required) The date-time when a object was updated. Use the most recent updated\_at value for an object\_id to disambiguate between multiple records.

#### Object Types[](#Object Types)

**We export object types when you enable the *Objects* export.** All objects have a *type* indicating what kind of entity they are—like an *account* or *company*. The `object_type` value is an integer starting at 1. For example, if you create two types of objects in your system, accounts and companies, in that order, accounts have an `object_type` of `1` and companies have an `object_type` of `2`.

The first export includes a list of object types at the time of your first sync (we don’t include deleted types in the first file). Subsequent exports include types you created, updated, or deleted since the last sync.

object exports come in two different files:

*   `object_types_v5_<env>_<seq>.parquet`: Contains new object types.
*   `object_types_v5_chngs_<env>_<seq>.parquet`: Contains changes to object types since the previous sync.

These files have an identical structure and a part of the same data set. You should import them to the same table.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the object.

object\_type\_id

✅

INTEGER (Required). Object type IDs begin at 1 and increase sequentially. For example, if you created objects call Accounts and Companies, in that order, they’d have object types `1` and `2` respectively.

name

STRING (Required). The name of the object type, like "Accounts" or "Companies."

slug

STRING (Required). The value you use to reference objects of this type with [liquidA syntax that supports variables, letting you personalize messages for your audience. For example, if you want to reference a person’s first name, you might use the variable `{{customer.first_name}}`.](/using-liquid). For example, if your object type is *Accounts*, you’ll typically reference objects using `{{objects.accounts}}`.

deleted

BOOLEAN (Required). If true, the object type has been deleted.

enabled

BOOLEAN (Required). If true, the object type is enabled. You can’t use disabled object types in segments, messages, and so on. [Learn more](/journeys/object-types/#enabledisable-an-object-type)

updated\_at

TIMESTAMP (Required). The date and time the object type was last updated.

#### Object Attributes[](#Object Attributes)

Object attribute exports contain changes to object [attributeA key-value pair that you associate with a person or an object—like a person’s name, the date they were created in your workspace, or a company’s billing date etc. Use attributes to target people and personalize messages.](/journeys/attributes/). The initial export includes a list of your current objects and their attributes. Subsequent files contain changes to object attributes, with one change per row.

If your object attributes contain nested JSON, the `attribute_name` is the top-level attribute and the `attribute_value` returns the stringified JSON for that attribute.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the person.

object\_type\_id

Object Types

INTEGER (Required). The type of the object represented by the `internal_object_id`. Object type IDs begin at 1 and increase sequentially. For example, if you created objects call Accounts and Companies, in that order, they’d have object types `1` and `2` respectively.

internal\_object\_id

✅

Objects

STRING (Required). A unique, immutable ID that Customer.io assigns to the object. You can resolve this value to the object name or ID you’re familiar with from the associated *Objects* export.

attribute\_name

STRING (Required). The attribute that changed.

attribute\_value

STRING (Required). The new value of the attribute.

timestamp

TIMESTAMP (Required). The timestamp of the attribute update.

#### Events[](#Events)

Events are the things people do in your app, on your website, etc. The Events export includes a list of events that people have triggered, with one event per row. Each event includes an `internal_customer_id` that you can use in conjunction with the People table to resolve a person’s `customer_id` or email address.

**The initial sync includes up to 30-days of past events.** Subsequent files contain events since the previous sync interval. We cannot backfill events older than 30 days.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the person.

event\_id

✅

STRING (Required). The ID of the event, which may be useful if you need to dedupe events.

internal\_customer\_id

People

STRING (Required). The `cio_id` of the person who performed the event. Use the `people` parquet file to resolve this ID to an external `customer_id` or email address.

name

STRING (Required). The event name.

type

STRING (Required). One of `event`, `page`, or `screen`; `page` and `screen` represent page and screenviews respectively. The `event` value represents any other kind of event.

data

STRING (Required). A stringified object containing the event `properties`—the event payload aside from the name, timestamps, and ID.

timestamp

TIMESTAMP (Required). The Unix timestamp associated with the event. If you don't set this value yourself, this is the date-time when Customer.io received the event.

processed\_at

TIMESTAMP (Required). The Unix time when Customer.io processed the event.

sources

ARRAY of STRINGS (Required). The source(s) of the event, e.g. `Customer.io Data Pipelines via JavaScript`.

source\_uas

ARRAY of STRINGS (Required). The user agent source(s) of the event, e.g. `Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:132.0) Gecko/20100101 Firefox/132.0`.

#### Inbound[](#Inbound)

You’ll only see the option to enable this schema if you send SMS through Customer.io.

When someone replies to an SMS message you sent, we record an `inbound` event. The “inbound” export contains one row for each inbound SMS message you receive between syncs. Each event includes an `internal_customer_id` that you can use in conjunction with the People table to resolve a person’s `customer_id` or email address.

**The initial sync includes up to 30-days of past inbound events.** Subsequent files contain events since the previous sync interval. We cannot backfill events older than 30 days.

Field Name

Primary Key

Foreign Key

Description

workspace\_id

INTEGER (Required). The ID of the Customer.io workspace associated with the inbound message.

event\_id

✅

STRING (Required). The unique event identifier, which may be useful if you need to dedupe events.

internal\_customer\_id

People

STRING (Required). The `cio_id` of the person who sent the message. Use the `people` parquet file to resolve this ID to an external `customer_id` or email address.

timestamp

TIMESTAMP (Required). The Unix timestamp when the person sent the inbound message.

processed\_at

TIMESTAMP (Required). The Unix timestamp when Customer.io processed the event.

channel

STRING (Required). The messaging channel (e.g., "sms").

from

STRING (Required). The phone number the person sent the inbound message from.

to

STRING (Required). The phone number the person replied to.

body

STRING (Required). The content of the inbound message.

keyword

STRING (Required). The keyword detected in the message, if any.

optout

BOOLEAN (Required). If true, the message was an opt-out request; if false, it was not.

messaging\_service\_sid

STRING (Required). The messaging service identifier from the SMS provider.

message\_sid

STRING (Required). The unique message identifier from the SMS provider.

in\_reply\_to\_delivery\_id

Deliveries

STRING (Required). The delivery ID of the message this inbound message is replying to, if available. We match inbound messages to deliveries within 72 hours of the original delivery. If the inbound message occurs outside the 72 hour window, or we can't attribute the inbound message to a delivery, this field is \`null\`.