Azure blob storage

Premium: This feature is available on our Premium and Enterprise plans.

Send Customer.io data about messages, people, metrics, and more to Microsoft Azure Blob Storage. From there, you can ingest your data into the data warehouse of your choosing. This integration exports files as often as every 15 minutes, helping you keep up to date on your audience's message activities.

How it works

This integration exports individual parquet files for Deliveries, Metrics, Subjects, Outputs, Content, People, and Attributes to your storage bucket. Each parquet file contains data that changed since the last export.

Once the parquet files are in your storage bucket, you can import them into data platforms like Fivetran or data warehouses like Redshift, BigQuery, and Snowflake.

Note that this integration only publishes parquet files to your storage bucket. You must configure your data warehouse to ingest this data. There are many approaches to ingesting data, but it typically requires a COPY command to load the parquet files from your bucket. After you load parquet files, you should set them to expire so that they're deleted automatically.
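However you load the files, it helps to track which blobs loaded successfully (and are safe to delete) versus which failed (and should be retried on the next run). A minimal bookkeeping sketch in Python; `load_into_warehouse` stands in for your warehouse's COPY/ingest step and is not a real API:

```python
def ingest(ordered_blobs, load_into_warehouse):
    """Load files oldest-first; track which are safe to delete and which
    should be retried on the next run."""
    loaded, failed = [], []
    for blob in ordered_blobs:
        try:
            load_into_warehouse(blob)
            loaded.append(blob)   # loaded successfully: safe to expire/delete
        except Exception:
            failed.append(blob)   # leave in the bucket and retry next run
    return loaded, failed

def fake_load(blob):
    # Stand-in for a real COPY/ingest call; fails on one file for illustration.
    if "bad" in blob:
        raise RuntimeError("load failed")

loaded, failed = ingest(
    ["deliveries_v4_1_1.parquet", "bad_v4_1_2.parquet"], fake_load
)
print(loaded, failed)
```

Pass the blobs in sequence order (see the file naming convention below) so older changesets load before newer ones.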

We attempt to export parquet files every 15 minutes, though actual sync intervals and processing times may vary. When you sync large data sets, or when Customer.io experiences a high volume of concurrent sync operations, it can take up to several hours to process and export data. This feature is not intended to sync data in real time.

sequenceDiagram
    participant a as Customer.io
    participant b as Azure Blob Storage
    participant c as Data Warehouse
    loop up to every 15 minutes
        a->>b: export parquet files
        b->>c: ingest
        c->>b: expire/delete files before next sync
    end

 Your initial sync includes historical data

During the first sync, you'll receive a history of your Deliveries, Metrics, Subjects, and Outputs data. However, people who were deleted or suppressed before the first sync are not included in the People file export, and the historical data in the other export files is anonymized for those deleted and suppressed people.

The initial export vs incremental exports

Your initial sync is a set of files containing historical data to represent your workspace’s current state. Subsequent sync files contain changesets.

  • Metrics: The initial metrics sync is broken up into files with two sequence numbers, as follows: <name>_v4_<workspace_id>_<sequence1>_<sequence2>.
  • Attributes: The initial Attributes sync includes a list of profiles and their current attributes. Subsequent files will only contain attribute changes, with one change per row.
  • Events: The initial events sync includes events each person has performed over the past 30 days. Subsequent files contain events since the previous sync interval. We cannot backfill events older than 30 days with the initial sync.
flowchart LR
    a{is it the initial sync?} -->|yes| b[send all history]
    a -->|no| c{was the file already enabled?}
    c -->|yes| d[send changes since last sync]
    c -->|no| e{was the file ever enabled?}
    e -->|yes| f[send changeset since file was disabled]
    e -->|no| g[send all history]

For example, let’s say you’ve enabled the Attributes export. We will attempt to sync your data to your storage bucket every 15 minutes:

  1. 12:00pm We sync your Attributes Schema for the first time. This includes a list of profiles and their current attributes.
  2. 12:05pm User1’s email is updated to company-email@example.com.
  3. 12:10pm User1’s email is updated to personal-email@example.com.
  4. 12:15pm We sync your data again. In this export, you would only see attribute changes, with one change per row. User1 would have one row dedicated to their email changing.
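Applying those incremental Attributes files to your baseline snapshot amounts to replaying one change row at a time, oldest first. A simplified in-memory sketch (the column names are illustrative, not the exact parquet schema):

```python
# Baseline snapshot from the initial Attributes sync: current values per person.
snapshot = {"User1": {"email": "old-email@example.com", "plan": "basic"}}

# Incremental rows from later syncs: one attribute change per row, oldest first.
changes = [
    {"person": "User1", "attribute": "email", "value": "company-email@example.com"},
    {"person": "User1", "attribute": "email", "value": "personal-email@example.com"},
]

# Replaying change rows in order reproduces each person's current state.
for row in changes:
    snapshot.setdefault(row["person"], {})[row["attribute"]] = row["value"]

print(snapshot["User1"]["email"])  # personal-email@example.com
```

Exactly which intermediate values appear in a single export depends on your sync windows; replaying rows oldest-first yields the final state either way.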

Requirements

If you use a firewall or an allowlist, you must allow the following IP addresses to support traffic from Customer.io.

Make sure you use the correct IP addresses for your account region.

| US Region | EU Region |
| --- | --- |
| 34.71.192.245 | 34.76.81.2 |
| 35.184.88.76 | 35.187.55.80 |
| 34.72.101.57 | 104.199.99.65 |
| 34.123.199.33 | 34.77.146.181 |
| 34.67.167.190 | 34.140.234.108 |
|  | 35.240.84.170 |
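If you manage firewall rules in code, you can sanity-check an address against the addresses above with Python's stdlib ipaddress module (the list below is the US region column):

```python
import ipaddress

# US-region addresses from the allowlist above.
US_REGION_IPS = [
    "34.71.192.245", "35.184.88.76", "34.72.101.57",
    "34.123.199.33", "34.67.167.190",
]

def is_allowed(ip, allowlist=US_REGION_IPS):
    """True if `ip` is one of the allowlisted addresses."""
    addr = ipaddress.ip_address(ip)
    return any(addr == ipaddress.ip_address(entry) for entry in allowlist)

print(is_allowed("34.71.192.245"))  # True
print(is_allowed("203.0.113.9"))    # False
```

Using ipaddress rather than string comparison also catches malformed entries, since `ip_address` raises on invalid input.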

 Do you use other Customer.io features?

These IP addresses are specific to outgoing Data Warehouse integrations. If you use your own SMTP server or receive webhooks, you may also need to allow additional addresses. See our complete IP allowlist.

Set up an Azure Blob Storage integration

As a part of this process, you’ll create an Access policy and a Shared Access Signature (SAS) URL. The Shared Access Signature grants Customer.io access to your Azure blob container, but typically has a limited expiration date. Before you generate a SAS URL, you’ll create the Access policy (with read, write, add, create, and list permissions) that lets you set a longer expiration date for your SAS URL and provides a way to revoke the token later, if you decide to shut off this integration for any reason.

 You must generate a SAS URL for a container, not a storage account!

Please follow the steps below to generate a SAS URL for a specific container in your Azure Blob Storage account. You can't use a SAS URL for the storage account with Customer.io.
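One quick way to catch this mistake before pasting the URL into Customer.io: a container SAS URL has a container name in its path, while an account-level SAS URL does not. A rough shape check in Python (it validates the URL's form only, not its permissions; the sample URLs and signature values are placeholders):

```python
from urllib.parse import urlparse, parse_qs

def looks_like_container_sas(url):
    """Rough shape check: HTTPS URL, a container named in the path,
    and a SAS signature (`sig`) in the query string."""
    parts = urlparse(url)
    has_container = parts.path.strip("/") != ""
    has_signature = "sig" in parse_qs(parts.query)
    return parts.scheme == "https" and has_container and has_signature

# Container-level URL (placeholder account, container, and signature):
print(looks_like_container_sas(
    "https://myaccount.blob.core.windows.net/mycontainer?sp=rwacl&sig=abc123"
))  # True
# Account-level URL: no container in the path, so it fails the check.
print(looks_like_container_sas(
    "https://myaccount.blob.core.windows.net/?sv=2022-11-02&sig=abc123"
))  # False
```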

  1. Log in to your Azure account, go to Storage browser, and select Blob Containers.

  2. Right-click the container you want to export Customer.io data to, and select Access policy to create a policy that lets you create a SAS URL with a long expiry date.

    1. Click Add policy and set an Identifier for the policy. This is just the name of the access policy that you’ll use in later steps.
    2. Click Permissions and select read, write, add, create, and list.
    3. Set the Start time to the current date.
    4. Set the Expiry time to a date well into the future.
    5. Click OK and then click Save.
      Create your access policy in MS Azure
  3. Right-click the container again and select Generate SAS to generate the URL that Customer.io will use to access your Azure bucket.

    • Select the Stored access policy you created in previous steps.
    • (Optional) List Customer.io’s IP addresses under Allowed IP addresses to provide an extra layer of security for your SAS URL.
    • Click Generate SAS token and URL and copy the URL. When you close the dialog, you won’t be able to access the token or URL again, so make sure that you copy the URL. You’ll need it in later steps.
      Set up your SAS token in the Azure dashboard
  4. Go to Customer.io and select Data & Integrations > Integrations > Azure Blob Storage.

  5. Click Sync your Azure Blob Storage bucket.

  6. Enter the Blob Path: this is the directory in the blob where you want to deposit parquet files with each sync. If you don't provide a path, we'll deposit files in the root of the blob. If the path doesn't already exist, clicking Validate & select data will create a new blob storage path.

    Add your blob storage path and URL

  7. Paste your Blob SAS URL in the appropriate box and click Validate & select data.

  8. Select the data types that you want to export from Customer.io to your bucket. By default, we export all data types, but you can disable the types that you aren’t interested in.

    select the files you want to sync to your azure blob path

  9. Click Create and sync data.

Pausing and resuming your sync

You can turn off files you no longer want to receive, or pause them momentarily while you update your integration, and turn them back on. When you turn a file schema on, we send files to catch you up from the last export. If you haven't exported a particular file before (the file was never "on"), the initial sync contains your historical data.

You can also disable your entire sync, in which case we'll stop sending files altogether. When you enable your sync again, we send all of your historical data as if you're starting a new integration. Before you disable a sync, consider whether you simply want to disable individual files and resume them later.

 Delete old sync files before you re-enable a sync

Before you resume a sync that you previously disabled, you should clear any old files from your storage bucket so that there’s no confusion between your old files and the files we send with the re-enabled sync.

Disabling and enabling individual export files

  1. Go to Data & Integrations > Integrations and select Azure Blob Storage.
  2. Select the files you want to turn on or off.

When you enable a file, the next sync contains baseline historical data: changes since your previous sync, or the complete history if you haven't synced that file before. Subsequent syncs contain changesets.

 Turning the People file off

If you turn the People file off for more than 7 days, you will not be able to re-enable it. You’ll need to delete your sync configuration, purge all sync files from your destination storage bucket, and create a new sync to resume syncing people data.

enable or disable syncs

Disabling your sync

If your sync is already disabled, you can enable it again with these same steps. But before you re-enable your sync, you should clear the previous sync files from your data warehouse bucket first. See Pausing and resuming your sync for more information.

  1. Go to Data & Integrations > Integrations and select Azure Blob Storage.
  2. Click Disable Sync.

Manage your configuration

You can change settings for a bucket if your path changes or you need to rotate keys for security purposes.

  1. Go to Data & Integrations > Integrations and select Azure Blob Storage.
  2. Click Manage Configuration for your bucket.
  3. Make your changes. Whatever you change, you must re-enter your Blob SAS URL.
  4. Click Update Configuration. Subsequent syncs will use your new configuration.
Edit your data warehouse bucket configuration

Update sync schema version

Before you update your data warehouse sync version, see the changelog. You'll need to update your schemas to upgrade to the latest version (v4).

 When updating from v1 to a later version, you must:

  • Update ingestion logic to accept the new file name format: <name>_v<x>_<workspace_id>_<sequence>.parquet
  • Delete existing rows in your Subjects and Outputs tables. When you update, we send all of your Subjects and Outputs data from the beginning of your history using the new file schema.

  1. Go to Data & Integrations > Integrations and select Azure Blob Storage.
  2. Click Upgrade Schema Version.
  3. Follow the instructions to make sure that your ingestion logic is updated accordingly.
  4. Confirm that you've made the appropriate changes and click Upgrade sync. The next sync uses the updated schema version.

Parquet file schemas

This section describes the different kinds of files you can export from our Database-out integrations. Many schemas include an internal_customer_id: this is the cio_id, an identifier for a person that Customer.io generates automatically and that cannot be changed. This identifier provides a complete, unbroken record of a person across changes to their other identifiers (id, email, etc.). You can use it to resolve the person associated with a subject, delivery, and so on.
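For example, resolving a delivery to a person's external identifiers is a join on internal_customer_id. A toy in-memory version (the IDs and column subset are illustrative, not real export data):

```python
# People file rows keyed by internal_customer_id (cio_id); values are made up.
people = {
    "cio_01": {"customer_id": "u-1001", "email": "jane@example.com"},
}

# Two delivery rows; the second has no associated person.
deliveries = [
    {"delivery_id": "d-1", "internal_customer_id": "cio_01", "delivery_type": "email"},
    {"delivery_id": "d-2", "internal_customer_id": None, "delivery_type": "webhook"},
]

# Left join: attach external identifiers where the cio_id resolves.
resolved = [{**d, **people.get(d["internal_customer_id"], {})} for d in deliveries]
print(resolved[0]["email"])  # jane@example.com
```

In a warehouse this is the equivalent of a LEFT JOIN from deliveries to people on internal_customer_id.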

These schemas represent the latest versions available. Check out our changelog for information about earlier versions.

Deliveries are individual email, in-app, push, SMS, Slack, and webhook records sent from your workspace. The first deliveries export file includes baseline historical data. Subsequent files contain rows for data that changed since the last export.

| Field Name | Primary Key | Foreign Key | Description |
| --- | --- | --- | --- |
| workspace_id |  |  | INTEGER (Required). The ID of the Customer.io workspace associated with the delivery record. |
| delivery_id | ✓ |  | STRING (Required). The ID of the delivery record. |
| internal_customer_id |  | People | STRING (Nullable). The cio_id of the person in question. Use the people parquet file to resolve this ID to an external customer_id or email address. |
| subject_id |  | Subjects | STRING (Nullable). If the delivery was created as part of a Campaign or API Triggered Broadcast workflow, this is the ID for the path the person went through in the workflow. Note: this value refers to, and is the same as, the subject_name in the subjects table. |
| event_id |  | Subjects | STRING (Nullable). If the delivery was created as part of an event-triggered Campaign, this is the ID for the unique event that triggered the workflow. Note that this is a foreign key for the subjects table, not the metrics table. |
| delivery_type |  |  | STRING (Required). The type of delivery: email, push, in-app, sms, slack, or webhook. |
| campaign_id |  |  | INTEGER (Nullable). If the delivery was created as part of a Campaign or API Triggered Broadcast workflow, this is the ID for the Campaign or API Triggered Broadcast. |
| action_id |  |  | INTEGER (Nullable). If the delivery was created as part of a Campaign or API Triggered Broadcast workflow, this is the ID for the unique workflow item that caused the delivery to be created. |
| newsletter_id |  |  | INTEGER (Nullable). If the delivery was created as part of a Newsletter, this is the unique ID of that Newsletter. |
| content_id |  |  | INTEGER (Nullable). If the delivery was created as part of a Newsletter split test, this is the unique ID of the Newsletter variant. |
| trigger_id |  |  | INTEGER (Nullable). If the delivery was created as part of an API Triggered Broadcast, this is the unique trigger ID associated with the API call that triggered the broadcast. |
| created_at |  |  | TIMESTAMP (Required). The timestamp when the delivery was created. |
| transactional_message_id |  |  | INTEGER (Nullable). If the delivery occurred as part of a transactional message, this is the unique identifier for the API call that triggered the message. |
| seq_num |  |  | INTEGER (Required). A monotonically increasing number indicating relative recency for each record: the larger the number, the more recent the record. |
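Because updated records arrive in later files with a larger seq_num, a common loading pattern is "latest record wins" per delivery_id. A sketch of that dedup step (the sample rows are made up):

```python
def latest_per_delivery(rows):
    """Keep the row with the highest seq_num for each delivery_id."""
    latest = {}
    for row in rows:
        key = row["delivery_id"]
        if key not in latest or row["seq_num"] > latest[key]["seq_num"]:
            latest[key] = row
    return latest

# Sample rows: d-1 appears twice because it changed between exports.
rows = [
    {"delivery_id": "d-1", "seq_num": 10, "delivery_type": "email"},
    {"delivery_id": "d-1", "seq_num": 12, "delivery_type": "email"},
    {"delivery_id": "d-2", "seq_num": 11, "delivery_type": "sms"},
]
result = latest_per_delivery(rows)
print(result["d-1"]["seq_num"])  # 12
```

In SQL, the equivalent is typically a window function such as ROW_NUMBER() ordered by seq_num descending, keeping the first row per delivery_id.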

Troubleshooting

I get a 403 error in Customer.io

This means that your Shared Access Signature URL doesn't grant Customer.io permission to access your Azure blob store. Make sure that your Access policy or Shared Access Signature grants read, add, create, write, and list permissions. If it doesn't, you may need to edit your Access policy or generate a new SAS URL and paste it into Customer.io to fix the issue.

I can’t extend my SAS URL’s expiry date

By default, Microsoft Azure doesn't allow a Shared Access Signature (SAS) to expire more than 1 week or 365 days in the future, depending on the signing method you use. You must create an Access policy that allows you to create tokens with a longer expiration period.

In general, we suggest that you create an Access policy as a way to both extend the life of your SAS token and as a method for revoking your SAS token later if you decide to turn off this integration.

To create an Access policy, right-click your blob container in Microsoft Azure and go to Access policy. Add a policy with the same permissions as your SAS token (read, add, create, write, and list), and set an expiry date in the future.
