Introduction to database/data warehouse integrations
We offer two types of database and data warehouse integrations. This page explains more about how these integrations work and the differences between them.
How database and data warehouse integrations work
Rather than streaming data to your warehouse in real time, like with most outbound integrations, our database and data warehouse integrations send data to your storage buckets in bulk at regular intervals. Then you’ll ingest those files into the data warehouse or database of your choice.
These integrations only create new files in your storage bucket; they never overwrite or append to an existing file. That means you can safely delete files from your storage bucket after you ingest them into their final destination: your data warehouse or database.
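The ingest-then-delete pattern described above can be sketched as follows. This is a minimal illustration, not an official client: a local directory stands in for your storage bucket, and `ingest_new_files`, `load_file`, and the `batch_*.parquet` file names are all hypothetical. In practice you would list and delete objects with your storage provider's own tooling.

```python
import shutil
import tempfile
from pathlib import Path


def ingest_new_files(bucket_dir: Path, load_file) -> list[str]:
    """Load every file in bucket_dir once, then delete it.

    bucket_dir is a local stand-in for a storage bucket (assumption).
    load_file is a hypothetical callable that loads one file into your
    warehouse; if it raises, the file is left in place for a retry.
    """
    ingested = []
    for path in sorted(bucket_dir.iterdir()):
        if not path.is_file():
            continue
        load_file(path)
        # Deleting is safe because each sync only creates new files and
        # never overwrites or appends to an existing one.
        path.unlink()
        ingested.append(path.name)
    return ingested


# Demo: a temporary directory with two made-up files.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "batch_0001.parquet").write_bytes(b"demo")
(demo_dir / "batch_0002.parquet").write_bytes(b"demo")

loaded = []
result = ingest_new_files(demo_dir, lambda p: loaded.append(p.name))
remaining = list(demo_dir.iterdir())
shutil.rmtree(demo_dir)
```

Deleting only after a successful load means a failed load leaves the file in the bucket, so the next run picks it up again.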
Standard integrations vs advanced integrations
When you look for databases and data warehouses in our integration directory, you may see advanced entries.
- Standard, non-advanced integrations send data from your workspace. This is data that you use in Customer.io to send messages, trigger campaigns, and so on.
- Advanced integrations let you send data to a warehouse from any of your data sources, even if you don’t keep that data in Customer.io. But they exclude some of the data that you keep in Customer.io.
For example, if you send an `identify` call into Customer.io, the Advanced integration will receive data from the call more or less as you sent it. The Standard integration will receive the data as we process it in Customer.io, so you’ll see the changes made to the person in Customer.io. If your `identify` call doesn’t change anything, you won’t see any changes in the Standard integration.
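As a rough mental model of that difference, the sketch below treats the Advanced output as a copy of the incoming call and the Standard output as only the attributes that actually changed. This is a toy illustration and not the real output schema: the function names and the `userId` and `traits` field names are assumptions made for the example.

```python
def advanced_record(call: dict) -> dict:
    """Advanced (toy model): forward the call more or less as sent."""
    return dict(call)


def standard_record(stored_attributes: dict, call: dict) -> dict:
    """Standard (toy model): report only attributes whose values changed."""
    traits = call.get("traits", {})
    return {
        key: value
        for key, value in traits.items()
        if key not in stored_attributes or stored_attributes[key] != value
    }


# Hypothetical person and identify call.
person = {"email": "ada@example.com", "plan": "free"}
call = {
    "type": "identify",
    "userId": "42",
    "traits": {"email": "ada@example.com", "plan": "pro"},
}

raw = advanced_record(call)
changed = standard_record(person, call)
unchanged = standard_record({**person, "plan": "pro"}, call)
```

Here `changed` contains only the `plan` attribute, while `unchanged` is empty because the call repeats values that are already stored.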
But, if you need data about things like the campaigns people travel through, the messages they’ve received, and so on, you cannot get that data through the advanced versions of our database and data warehouse integrations. You’ll need to use the Standard integrations to get that data.
Standard and advanced integrations have different schemas
Make sure you use the right documentation for your integration, because the data we output uses different schemas depending on your integration.
Feature | Description | Standard | Advanced |
---|---|---|---|
Supports campaign data | Data about the workflows and changes to campaigns. | ✅ | ❌ |
Supports action data | Data about the individual actions people go through in each campaign journey. An action is a block in a campaign workflow, like a message, delay, or attribute change. | ✅ | ❌ |
Supports broadcast data | Information about the newsletters and API-triggered broadcasts. | ✅ | ❌ |
Supports customer journeys (subjects) | Data about the journeys people take through your campaigns. A journey is a person or data object’s path through your campaign. | ✅ | ❌ |
Parquet output | We sync files to your bucket in parquet format | ✅ | ✅ |
CSV output | We sync files to your bucket in CSV | ❌ | ✅ |
JSON output | We sync files to your bucket in JSON | ❌ | ✅ |
Sync interval | How often we sync to your storage bucket | ~15 minutes | 10 minutes |
Supports data from outside your workspace | Pass data to your warehouse without processing it in Customer.io (advanced). | ❌ | ✅ |
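If it helps to reason about the table programmatically, the rows above can be encoded as a small lookup. This is only a sketch for choosing between the two types: the capability labels (`campaign_data`, `external_data`, and so on) and the function name are made up for this example and are not identifiers from the product.

```python
# Capabilities transcribed from the comparison table; labels are illustrative.
CAPABILITIES = {
    "standard": {
        "campaign_data",
        "action_data",
        "broadcast_data",
        "journey_data",
        "parquet",
    },
    "advanced": {"parquet", "csv", "json", "external_data"},
}


def integrations_supporting(*needs: str) -> list[str]:
    """Return the integration types that cover every requested capability."""
    wanted = set(needs)
    return [name for name, caps in CAPABILITIES.items() if wanted <= caps]


parquet_only = integrations_supporting("parquet")
csv_only = integrations_supporting("csv")
campaigns_as_csv = integrations_supporting("campaign_data", "csv")
```

An empty result, as with `campaigns_as_csv`, means no single integration type covers the combination, so you would need both types or a different output format.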
Standard integrations
Our standard integrations pass information from your workspace to your storage bucket. That’s why we also call them sync integrations: they sync data from your workspace to your data warehouse.
For these integrations, we only send data to your integration after we’ve processed it in Customer.io. Our sync integrations let you send data about campaigns, journeys, and broadcasts to your data warehouse. These are our only integrations that support this kind of data!
Advanced integrations
We send incoming data to your storage bucket even if you don’t store that data in Customer.io. In general, this means that the schemas match up with our libraries. For example, when you send an `identify` call, your data shows up in a file called `identify` that we send to your storage bucket.
But, unlike our sync integrations, advanced integrations don’t have access to campaign and broadcast information from your workspace.