You might not have access to this feature!

This feature is only available on our premium and enterprise plans. [Talk to our team](mailto:premium@customer.io) about upgrading your plan.

# Advanced Integrations

Available on [Premium](/accounts-and-workspaces/plan-features/) and [Enterprise](/accounts-and-workspaces/plan-features/) plans.

 Want information from your workspace, including campaign journeys?

Try our standard [Data Warehouse integration](/integrations/data-out/data-warehouses/data-warehouses-intro/) to get data from your workspace, including campaign journeys.

## How it works[](#how-it-works)

This integration forwards incoming data from data sources to your storage bucket *independently of your workspace*. This means that you can send data to your storage bucket even if you don’t store that data in Customer.io. It also means that this integration does not have access to campaign, broadcast, or journey information from your workspace. If you need that data, you should use a standard [Data Warehouse integration](/integrations/data-out/data-warehouses/data-warehouses-intro/) instead.

Unlike most other data-out integrations, which stream data in real time, our data warehouse and storage integrations send data to your storage bucket in bulk at regular, 10-minute intervals. When we load data, we write new and updated events, people, and groups to JSON, CSV, or Parquet files that we upload to your storage bucket. You can then ingest those files into the data warehouse or database of your choice.

These integrations only create new files in your storage bucket; they never overwrite or append to an existing file. This means that you can delete files from your storage bucket after you ingest them into their ultimate destination (your data warehouse or database).

## Exported files[](#exported-files)

Our data warehouse and cloud storage integrations generate Parquet, JSON, or CSV files that we load into a storage bucket you specify. The data we send (the files we generate in your storage bucket) is based on the *Actions* you enable.

Each sync generates new files for each data type in your storage bucket. Files are named in the format `<integration id>.<action id>.<current position>.<type>`, followed by the file extension.

*   The integration ID and action ID are unique identifiers generated by Customer.io. You’ll see them with the first sync.
*   `current position` is an incrementing number beginning at 1 that indicates the order of syncs. So your first sync is 1, the next one is 2, etc.
*   `type` is the type of call—`identify`, `track`, `page`, `screen`, `alias`, or `group`.

So, if your file is called `2184.13699.1.track.json`, it’s the first sync file for the `track` call type.
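
If you process these files in order, it can help to split file names into their parts. The following is a minimal sketch, not an official utility: it assumes that every file name has exactly five dot-separated parts (as in the example above) and that the IDs themselves contain no dots. The names `SyncFile`, `parse_sync_filename`, and `in_sync_order` are hypothetical.

```python
from typing import NamedTuple

# Call types listed in the naming rules above.
CALL_TYPES = {"identify", "track", "page", "screen", "alias", "group"}


class SyncFile(NamedTuple):
    integration_id: str
    action_id: str
    position: int
    call_type: str
    extension: str


def parse_sync_filename(name: str) -> SyncFile:
    """Split a name like '2184.13699.1.track.json' into its parts.

    Assumption: exactly five dot-separated parts, ending in the extension.
    """
    parts = name.split(".")
    if len(parts) != 5:
        raise ValueError(f"unexpected file name: {name!r}")
    integration_id, action_id, position, call_type, extension = parts
    if call_type not in CALL_TYPES:
        raise ValueError(f"unknown call type in {name!r}: {call_type!r}")
    return SyncFile(integration_id, action_id, int(position), call_type, extension)


def in_sync_order(names):
    """Sort file names by their numeric sync position (so 2 comes before 10)."""
    return sorted(names, key=lambda n: parse_sync_filename(n).position)
```

Sorting by the parsed integer matters because a plain string sort would place position `10` before position `2`.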

## Sync frequency[](#sync-frequency)

Unlike other integrations where we send data in real time, these kinds of integrations attempt to send data to your storage bucket every 10 minutes—though actual sync intervals and processing times may vary. When syncing large data sets, or when you have a high volume of concurrent sync operations, it can take a little longer to process and export data.

Each sync file contains data from the previous sync interval. For example, if the last sync occurred at 12:00 PM, the next sync will only send data from 12:00 PM to 12:09:59 PM.

## Handling objects and arrays in CSV and Parquet files[](#handling-objects-and-arrays-in-csv-and-parquet-files)

Our incoming integrations pass nested objects and arrays into calls as **properties** and **traits**, but CSVs and Parquet files don’t have a concept of objects or arrays. So we stringify or flatten properties and traits in CSVs and Parquet files to preserve your data without significantly manipulating it.

```JSON
{
  "received_at": "2019-08-24T14:15:22Z",
  "id": "a7280cfea0f6d",
  "user_id": "97980cfea0067",
  "anonymous_id": "d19b0cfeb606a",
  "sent_at": "2019-08-24T14:15:22Z",
  "traits": {
    "name": "Cool Person",
    "email": "cool.person@example.com",
    "likes_baseball": true
  },
  "context": {
    ...
  }
}
```

```csv
received_at,id,user_id,anonymous_id,sent_at,traits,context
2019-08-24T14:15:22Z,a7280cfea0f6d,97980cfea0067,d19b0cfeb606a,2019-08-24T14:15:22Z,"{\"name\": \"Cool Person\", \"email\": \"cool.person@example.com\", \"likes_baseball\": true}", "{...}"
```
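
When you ingest a CSV file, you can turn the stringified columns back into objects. The sketch below uses only Python's standard library and builds its own sample row with `csv.writer`, so it assumes standard CSV quoting (doubled quotes). The quoting in your exported files may differ; if it does, you may need to adjust the reader's dialect options, such as `escapechar` and `doublequote`. The function name `read_rows` is hypothetical.

```python
import csv
import io
import json


def read_rows(csv_text, json_columns=("traits",)):
    """Parse CSV text and decode the stringified JSON columns into objects."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        for column in json_columns:
            # Leave empty or missing columns untouched.
            if row.get(column):
                row[column] = json.loads(row[column])
        rows.append(row)
    return rows


# Build a small sample file in memory (an assumption, not real export output).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "user_id", "traits"])
writer.writerow([
    "a7280cfea0f6d",
    "97980cfea0067",
    json.dumps({"name": "Cool Person", "likes_baseball": True}),
])

rows = read_rows(buffer.getvalue())
print(rows[0]["traits"]["name"])
```

After decoding, `rows[0]["traits"]` is a regular dictionary, so nested values are available without further string handling.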

## Schemas[](#schemas)

When we load data into your storage buckets, we create files that match the shape of your incoming data. Note that we flatten or stringify nested objects and arrays according to [the rules above](#handling-objects-and-arrays-in-csv-and-parquet-files).

### Identifies schema[](#identifies-schema)

*Identifies* files contain [identify](/integrations/api/cdp/#operation/identify) calls sent into Customer.io. The `context` and `traits` in the schema below are objects in JSON. In CSV and parquet files, these columns contain stringified objects.

*   traits object
    
    Additional properties that you know about a person. We’ve listed some common/reserved traits below, but you can add any traits that you might use in another system.
    
    *   createdAt string  (date-time)
        
        We recommend that you pass date-time values as ISO 8601 date-time strings. We convert this value to fit destinations where appropriate.
        
    *   email string
        
        A person’s email address. In some cases, you can pass an empty `userId` and we’ll use this value to identify a person.
        
    *   *Additional Traits\** any type
        
        Traits that you want to set on a person. These can take any JSON shape.
        

### Groups schema[](#groups-schema)

*Groups* files contain `group` calls made from your data-in integrations. If your integration outputs CSV or parquet files, the `context` and `traits` columns contain stringified objects.

*   traits object
    
    Additional data points that the call assigns to the group.
    
    *   *Additional Traits\** any type
        
        Traits can have any name, like `account_name` or `total_employees`. These can take any JSON shape.
        

### Page schema[](#page-schema)

*Pages* files contain entries for the `page` calls your integrations send into Customer.io. If your integration outputs CSV or Parquet files, the `context` and `properties` columns contain stringified objects. If your integration outputs JSON files, the `context` and `properties` columns contain objects.

*   properties object
    
    Additional properties sent with the page call. We’ve listed some common/reserved properties captured by our `Analytics.js` library, but you can add any properties that you might use in another system.
    
    *   category string
        
        The category of the page. This might be useful if you have single-page routes or a flattened URL structure.
        
    *   path string
        
        The path of the page. This defaults to `location.pathname`, but can be overridden.
        
    *   referrer string
        
        The referrer of the page, if applicable. This defaults to `document.referrer`, but can be overridden.
        
    *   search string
        
        The search query in the URL, if present. This defaults to `location.search`, but can be overridden.
        
    *   title string
        
        The title of the page. This defaults to `document.title`, but can be overridden.
        
    *   url string
        
        The URL of the page. This defaults to a canonical url if available, and falls back to `document.location.href`.
        
    *   *Page Properties\** any type
        

### Screen schema[](#screen-schema)

*Screens* files contain entries for the `screen` calls sent to Customer.io. If your integration outputs CSV or parquet files, the `context` and `properties` columns contain stringified objects. If your integration outputs JSON files, the `context` and `properties` columns contain objects.

*   properties object
    
    Additional properties that you sent in your screen event.
    
    *   *Additional event properties\** any type
        
        Properties that you sent in the event. These can take any JSON shape.
        

### Track schema[](#track-schema)

*Tracks* files contain entries for the `track` calls you send to Customer.io. They show information about the events your users perform.

If your integration outputs CSV or parquet files, the `context` and `properties` columns contain stringified objects. If your integration outputs JSON files, the `context` and `properties` columns contain objects.

*   event string
    
    The slug of the event name, mapping to an event-specific table.
    
*   event\_text string
    
    The name of the event.
    
*   properties object
    
    Additional properties sent with the track call. You can add any properties that you might use in another system.
    
    *   *Event Properties\** any type
        

### Alias schema[](#alias)

The Alias schema contains entries for the `alias` calls you send to Customer.io. It shows information about the users you merge, with each entry showing a user’s new `user_id` and their `previous_id`.
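
One way to use these entries is to map old identifiers to current ones when you join other data. The sketch below assumes that you've already loaded alias entries as dictionaries with `user_id` and `previous_id` keys. Following chains of merges is our assumption about how you might want to use the data, not documented behavior. The function names are hypothetical.

```python
def build_alias_lookup(alias_rows):
    """Map each previous_id to the user_id it was merged into."""
    return {row["previous_id"]: row["user_id"] for row in alias_rows}


def resolve_user_id(user_id, lookup):
    """Follow merges until reaching an ID that was not merged again."""
    seen = set()
    # The 'seen' set guards against looping forever on circular entries.
    while user_id in lookup and user_id not in seen:
        seen.add(user_id)
        user_id = lookup[user_id]
    return user_id


alias_rows = [
    {"previous_id": "anon-1", "user_id": "u-1"},
    {"previous_id": "u-1", "user_id": "u-2"},
]
lookup = build_alias_lookup(alias_rows)
print(resolve_user_id("anon-1", lookup))
```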

## Timestamps[](#timestamps)

We associate four timestamps with every incoming call to Customer.io: `timestamp`, `original_timestamp`, `sent_at`, and `received_at`. All four timestamps pass through to your warehouse, so it helps to understand the purpose of each.

In general, you should use `timestamp` when you query for historical events and `received_at` for all other queries based on time.

`timestamp` is the UTC-converted timestamp set by the Customer.io library. If you import historical events using a server-side library, this is the timestamp you’ll want to reference in your queries.

`original_timestamp` is the original timestamp set on data that comes into Customer.io. This timestamp can be affected by device clock skew. You can override this value by manually passing a `timestamp` in your incoming calls, which we map to the `original_timestamp`. Generally, this timestamp should be ignored in favor of the `timestamp` column.

`sent_at` is a UTC timestamp set when you send calls to Customer.io. This timestamp can also be affected by device clock skew.

`received_at` is a UTC timestamp set by Customer.io when we receive a payload. All tables use `received_at` as the sort key.

 Use `received_at` for queries based on times

The `sent_at` timestamp relies on a client’s device clock being accurate, which can be unreliable.
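
The guidance above can be sketched as a small filter over rows that you've already loaded. This example assumes that timestamps are ISO 8601 strings ending in `Z`, like the sample payload earlier on this page, and that each row is a dictionary. The names `parse_utc` and `rows_between` are hypothetical.

```python
from datetime import datetime, timezone


def parse_utc(value):
    """Parse an ISO 8601 string such as '2019-08-24T14:15:22Z'."""
    # Older Python versions don't accept a trailing 'Z' in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def rows_between(rows, start, end, column="received_at"):
    """Return rows whose chosen timestamp column falls in [start, end).

    Defaults to received_at; pass column="timestamp" for historical events.
    """
    return [r for r in rows if start <= parse_utc(r[column]) < end]


rows = [
    # An imported historical event: it happened weeks before we received it.
    {"id": "a", "timestamp": "2019-08-01T00:00:00Z", "received_at": "2019-08-24T14:15:22Z"},
    {"id": "b", "timestamp": "2019-08-24T14:15:20Z", "received_at": "2019-08-24T14:15:23Z"},
]
start = datetime(2019, 8, 24, tzinfo=timezone.utc)
end = datetime(2019, 8, 25, tzinfo=timezone.utc)

received_that_day = rows_between(rows, start, end)
occurred_that_day = rows_between(rows, start, end, column="timestamp")
```

Here both rows were received on August 24, but only row `b` occurred on that day, which is why historical queries use `timestamp`.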

## id[](#id)

Each row in your database has an `id`, which is equivalent to the `messageId` that our libraries pass in incoming calls. This is a unique identifier associated with the row.

## Sort Key[](#sort-key)

All tables use `received_at` as the sort key. Amazon Redshift stores your data on disk in sorted order according to the sort key. The Redshift query optimizer uses sort order when it determines optimal query plans.