Microsoft Azure Blob Storage (Advanced)

Updated December 9, 2025

About this integration

Azure Blob Storage is a massively scalable object storage for unstructured data. It is designed for durability, availability, and scalability. It offers cost-effective storage for data such as text, binary data, and media files.

Mode How we forward source data to the destination: through Customer.io's servers or directly from our JavaScript client.	Web sources Indicates whether or not this integration supports our the JavaScript client.	API sources Indicates whether or not this integration supports our server libraries (Go, NodeJS, Python), API, Mobile SDK, and other data sources.	Supported calls The API methods this integration supports.	Integration name The name of this integration if you want to enable or disable it in the `integrations` object.
Standard			alias, group, identify, page, screen, and track	Azure Blob Storage

How it works

This integration sends CSV, JSON, or parquet files containing your data to your MS Azure Blob Storage (Advanced) bucket. Then you can ingest the files in your storage bucket to your data warehouse of choice.

We write files for each type of incoming call to your storage bucket every 10 minutes. So you’ll have files for identify calls, track calls, and so on. Files are named with an incrementing number, so it’s easy to determine the sequence of files, and the order of incoming calls.

sequenceDiagram participant a as Customer.io participant b as Storage Bucket participant c as MS Azure Blob Storage (Advanced) loop every 10 minutes a->>b: export CSV, JSON, or parquet files b->>c: ingest c->>b: expire/delete files
before next sync end

Sync frequency and file names

Syncs occur every 10 minutes. Each sync file contains data from the previous sync interval. For example, if the last sync occurred at 12:00 PM, the next sync will only send data from 12:00 PM to 12:09:59 PM.

Each sync generates new files for each data type in your storage bucket. Files are named in the format <integration id>.<integration action id>.<current position>.<type>.

The integration ID and action ID are unique identifiers generated by Customer.io. You’ll see them with the first sync.
current position is an incrementing number beginning at 1 that indicates the order of syncs. So your first sync is 1, the next one is 2, etc.
type is the type of incoming call—identify, track, page, screen, alias, or group.

So, if your file is called 2184.13699.1.track.json, it’s the first sync file for the track call type.

Getting started

Go to Data & Integrations > Integrations and select MS Azure Blob Storage (Advanced) in the Directory tab.
Connect to your storage bucket:
1. Endpoint: Endpoint for the internal ETL API.
2. Token: Authentication token for the internal ETL API.
3. Format: Format of the data files that will be created.
4. Blob Sas Url: The SAS URL of the Azure Blob Storage container with permissions to upload files to a container. Learn how to generate an Azure SAS URL in our documentation.
5. Blob Path: Optional folder inside the container where files will be written to.
Review your setup and click Finish to enable your integration.

Schemas

The following schemas represent JSON for the different types of files we export to your storage bucket (identify, track, and so on). For CSV and Parquet files, we stringify objects and arrays. For example, if identify calls contain the traits object with a first_name and last_name, CSV files output to your storage bucket will contain a traits column with data that looks like this for each row: "{ "\first_name\": \"Bugs\", \"last_name\": \"Bunny\" }".

identify

Identifies files contain identify calls sent to Customer.io. The context and traits in the schema below are objects in JSON. In CSV and parquet files, these columns contain stringified objects.

anonymous_id string
A unique substitute for a User ID in cases when you don’t have an absolutely unique identifier. Our libraries generate this value automatically to help you track people before they sign up, log in, provide their email, etc.
context
A dictionary of context about a source call/event, like the user’s IP address or locale. Context is automatically collected by our source libraries.
- active boolean
  
  Whether a user is active.
  This is usually used when you send an .identify() call to update the traits independently of when you’ve “last seen” a user.
- channel string
  The channel the event originated from.
  Accepted values:browser,server,mobile
- ip string
  The user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
- locale string
  The locale string for the current user, e.g. en-US.
- userAgent string
  The user agent of the device making the request
- campaign object
  Contains information about the campaign that resulted in the API call, gathered from, or mapping to, UTM parameters (e.g. utm_source).
  - content string
  - medium string
    The type of traffic a person/event originates from, like email, or referral.
  - name string
    The campaign name.
  - source string
    The source of traffic—like the name of your email list, Facebook, Google, etc.
  - term string
    The keyword term(s) a user came from.
  - Additional UTM Parameters* string
- page object
  Contains information about the current page in the browser. This is automatically collected by our JavaScript source.
  - keywords array of [ strings ]
    A list/array of keywords describing the page’s content. The keywords are likely the same as, or similar to, the keywords you would find in an HTML meta tag for SEO purposes. This property is mainly used by content publishers that rely heavily on pageview tracking. This isn’t automatically collected.
  - name string
    The name of the page. Reserved for future use.
  - path string
    The path portion of the page’s URL. Equivalent to the canonical path which defaults to location.pathname from the DOM API.
  - referrer string
    The previous page’s full URL. Equivalent to document.referrer from the DOM API.
  - search string
    The query string portion of the page’s URL. Equivalent to location.search from the DOM API.
  - title string
    The page’s title. Equivalent to document.title from the DOM API.
  - url string
    A page’s full URL. We first look for the canonical URL. If the canonical URL is not provided, we’ll use location.href from the DOM API.
- active boolean
  
  Whether a user is active.
  This is usually used when you send an .identify() call to update the traits independently of when you’ve “last seen” a user.
- channel string
  The channel the event originated from.
  Accepted values:browser,server,mobile
- ip string
  The user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
- locale string
  The locale string for the current user, e.g. en-US.
- userAgent string
  The user agent of the device making the request
- app object
  Contains information about the mobile app the event originated from, automatically collected by our mobile libraries when possible.
  - build string
    The specific build number in the app.
  - name string
    The name of the app.
  - namespace string
    The app’s namespace.
  - version string
    The version of the app the call originated from.
- device object
  Contains information about the device the event originated from.
  - advertisingId string
    The advertising ID is a unique, anonymous ID for advertising.
  - id string
    The device ID.
  - manufacturer string
    The device manufacturer.
  - model string
    The device model.
  - name string
    The device name.
  - type string
    The device type—android, iOS, etc.
    Accepted values:android,ios
  - version string
    The firmware version for the device.
- network object
  Information about the current network connection, containing bluetooth, carrier, cellular, and wifi. If the context.network.cellular and context.network.wifi fields are empty, then the user is offline.
  - bluetooth boolean
    Lets you know if bluetooth is enabled on a device.
  - carrier string
    The cellular carrier the phone uses.
  - cellular boolean
    Indicates whether the device’s cellular connection is enabled or not.
  - wifi boolean
    Indicates whether a device’s wifi connection is enabled or not.
- os object
  Dictionary of information about the operating system, containing name and version.
  - name string
    The operating system running on the device.
  - version string
    The version of the OS running on the device.
id string
A unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
received_at integer
The Unix timestamp (in seconds) when Data Pipelines receives an event.
sent_at integer
The Unix timestamp (in seconds) when a library sends an event to Data Pipelines.
traits object
Additional properties that you know about a person. We’ve listed some common/reserved traits below, but you can add any traits that you might use in another system.
- createdAt string (date-time)
  We recommend that you pass date-time values as ISO 8601 date-time strings. We convert this value to fit destinations where appropriate.
- email string
  A person’s email address. In some cases, you can pass an empty userId and we’ll use this value to identify a person.
- Additional Traits* any type
  Traits that you want to set on a person. These can take any JSON shape.
user_id string
The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.

group

Groups files contain group calls sent to Customer.io. If your integration outputs CSV or parquet files, the context and traits columns contain stringified objects.

anonymous_id string
A unique substitute for a User ID in cases when you don’t have an absolutely unique identifier. Our libraries generate this value automatically to help you track people before they sign up, log in, provide their email, etc.
group_id string
ID of the group
id string
A unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
objectTypeId string
If you use Customer.io Journeys as a destination, this value is the type of group/object your group belongs to; object type IDs are stringified integers. If you don’t include this value, we assume the object type ID is 1. See objects in Customer.io Journeys for more information.
received_at integer
The Unix timestamp (in seconds) when Data Pipelines receives an event.
sent_at integer
The Unix timestamp (in seconds) when a library sends an event to Data Pipelines.
traits object
Additional data points that the call assigns to the group.
- Additional Traits* any type
  Traits can have any name, like `account_name` or `total_employees`. These can take any JSON shape.
user_id string
The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.

track

Tracks contains entries for the track calls you send to Customer.io. It shows information about the events your users perform.

If your integration outputs CSV or parquet files, the context and properties columns contain stringified objects. If your integration outputs JSON files, the context and properties columns contain objects.

anonymous_id string
A unique substitute for a User ID in cases when you don’t have an absolutely unique identifier. Our libraries generate this value automatically to help you track people before they sign up, log in, provide their email, etc.
context
A dictionary of context about a source call/event, like the user’s IP address or locale. Context is automatically collected by our source libraries.
- active boolean
  
  Whether a user is active.
  This is usually used when you send an .identify() call to update the traits independently of when you’ve “last seen” a user.
- channel string
  The channel the event originated from.
  Accepted values:browser,server,mobile
- ip string
  The user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
- locale string
  The locale string for the current user, e.g. en-US.
- userAgent string
  The user agent of the device making the request
- campaign object
  Contains information about the campaign that resulted in the API call, gathered from, or mapping to, UTM parameters (e.g. utm_source).
  - content string
  - medium string
    The type of traffic a person/event originates from, like email, or referral.
  - name string
    The campaign name.
  - source string
    The source of traffic—like the name of your email list, Facebook, Google, etc.
  - term string
    The keyword term(s) a user came from.
  - Additional UTM Parameters* string
- page object
  Contains information about the current page in the browser. This is automatically collected by our JavaScript source.
  - keywords array of [ strings ]
    A list/array of keywords describing the page’s content. The keywords are likely the same as, or similar to, the keywords you would find in an HTML meta tag for SEO purposes. This property is mainly used by content publishers that rely heavily on pageview tracking. This isn’t automatically collected.
  - name string
    The name of the page. Reserved for future use.
  - path string
    The path portion of the page’s URL. Equivalent to the canonical path which defaults to location.pathname from the DOM API.
  - referrer string
    The previous page’s full URL. Equivalent to document.referrer from the DOM API.
  - search string
    The query string portion of the page’s URL. Equivalent to location.search from the DOM API.
  - title string
    The page’s title. Equivalent to document.title from the DOM API.
  - url string
    A page’s full URL. We first look for the canonical URL. If the canonical URL is not provided, we’ll use location.href from the DOM API.
- active boolean
  
  Whether a user is active.
  This is usually used when you send an .identify() call to update the traits independently of when you’ve “last seen” a user.
- channel string
  The channel the event originated from.
  Accepted values:browser,server,mobile
- ip string
  The user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
- locale string
  The locale string for the current user, e.g. en-US.
- userAgent string
  The user agent of the device making the request
- app object
  Contains information about the mobile app the event originated from, automatically collected by our mobile libraries when possible.
  - build string
    The specific build number in the app.
  - name string
    The name of the app.
  - namespace string
    The app’s namespace.
  - version string
    The version of the app the call originated from.
- device object
  Contains information about the device the event originated from.
  - advertisingId string
    The advertising ID is a unique, anonymous ID for advertising.
  - id string
    The device ID.
  - manufacturer string
    The device manufacturer.
  - model string
    The device model.
  - name string
    The device name.
  - type string
    The device type—android, iOS, etc.
    Accepted values:android,ios
  - version string
    The firmware version for the device.
- network object
  Information about the current network connection, containing bluetooth, carrier, cellular, and wifi. If the context.network.cellular and context.network.wifi fields are empty, then the user is offline.
  - bluetooth boolean
    Lets you know if bluetooth is enabled on a device.
  - carrier string
    The cellular carrier the phone uses.
  - cellular boolean
    Indicates whether the device’s cellular connection is enabled or not.
  - wifi boolean
    Indicates whether a device’s wifi connection is enabled or not.
- os object
  Dictionary of information about the operating system, containing name and version.
  - name string
    The operating system running on the device.
  - version string
    The version of the OS running on the device.
event string
The slug of the event name, mapping to an event-specific table.
event_text string
The name of the event.
id string
A unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
properties object
Additional properties sent with the page call. We’ve listed some common/reserved traits captured by our Analytics.js library, but you can add any properties that you might use in another system.
- Event Properties* any type
received_at integer
The Unix timestamp (in seconds) when Data Pipelines receives an event.
sent_at integer
The Unix timestamp (in seconds) when a library sends an event to Data Pipelines.
user_id string
The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.

page

Pages contains entries for the page calls sent to Customer.io. If your integration outputs CSV or parquet files, the context and properties columns contain stringified objects. If your integration outputs JSON files, the context and properties columns contain objects.

anonymous_id string
A unique substitute for a User ID in cases when you don’t have an absolutely unique identifier. Our libraries generate this value automatically to help you track people before they sign up, log in, provide their email, etc.
context
A dictionary of context about a source call/event, like the user’s IP address or locale. Context is automatically collected by our source libraries.
- active boolean
  
  Whether a user is active.
  This is usually used when you send an .identify() call to update the traits independently of when you’ve “last seen” a user.
- channel string
  The channel the event originated from.
  Accepted values:browser,server,mobile
- ip string
  The user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
- locale string
  The locale string for the current user, e.g. en-US.
- userAgent string
  The user agent of the device making the request
- campaign object
  Contains information about the campaign that resulted in the API call, gathered from, or mapping to, UTM parameters (e.g. utm_source).
  - content string
  - medium string
    The type of traffic a person/event originates from, like email, or referral.
  - name string
    The campaign name.
  - source string
    The source of traffic—like the name of your email list, Facebook, Google, etc.
  - term string
    The keyword term(s) a user came from.
  - Additional UTM Parameters* string
- page object
  Contains information about the current page in the browser. This is automatically collected by our JavaScript source.
  - keywords array of [ strings ]
    A list/array of keywords describing the page’s content. The keywords are likely the same as, or similar to, the keywords you would find in an HTML meta tag for SEO purposes. This property is mainly used by content publishers that rely heavily on pageview tracking. This isn’t automatically collected.
  - name string
    The name of the page. Reserved for future use.
  - path string
    The path portion of the page’s URL. Equivalent to the canonical path which defaults to location.pathname from the DOM API.
  - referrer string
    The previous page’s full URL. Equivalent to document.referrer from the DOM API.
  - search string
    The query string portion of the page’s URL. Equivalent to location.search from the DOM API.
  - title string
    The page’s title. Equivalent to document.title from the DOM API.
  - url string
    A page’s full URL. We first look for the canonical URL. If the canonical URL is not provided, we’ll use location.href from the DOM API.
- active boolean
  
  Whether a user is active.
  This is usually used when you send an .identify() call to update the traits independently of when you’ve “last seen” a user.
- channel string
  The channel the event originated from.
  Accepted values:browser,server,mobile
- ip string
  The user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
- locale string
  The locale string for the current user, e.g. en-US.
- userAgent string
  The user agent of the device making the request
- app object
  Contains information about the mobile app the event originated from, automatically collected by our mobile libraries when possible.
  - build string
    The specific build number in the app.
  - name string
    The name of the app.
  - namespace string
    The app’s namespace.
  - version string
    The version of the app the call originated from.
- device object
  Contains information about the device the event originated from.
  - advertisingId string
    The advertising ID is a unique, anonymous ID for advertising.
  - id string
    The device ID.
  - manufacturer string
    The device manufacturer.
  - model string
    The device model.
  - name string
    The device name.
  - type string
    The device type—android, iOS, etc.
    Accepted values:android,ios
  - version string
    The firmware version for the device.
- network object
  Information about the current network connection, containing bluetooth, carrier, cellular, and wifi. If the context.network.cellular and context.network.wifi fields are empty, then the user is offline.
  - bluetooth boolean
    Lets you know if bluetooth is enabled on a device.
  - carrier string
    The cellular carrier the phone uses.
  - cellular boolean
    Indicates whether the device’s cellular connection is enabled or not.
  - wifi boolean
    Indicates whether a device’s wifi connection is enabled or not.
- os object
  Dictionary of information about the operating system, containing name and version.
  - name string
    The operating system running on the device.
  - version string
    The version of the OS running on the device.
id string
A unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
properties object
Additional properties sent with the page call. We’ve listed some common/reserved traits captured by our Analytics.js library, but you can add any properties that you might use in another system.
- category string
  The category of the page. This might be useful if you have a single page routes or have a flattened URL structure.
- path string
  The path of the page. This defaults to location.pathname, but can be overridden.
- referrer string
  The referrer of the page, if applicable. This defaults to document.referrer, but can be overridden.
- search string
  The search query in the URL, if present. This defaults to location.search, but can be overridden.
- title string
  The title of the page. This defaults to document.title, but can be overridden.
- url string
  The URL of the page. This defaults to a canonical url if available, and falls back to document.location.href.
- Page Properties* any type
received_at integer
The Unix timestamp (in seconds) when Data Pipelines receives an event.
sent_at integer
The Unix timestamp (in seconds) when a library sends an event to Data Pipelines.
user_id string
The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.

screen

Screens files contain entries for the screen calls sent to Customer.io. If your integration outputs CSV or parquet files, the context and properties columns contain stringified objects. If your integration outputs JSON files, the context and properties columns contain objects.

anonymous_id string
A unique substitute for a User ID in cases when you don’t have an absolutely unique identifier. Our libraries generate this value automatically to help you track people before they sign up, log in, provide their email, etc.
context
A dictionary of context about a source call/event, like the user’s IP address or locale. Context is automatically collected by our source libraries.
- active boolean
  
  Whether a user is active.
  This is usually used when you send an .identify() call to update the traits independently of when you’ve “last seen” a user.
- channel string
  The channel the event originated from.
  Accepted values:browser,server,mobile
- ip string
  The user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
- locale string
  The locale string for the current user, e.g. en-US.
- userAgent string
  The user agent of the device making the request
- campaign object
  Contains information about the campaign that resulted in the API call, gathered from, or mapping to, UTM parameters (e.g. utm_source).
  - content string
  - medium string
    The type of traffic a person/event originates from, like email, or referral.
  - name string
    The campaign name.
  - source string
    The source of traffic—like the name of your email list, Facebook, Google, etc.
  - term string
    The keyword term(s) a user came from.
  - Additional UTM Parameters* string
- page object
  Contains information about the current page in the browser. This is automatically collected by our JavaScript source.
  - keywords array of [ strings ]
    A list/array of keywords describing the page’s content. The keywords are likely the same as, or similar to, the keywords you would find in an HTML meta tag for SEO purposes. This property is mainly used by content publishers that rely heavily on pageview tracking. This isn’t automatically collected.
  - name string
    The name of the page. Reserved for future use.
  - path string
    The path portion of the page’s URL. Equivalent to the canonical path which defaults to location.pathname from the DOM API.
  - referrer string
    The previous page’s full URL. Equivalent to document.referrer from the DOM API.
  - search string
    The query string portion of the page’s URL. Equivalent to location.search from the DOM API.
  - title string
    The page’s title. Equivalent to document.title from the DOM API.
  - url string
    A page’s full URL. We first look for the canonical URL. If the canonical URL is not provided, we’ll use location.href from the DOM API.
- active boolean
  
  Whether a user is active.
  This is usually used when you send an .identify() call to update the traits independently of when you’ve “last seen” a user.
- channel string
  The channel the event originated from.
  Accepted values:browser,server,mobile
- ip string
  The user’s IP address. This isn’t captured by our libraries, but by our servers when we receive client-side events (like from our JavaScript source).
- locale string
  The locale string for the current user, e.g. en-US.
- userAgent string
  The user agent of the device making the request
- app object
  Contains information about the mobile app the event originated from, automatically collected by our mobile libraries when possible.
  - build string
    The specific build number in the app.
  - name string
    The name of the app.
  - namespace string
    The app’s namespace.
  - version string
    The version of the app the call originated from.
- device object
  Contains information about the device the event originated from.
  - advertisingId string
    The advertising ID is a unique, anonymous ID for advertising.
  - id string
    The device ID.
  - manufacturer string
    The device manufacturer.
  - model string
    The device model.
  - name string
    The device name.
  - type string
    The device type—android, iOS, etc.
    Accepted values:android,ios
  - version string
    The firmware version for the device.
- network object
  Information about the current network connection, containing bluetooth, carrier, cellular, and wifi. If the context.network.cellular and context.network.wifi fields are empty, then the user is offline.
  - bluetooth boolean
    Lets you know if bluetooth is enabled on a device.
  - carrier string
    The cellular carrier the phone uses.
  - cellular boolean
    Indicates whether the device’s cellular connection is enabled or not.
  - wifi boolean
    Indicates whether a device’s wifi connection is enabled or not.
- os object
  Dictionary of information about the operating system, containing name and version.
  - name string
    The operating system running on the device.
  - version string
    The version of the OS running on the device.
id string
A unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
properties object
Additional properties that you sent in your screen event
- Additional event properties* any type
  Properties that you sent in the event. These can take any JSON shape.
received_at integer
The Unix timestamp (in seconds) when Data Pipelines receives an event.
sent_at integer
The Unix timestamp (in seconds) when a library sends an event to Data Pipelines.
user_id string
The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.

alias

The Alias schema contains entries for the alias calls you send to Customer.io. It shows information about the users you merge, with each entry showing a user’s new user_id and their previous_id.

id string
A unique identifier for a Data Pipelines event, ensuring that each individual event is unique.
previous_id string
The userId that you want to merge into the canonical profile.
received_at integer
The Unix timestamp (in seconds) when Data Pipelines receives an event.
sent_at integer
The Unix timestamp (in seconds) when a library sends an event to Data Pipelines.
user_id string
The unique identifier for a person. This value should be unique across systems, so you recognize the same person in your sources and destinations.

Copied to clipboard!

Latest features at Customer.io

Manage access to your workspace with custom roles

Throttle webhook actions with random delays

Add notes to workflow items

Microsoft Azure Blob Storage (Advanced)

About this integration

How it works

Sync frequency and file names

Getting started

Schemas

identify

group

track

page

screen

alias

Latest features at Customer.io

Manage access to your workspace with custom roles

Throttle webhook actions with random delays

Add notes to workflow items

Microsoft Azure Blob Storage (Advanced)

About this integration

How it works

Sync frequency and file names

Getting started

Schemas

identify

group

track

page

screen

alias

How can we make it better?