Snowflake (Advanced)

This feature is available on Premium and Enterprise plans.

About this integration

Snowflake is a cloud data platform that provides a data warehouse-as-a-service designed for the cloud. It allows you to unify, integrate, analyze, and share previously siloed data.

How it works

This integration sends CSV, JSON, or parquet files containing your data to a storage bucket you own. From there, you can ingest the files in your storage bucket into Snowflake.

We write files for each type of incoming call to your storage bucket every 10 minutes. So you’ll have files for identify calls, track calls, and so on. Files are named with an incrementing number, so it’s easy to determine the sequence of files, and the order of incoming calls.

```mermaid
sequenceDiagram
    participant a as Customer.io
    participant b as Storage Bucket
    participant c as Snowflake (Advanced)
    loop every 10 minutes
        a->>b: export CSV, JSON, or parquet files
        b->>c: ingest
        c->>b: expire/delete files before next sync
    end
```

Sync frequency and file names

Syncs occur every 10 minutes. Each sync file contains data from the previous sync interval only. For example, if the last sync occurred at 12:00 PM, the next sync sends only the data received between 12:00:00 PM and 12:09:59 PM.
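To make the interval boundaries concrete, here is a minimal sketch that computes the sync window a given timestamp falls into. It assumes intervals align to clock boundaries (12:00, 12:10, and so on), which matches the example above but is an assumption; the docs only state the interval length.

```python
from datetime import datetime, timedelta

def sync_interval(ts: datetime, minutes: int = 10) -> tuple:
    """Return the (start, inclusive end) of the sync window containing ts.

    Assumes windows align to clock boundaries -- an assumption; the docs
    only state that syncs run every 10 minutes.
    """
    start = ts.replace(minute=ts.minute - ts.minute % minutes,
                       second=0, microsecond=0)
    end = start + timedelta(minutes=minutes) - timedelta(seconds=1)
    return start, end
```

For example, `sync_interval(datetime(2024, 1, 1, 12, 4))` returns the 12:00:00 PM to 12:09:59 PM window.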

Each sync generates new files for each data type in your storage bucket. Files are named in the format <integration id>.<integration action id>.<current position>.<type>.

  • The integration ID and action ID are unique identifiers generated by Customer.io. You’ll see them with the first sync.
  • current position is an incrementing number beginning at 1 that indicates the order of syncs. So your first sync is 1, the next one is 2, etc.
  • type is the type of incoming call—identify, track, page, screen, alias, or group.

So, if your file is called 2184.13699.1.track.json, it’s the first sync file for the track call type, exported in JSON format.
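If you process these files programmatically, the naming scheme above is easy to split apart. This sketch parses a sync file name into its components; treating the final segment as the file format is our reading of the example name, since the documented pattern ends at `<type>`.

```python
def parse_sync_filename(name: str) -> dict:
    """Split a sync file name like '2184.13699.1.track.json' into its parts.

    Pattern (per the docs, plus the file extension seen in the example):
    <integration id>.<integration action id>.<current position>.<type>.<ext>
    """
    integration_id, action_id, position, call_type, ext = name.split(".")
    return {
        "integration_id": integration_id,
        "action_id": action_id,
        "position": int(position),   # incrementing sync counter, starts at 1
        "type": call_type,           # identify, track, page, screen, alias, or group
        "format": ext,               # csv, json, or parquet
    }
```

Sorting files by `position` recovers the order of syncs, and therefore the order of incoming calls.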

Getting started

To support Snowflake (Advanced), you’ll set up a Google Cloud Storage, Amazon S3, or Microsoft Azure Blob Storage bucket to store your data. Then, you’ll query and import data from your storage bucket to Snowflake (Advanced) either through a direct query or a product like Stitch.

As a part of this integration, we’ll create parquet, JSON, or CSV files in your storage bucket. See data warehouses for a list of data schemas.

  1. Go to Data & Integrations > Integrations and select Snowflake (Advanced) in the Directory tab.

  2. Connect to your storage bucket:

  3. Review your setup and click Finish to enable your integration.

Google Cloud Storage (GCS)

  1. Endpoint: Endpoint for the internal ETL API.

  2. Token: Authentication token for the internal ETL API.

  3. Format: Format of the data files that will be created.

  4. Bucket Name: Name of the Google Cloud Storage Bucket where files will be written to. Learn more about GCS buckets and bucket naming rules.

  5. Bucket Path: Optional folder inside the bucket where files will be written to.

  6. Service Account: The JSON string of the Google Cloud Service Account with permissions to upload files to a bucket, which can be found in your Google Cloud Console. Learn more about Google Cloud Service Accounts.
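Before pasting the service account JSON string into the form, it can help to sanity-check that it parses and contains the usual fields. This is a hedged pre-flight check, not exhaustive validation; the key list covers the fields commonly present in a Google Cloud service account key file.

```python
import json

# Fields commonly present in a service-account key file (not an exhaustive list).
REQUIRED_KEYS = {"type", "project_id", "private_key", "client_email"}

def check_service_account(json_string: str) -> list:
    """Return a sorted list of expected fields missing from the JSON string.

    Raises json.JSONDecodeError if the string is not valid JSON at all.
    """
    data = json.loads(json_string)
    return sorted(REQUIRED_KEYS - data.keys())
```

An empty return value means the string at least looks like a service account key; actual upload permissions still depend on the roles granted to the account in Google Cloud.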

Amazon S3

  1. Endpoint: Endpoint for the internal ETL API.

  2. Token: Authentication token for the internal ETL API.

  3. Format: Format of the data files that will be created.

  4. Bucket Name: Name of an existing bucket. Learn more about S3 buckets and bucket naming rules.

  5. Bucket Path: Optional folder inside the bucket where files will be written to.

  6. Access Key: The AWS Access Key ID that will be used to connect to your S3 Bucket. Your Access Key ID can be found in the My Security Credentials section of your AWS Console. Learn more about AWS credentials.

  7. Secret Key: The AWS Secret Access Key that will be used to connect to your S3 Bucket. Your Secret Access Key can be found in the My Security Credentials section of your AWS Console. Learn more about AWS credentials.

  8. Region: The AWS Region where your S3 Bucket resides. Learn more about AWS Regions.

Azure Blob Storage

  1. Endpoint: Endpoint for the internal ETL API.

  2. Token: Authentication token for the internal ETL API.

  3. Format: Format of the data files that will be created.

  4. Blob Sas Url: The SAS URL of the Azure Blob Storage container with permissions to upload files to a container. Learn how to generate an Azure SAS URL in our documentation.

  5. Blob Path: Optional folder inside the container where files will be written to.

Schemas

The following schemas represent JSON for the different types of files we export to your storage bucket (identify, track, and so on). For CSV and Parquet files, we stringify objects and arrays. For example, if identify calls contain the traits object with a first_name and last_name, CSV files output to your storage bucket will contain a traits column with data that looks like this for each row: "{ \"first_name\": \"Bugs\", \"last_name\": \"Bunny\" }".
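When reading CSV exports back out of your bucket, you can decode those stringified columns into objects again with a standard JSON parser. A minimal sketch; the `userId` and `traits` column names follow the identify schema below:

```python
import csv
import io
import json

def rows_with_parsed_traits(csv_text: str):
    """Yield CSV rows with the stringified `traits` column decoded to a dict."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row.get("traits"):
            row["traits"] = json.loads(row["traits"])
        yield row
```

The same approach applies to any other stringified column, such as `context` or `properties` in track, page, and screen files.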

Identifies files contain identify calls sent to Customer.io. The context and traits in the schema below are objects in JSON. In CSV and parquet files, these columns contain stringified objects.

    • createdAt string  (date-time)
      We recommend that you pass date-time values as ISO 8601 date-time strings. We convert this value to fit destinations where appropriate.
    • email string
      A person’s email address. In some cases, you can pass an empty userId and we’ll use this value to identify a person.
    • Additional Traits* any type
      Traits that you want to set on a person. These can take any JSON shape.

Groups files contain group calls sent to Customer.io. If your integration outputs CSV or parquet files, the context and traits columns contain stringified objects.

    • Additional Traits* any type
      Traits can have any name, like `account_name` or `total_employees`. These can take any JSON shape.

Tracks files contain entries for the track calls you send to Customer.io. They show information about the events your users perform.

If your integration outputs CSV or parquet files, the context and properties columns contain stringified objects. If your integration outputs JSON files, the context and properties columns contain objects.

  • event string
    The slug of the event name, mapping to an event-specific table.
  • event_text string
    The name of the event.
  • Event Properties* any type
    Properties that you sent with the event. These can take any JSON shape.

Pages contains entries for the page calls sent to Customer.io. If your integration outputs CSV or parquet files, the context and properties columns contain stringified objects. If your integration outputs JSON files, the context and properties columns contain objects.

    • category string
      The category of the page. This might be useful if you have single-page routes or a flattened URL structure.
    • path string
      The path of the page. This defaults to location.pathname, but can be overridden.
    • referrer string
      The referrer of the page, if applicable. This defaults to document.referrer, but can be overridden.
    • search string
      The search query in the URL, if present. This defaults to location.search, but can be overridden.
    • title string
      The title of the page. This defaults to document.title, but can be overridden.
    • url string
      The URL of the page. This defaults to the canonical URL, if available, and falls back to document.location.href.
    • Page Properties* any type

Screens files contain entries for the screen calls sent to Customer.io. If your integration outputs CSV or parquet files, the context and properties columns contain stringified objects. If your integration outputs JSON files, the context and properties columns contain objects.

    • Additional event properties* any type
      Properties that you sent in the event. These can take any JSON shape.

The Alias schema contains entries for the alias calls you send to Customer.io. It shows information about the users you merge, with each entry showing a user’s new user_id and their previous_id.
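Because each alias entry pairs a new user_id with a previous_id, replaying the entries in order lets you resolve any historical ID to its current canonical ID. A minimal sketch under that assumption (the field names come from the schema above; the chain-following logic is illustrative, not a documented merge algorithm):

```python
def resolve_aliases(alias_entries):
    """Build a mapping from each previous_id to the most recent user_id.

    alias_entries: iterable of {"user_id": ..., "previous_id": ...} dicts,
    in the order the alias calls occurred. Follows chains, so if A -> B
    and later B -> C, both A and B resolve to C.
    """
    canonical = {}
    for entry in alias_entries:
        prev, new = entry["previous_id"], entry["user_id"]
        # Re-point any id that currently resolves to prev...
        for key, value in canonical.items():
            if value == prev:
                canonical[key] = new
        # ...then record the new mapping for prev itself.
        canonical[prev] = new
    return canonical
```

This is useful when joining older track or identify files against current user IDs after merges.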
