Providing User Data
Supporting Products
- User-level LTV projections
- Intelligent Automation
- Probabilistic Attribution
- Organic Lift
- Intelligent Budget
Introduction
We require data on Attribution, Revenue, and User Engagement in a standardized format that is described below. This data is aggregated and delivered daily, with each daily delivery landing at a unique location for that day's data. Instructions on delivering this data are likewise detailed below.
Note: Please exclude test or QA user data from the files in order to ensure accuracy and optimal results. You can also provide a list of users to blacklist as well.
Data History
AlgoLift provides the most accurate LTV model when we have 2 years+ of data. This enables us to observe temporal changes in revenue and user behavior and to fully understand how cohorts mature.
We can train our LTV model with less historical data but this restricts the projection windows based on the amount of data available to train our models. Below is a schedule for how AlgoLift releases new projection windows to clients based on the historical data available for modeling:
Days of Historical Data | Projection Window |
---|---|
45 | D30 |
70 | D60 |
95 | D90 / D180 |
105 | D365 |
Data Shape
Attribution Data
Attributions, which are closely tied to installs (which represent the beginning of a user's lifetime within the app or product), are the events when a given user can be attributed to a specific ad in a campaign if they were acquired through advertising, or otherwise can be labeled 'organic.' When preparing Attribution data for export to AlgoLift, please format it according to the below schema.
All columns present in the schema should be present in the corresponding file. Any columns for which data is unavailable should be left blank, but not omitted.
Field | Data Type | Required | Description |
---|---|---|---|
user_id* | varchar(255) | Yes | A common user identifier used across all data tables |
impression_dt | datetime | No | The timestamp of the ad impression event the user is attributed to (ISO-8601 compliant datetime string) |
install_dt | datetime | Yes | The timestamp of the user's install event (ISO-8601 compliant datetime string) |
country | varchar(2) | Yes | The country in which the user resides (ISO-3166 alpha-2 country code) |
platform | varchar(255) | Yes | The platform of the user's device |
source | varchar(255) | Yes | The ad network to which the user has been attributed. Should be Unattributed Facebook for Facebook VTA users who are not attributed to any specific campaign. |
subpub_id | varchar(255) | No** | A unique anonymous identifier for a sub-publisher provided by your MML, i.e. ad network’s ID for the publisher app |
subpub_bundle_id | varchar(255) | No** | The app store bundle id (com.app.pub, or 1234534534), i.e. google or apples’d for the app |
subpub_name | varchar(255) | No** | The proper name for a sub-publisher (i.e., Bubbles Game) |
campaign_id | varchar(255) | No** | The id of the campaign to which the user has been attributed |
campaign_name | varchar(255) | No** | The name of the campaign to which the user has been attributed |
adgroup_id | varchar(255) | No** | The id of the ad group to which the user has been attributed |
adgroup_name | varchar(255) | No** | The name of the ad group to which the user has been attributed |
ad_id | varchar(255) | No** | The id of the ad to which the user has been attributed |
ad_name | varchar(255) | No** | The name of the ad to which the user has been attributed |
ad_type | varchar(255) | No | The type of creative user has been attributed valid types: text - an ad unit containing only text, e.g. a search result banner - a basic format that appears at the top or bottom of the device screen interstitial - a full-page ad that appears during breaks in the current experience video - a standard video, i.e. non-rewarded rewarded_video - an ad unit offering in-app rewards in exchange for watching a video playable - an ad unit containing an interactive preview of the app experience sponsored_content - a link included in a piece of sponsored content, like an advertorial article audio - an audio ad |
keyword_id | varchar(255) | No** | The id of the keyword to which the user has been attributed |
keyword_name | varchar(255) | No** | The name of the keyword to which the user has been attributed |
device_brand | varchar(255) | No | The brand of the user's device |
device_model | varchar(255) | No | The model of the user's device |
os_version | varchar(255) | No | The version of the operating system of the user's device |
custom1 | varchar(255) | No | An arbitrary, client-defined, field. Will be carried through the pipeline as a dimension and may be configured as a filter in the AlgoLift app. |
custom2 | varchar(255) | No | An arbitrary, client-defined, field. Will be carried through the pipeline as a dimension and may be configured as a filter in the AlgoLift app. |
custom3 | varchar(255) | No | An arbitrary, client-defined, field. Will be carried through the pipeline as a dimension and may be configured as a filter in the AlgoLift app. |
conversion_value | int | No | The SKAdNetwork conversion value set on iOS in versions 14 and above. This should be the last conversion value that was set. |
** data required when using Intelligent Automation
*user_id is a non-PII (ie not IDFA/GAID) identifier that uniquely identifies a given user of the app or product. This can be an identifier generated by the client, or a third party identifier provided by a third party such as an MMP or Analytics provider. Any identifier that follows the following rules can be used:
- The identifier is available at the time of attribution.
- All subsequent revenue and engagement events for a given user are provided to AlgoLift using the same identifier as that initially provided in that user’s attribution event. Stated another way, all revenue and engagement events should contain a user_id that has previously been included in an attribution event.
- The identifier is the one AlgoLift will return in any output data.
Not all ad networks use the same nomenclature. The below table shows the names under which you can find certain fields for a given ad network.
AlgoLift Apple Search Ads Google Ads Campaign Campaign Campaign Campaign Ad Group Ad Set Ad Group Ad Group Ad Ad Creative Set Ad Keyword N/A Keyword N/A
Revenue Data
Revenue represents revenue generated from the user, either from in-app purchases, ongoing-subscriptions, or payments derived from ads presented to the user. When formatting Revenue data for export to AlgoLift, please format it according to the following schema:
Field | Data Type | Required | Description |
---|---|---|---|
user_id | varchar(255) | Yes | A common user identifier used across all data tables |
revenue_type | varchar(255) | Yes | The type of the revenue source valid types: [adrev, iap, subscription] |
revenue | double | Yes | Aggregate revenue amount in USD |
transaction_count | int | Yes | Number of transactions |
parcel_name | varchar(255) | Yes* | Unique identifier of the transaction. For iap this should be the item(s) sold, for adrev this should be the ad_type of the ad viewed (banner, rewarded_video, etc). |
* Only for revenue_types iap
or subscription
User Engagement Data
User Engagements represent user actions within the app that meaningfully distinguish a user's relationship with the app from other users. Many of these actions are common across many or all apps, such as starting a session or utilizing a social sharing feature. Others may be highly specific to your app alone. You are encouraged to include whatever actions you feel are most important; if it is unclear to you which actions may be most useful, please contact us and we can help consult on the question. When formatting User Engagement data for export to AlgoLift, please format it according to the following schema:
Field | Data Type | Required | Description |
---|---|---|---|
user_id | varchar(255) | Yes | A common user identifier used across all data tables |
engagement_type | varchar(255) | Yes | The type of the engagement event |
engagement_count | int | Yes | The number of occurrences of the engagement event |
Sample Data
Please click here for a copy of sample data (user attribution, revenue, and engagement) for your reference.
Data Delivery
The volume of the above described data can be significant. Having a reliable way to regularly deliver data at scale is imperative for the functioning of AlgoLift's prediction and optimization products. We rely on reading and writing to Amazon S3 in order to transfer data to and from customers. The instructions for how to do so are detailed below, along with technical formatting information.
Data Formatting
- All data should be formatted in one or more CSV files. These files can be named as you like with data type and date (for example, 'engagement_20201111' or something similar), but should have the
.csv
extension and be placed in the appropriate S3 bucket and prefix for the app, data type, and date they represent. - The first row of each CSV file should be a header row that contains the column names for the ensuing rows.
- All field entries should be double quoted according to the CSV spec.
- Text fields without data should be sent as an empty field, not as the string
NULL
- Required numeric fields should never be
NULL
. If there is no data available, pass0
as the value for the field. Other numeric fields can be passed asNULL
(empty fields, as before). - Data should be deduplicated prior to transfer.
- Aggregating the data by day prior to upload will reduce data size and therefore transfer times and resulting storage space.
Configuring S3 Access
When you begin your integration with AlgoLift, we will create an Amazon S3 bucket specifically for housing your data. The convention we use for bucket naming is algolift-<client>
. Sending us data is as simple as writing data to the appropriate location in the S3 bucket.
Since many companies already use AWS for at least some portion of their data infrastructure, the following process assumes you have an existing account and provides you with the best control and security, since it does not involve sharing any credentials:
- Create a Role in your AWS account that you will be using to write to the bucket. Attach a policy to grant that role permission to write to the new bucket. (see steps 2 & 3 in this AWS support document for more detail.)
- Send a email to
support@algolift.atlassian.net
with the AWS ARN for the role created in step 1. An ARN looks like the following:arn:aws:iam::account-id:role/role-name
. - We will grant your role permissions to the bucket and notify you that the access has been granted.
- You may begin writing data to your bucket.
In the case you do not have an AWS account, we will create a role for you and provide an access key and secret which your data processes may use to gain read/write access to your bucket. These alternative directions will apply.
- Send a message to
support@algolift.atlassian.net
letting us know you need us to provide access and you do not have an AWS account.- We will create the user and grant the appropriate bucket permissions, then share an access key and secret with you via a secure channel.
- You may begin writing data to your bucket. (see this AWS support document for a simple Python example).
Object Location
Within an organization's s3 bucket, data should be written to the following locations for each app:
- Attribution:
s3://algolift-<client>/<app>/ingestion/daily_data/attribution/date=<date>/
- Revenue:
s3://algolift-<client>/<app>/ingestion/daily_data/revenue/date=<date>/
- Engagement:
s3://algolift-<client>/<app>/ingestion/daily_data/user_engagement/date=<date>/
Data should be delivered daily as soon as possible after 00:00:00 UTC. All dates should be in the format of
YYYY-MM-DD
. For more information about folders in S3, see this AWS support document.
Object Nesting
Please note that app
is sufficient for the app parameter in the paths above.
Object Permissions
Important note: When writing, the make sure to set the bucket-owner-full-control
permission on each object that is written (usually this is just a setting in the write command for data processing code used to deploy data). Here's an example of correct object permissions with the AWS CLI: aws s3 cp s3://source_awsexamplebucket/myobject s3://algolift-clientname/path --acl bucket-owner-full-control
Restating Data
There are cases in which data for a past attribution date may need to be restated. Delayed information from an ad network may update your understanding of how a given user is attributed to an ad campaign, the conversion value for a given user may be updated based on actions they take in the first few days of activity, or another reason may apply. AlgoLIFT's ingestion process allows for an "ingestion window" where the last X days of data will be reprocessed each day, so any restated data for past days within the window will be taken into account. To enable this, simply overwrite the outdated data in S3 with the updated data for a given attribution date. The exact width of the ingestion window is variable and can be set by emailing integrations@algolift.com with your desired window size.