Data Delivery
We use Amazon S3 for data transfers to and from clients. We will grant access to the same AWS role that you use to supply AlgoLift with input data.
Raw Data
A raw output of the user-level projections will be available daily for your processes to ingest and use in any other systems. The data includes one row per User ID with some basic dimensions and a revenue projection for each projection window (d30, d60, etc.). For users younger than a given window, the value is the projected revenue; for users who have already aged past the window, it is the actual revenue. See below for a comprehensive description of the output schema.
We will create the S3 bucket to hold this data and provision access for your systems (please make sure you've provided your AWS IAM profile to us to ensure access). The output is partitioned by projection date (the same as the output date) and may be split into multiple files based on size. We encode output as gzip-compressed CSV for maximum compatibility with downstream systems.
The key structure of our output looks like the following:
s3://algolift-<client>/<app>/daily_output/user_forecast_output/2001-01-01/user_forecast_output_0000.csv.gz
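As a minimal sketch of how a consumer might derive the location of one day's partition from the key structure above: the function name `output_prefix` and the values `acme` and `mygame` are hypothetical placeholders, and the sketch assumes the date segment is always formatted as YYYY-MM-DD, as in the example key.

```python
from datetime import date


def output_prefix(client: str, app: str, projection_date: date) -> tuple[str, str]:
    """Return (bucket, key prefix) for one day's output partition."""
    bucket = f"algolift-{client}"
    # date.isoformat() yields YYYY-MM-DD, matching the 2001-01-01 segment
    # shown in the example key (assumed to hold for every partition).
    prefix = f"{app}/daily_output/user_forecast_output/{projection_date.isoformat()}/"
    return bucket, prefix


# "acme" and "mygame" are made-up client and app names.
bucket, prefix = output_prefix("acme", "mygame", date(2001, 1, 1))
print(f"s3://{bucket}/{prefix}")
```

Listing every object under the returned prefix, rather than requesting a single fixed file name, avoids depending on how many files a given day produces.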
We take advantage of parallel computing to run analysis on large datasets. In some cases, especially with larger datasets that exceed the compute capacity of a single processing node, this leads to multiple output files (one from each node used to carry out the work). In such a case, each file is one portion of the output, and all files for a given date taken together form the complete output.
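The following sketch shows one way to treat several part files as a single dataset. It works on the raw bytes of each downloaded file, so it is independent of how the objects are fetched from S3. The helper name `read_parts` is hypothetical, and two things are assumptions rather than documented behavior: that the files are UTF-8 encoded and that every part begins with its own header row.

```python
import csv
import gzip
import io
from typing import Iterable, Iterator


def read_parts(parts: Iterable[bytes]) -> Iterator[dict]:
    """Yield rows from every gzip-compressed CSV part as one logical dataset."""
    for blob in parts:
        # Decompress in memory and decode as text (UTF-8 is an assumption).
        text = gzip.decompress(blob).decode("utf-8")
        # Assumption: each part carries its own header row, so DictReader
        # can map column names to values independently for every file.
        yield from csv.DictReader(io.StringIO(text))
```

A caller would download every object under the day's prefix, pass the contents to `read_parts`, and only treat the day as loaded once all parts have been read.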
LTV Output Schema
Field | Description |
---|---|
user_id | Unique user identifier |
platform | The platform of the user's device |
country_name | The country in which the user resides (full name) |
country_code | The country in which the user resides (ISO-3166 alpha-2 country code) |
source | The ad network to which the user has been attributed |
install_date | Date of the user's install event (ISO-8601 datetime string) |
revenue_at_projection | Revenue attributed to the user up to the time of projection |
pltv_30 | Projected LTV (gross revenue) for horizon of 30 days. This value will represent the actual D30 LTV once the user is 30+ days past install date |
pltv_60 | Projected LTV (gross revenue) for horizon of 60 days. This value will represent the actual D60 LTV once the user is 60+ days past install date |
pltv_90 | Projected LTV (gross revenue) for horizon of 90 days. This value will represent the actual D90 LTV once the user is 90+ days past install date |
pltv_180 | Projected LTV (gross revenue) for horizon of 180 days. This value will represent the actual D180 LTV once the user is 180+ days past install date |
pltv_365 | Projected LTV (gross revenue) for horizon of 365 days. This value will represent the actual D365 LTV once the user is 365+ days past install date |
revenue_d1 | Revenue attributed to the user within the first day after install |
revenue_d7 | Revenue attributed to the user within the first 7 days after install. Revenue-to-date if user installed <7 days ago. |
has_purchased | 1 if user has at least one attributed purchase, otherwise 0 |
age | Days since install. Will be 0 when projection_date = install_date |
conversion_probability | Probability of conversion by D30. Will be NULL for users past D30 |
churn_probability | Probability of churn. Definition is custom to client |
projection_date | The most recent date of available data when this projection was made. This will match the install date of the most recent cohort. |
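
When consuming the schema above, two details are easy to get wrong: the `pltv_*` columns switch from projected to actual revenue once `age` reaches the horizon, and `conversion_probability` can be NULL. The sketch below illustrates both. The function names are hypothetical, and how NULL is encoded in the CSV is an assumption (an empty cell or the literal string `NULL`); check a real file before relying on it.

```python
from typing import Optional

# Horizons taken from the pltv_* columns in the schema.
HORIZONS = (30, 60, 90, 180, 365)


def parse_optional_float(value: str) -> Optional[float]:
    """Convert a CSV cell to float, mapping NULL markers to None."""
    # Assumption: NULL appears as an empty cell or the literal string "NULL".
    if value.strip() in ("", "NULL"):
        return None
    return float(value)


def realized_horizons(age: int) -> list[int]:
    """Horizons whose pltv_* value is actual revenue rather than a projection."""
    # Per the schema, pltv_N is the actual DN LTV once the user is N+ days old.
    return [h for h in HORIZONS if age >= h]


# Illustrative row with made-up values.
row = {"age": "75", "pltv_30": "4.20", "pltv_90": "6.10", "conversion_probability": ""}
print(realized_horizons(int(row["age"])))                   # [30, 60]
print(parse_optional_float(row["conversion_probability"]))  # None
```

For the 75-day-old user in the example, `pltv_30` and `pltv_60` hold actual revenue, while `pltv_90` and longer horizons are still projections.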