TTF: A Trapezoidal Temporal Fusion Framework for LTV Forecasting in Douyin

Yibing Wan, Zhengxiong Guan, Chaoli Zhang, Xiaoyang Li, Lai Xu, Beibei Jia, Zhenzhe Zheng, Fan Wu

TL;DR

The paper tackles early-stage, channel-level LTV forecasting for paid user acquisition, where the difficulties are unaligned multi-time series, the short-input long-output (SILO) imbalance, and volatile, non-stationary data. It introduces Trapezoidal Temporal Fusion (TTF), which combines a trapezoidal multi-time series module with MT-FusionNet and a utilitarian loss to fuse heterogeneous, irregular data and predict long-horizon LTV curves. Experiments on Douyin data show consistent improvements over baselines: MT-FusionNet delivers lower MAPE on both the point-wise curve and cumulative LTV, and the trapezoidal input lets the model make effective use of longer histories. The framework is deployed in production, where it reduced MAPE relative to the previously deployed model by 4.3% for point-wise LTV and 3.2% for aggregated LTV.

Abstract

In user growth scenarios, Internet companies invest heavily in paid acquisition channels to acquire new users. Sustainable growth, however, depends on the lifetime value (LTV) generated by acquired users exceeding the customer acquisition cost (CAC). To maximize the LTV/CAC ratio, it is crucial to predict channel-level LTV at an early stage so that budget allocation can be optimized further. LTV forecasting differs significantly from traditional time series forecasting and poses three main challenges. First, it is an unaligned multi-time series forecasting problem: each channel has multiple LTV series with different activation dates. Second, predicting at an early stage raises the imbalanced short-input long-output (SILO) challenge. Third, compared with commonly used time series datasets, real LTV series are volatile and non-stationary, with more frequent fluctuations and higher variance. In this work, we propose a novel framework called Trapezoidal Temporal Fusion (TTF) to address these challenges. We introduce a trapezoidal multi-time series module to handle data unalignment and the SILO challenge, and we produce accurate predictions with a multi-tower structure called MT-FusionNet. The framework has been deployed to the online system for Douyin. Compared with the previously deployed online model, MAPEp decreased by 4.3% and MAPEa decreased by 3.2%, where MAPEp denotes the point-wise MAPE of the LTV curve and MAPEa denotes the MAPE of the aggregated LTV.
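
The two reported metrics can be illustrated with a small sketch. The abstract only names them, so the exact formulas here are assumptions: MAPEp is taken as the mean of per-retention-day absolute percentage errors, and MAPEa as the absolute percentage error of the summed curve for a single series. The function names and the toy curves are hypothetical.

```python
from typing import Sequence


def mape_pointwise(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Assumed MAPEp: mean absolute percentage error over retention days.

    Raises ZeroDivisionError if any true value is zero.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("curves must be non-empty and of equal length")
    errors = [abs(p - t) / abs(t) for t, p in zip(y_true, y_pred)]
    return sum(errors) / len(errors)


def mape_aggregated(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Assumed MAPEa: absolute percentage error of the summed (aggregated) LTV."""
    total_true = sum(y_true)
    return abs(sum(y_pred) - total_true) / abs(total_true)


# Toy LTV curves (made-up numbers, one value per retention day).
true_curve = [4.0, 2.0, 2.0, 2.0]
pred_curve = [5.0, 1.5, 2.0, 2.5]

print("MAPEp:", mape_pointwise(true_curve, pred_curve))
print("MAPEa:", mape_aggregated(true_curve, pred_curve))
```

In this toy case the point-wise error is larger than the aggregated error because over- and under-predictions on individual days partly cancel in the sum, which is one reason to track both metrics.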

Paper Structure

This paper contains 22 sections, 14 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Relationship between the LTV Curve and LTV_N. The LTV Curve describes the value generated by users acquired through a specific channel on a specific activation date, calculated on each retention day. LTV_N is the cumulative sum of the LTV Curve over N retention days.
  • Figure 2: Trapezoidal Multi-Time Series Module. This figure shows a trapezoidal input window for a given channel. The window contains $k$ series with increasing activation dates and decreasing info lengths. The number of days between two adjacent activation dates is denoted as the stride $s$.
  • Figure 3: Overview of MT-FusionNet. The input is first normalized by a robust scale. The normalized input is passed through a moving average module of different scales (tower 0 has a moving average scale = 1) and then fed into independent backbones. All backbones can take the same covariates input. After adding position encoding to the output of each backbone, the outputs of all towers are concatenated and passed to a feed-forward network. The final result is obtained after the inverse robust scale.
  • Figure 4: Prediction results with different backbones, with and without MT-FusionNet. Panels (a)–(c) correspond to TSMixer, TiDE, and DLinear, respectively. Blue: with MT-FusionNet; orange: original version; green: ground truth. The x-axis denotes retention days; the y-axis denotes LTV. Across the full retention-day horizon, the MT-FusionNet variants align more closely with the ground truth than the original versions, which exhibit a systematic positive bias and a larger phase lag around local extrema. The MT-FusionNet design reduces bias, tracks short-term fluctuations better, and preserves the long-term trend. TSMixer with MT-FusionNet outperforms the other deep learning backbones.
  • Figure 5: Overview of the system deployed in a production environment. The system is organized into three subsystems: (i) offline data processing and model training, responsible for data processing, model training, and model evaluation; (ii) online serving, which handles prediction requests by loading versioned datasets and model artifacts, running inference, and performing post‑processing; (iii) a third‑party system, which integrates external repositories that manage datasets, model artifacts, and prediction results.
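
The trapezoidal window of Figure 2 can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes all series in the window are observed up to the same cutoff day, so a series activated $s$ days earlier has $s$ more observed retention days. The function name `trapezoid_window`, the integer day indices, and the parameter `min_len` (observed length of the newest series) are hypothetical.

```python
from typing import List, Tuple


def trapezoid_window(
    latest_activation: int, min_len: int, k: int, s: int
) -> List[Tuple[int, int]]:
    """Return (activation_day, observed_length) for k series, oldest first.

    Assumption: every series is observed up to the same cutoff day, so each
    step of s days back in activation date adds s observed retention days.
    """
    if k < 1 or s < 1 or min_len < 1:
        raise ValueError("k, s and min_len must all be positive")
    window = []
    for i in range(k):
        steps_back = k - 1 - i  # how many strides before the newest series
        activation_day = latest_activation - steps_back * s
        observed_length = min_len + steps_back * s
        window.append((activation_day, observed_length))
    return window


# k = 4 series, stride s = 3 days, newest series has 7 observed days.
for activation_day, length in trapezoid_window(100, min_len=7, k=4, s=3):
    # Each row is one series; the shrinking bars form the trapezoid.
    print(f"activation day {activation_day}: " + "#" * length)
```

Rows are ordered by increasing activation date, and their lengths shrink by the stride each time, which gives the window its trapezoidal shape.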

Theorems & Definitions (4)

  • Definition 1: Activation Date
  • Definition 2: Retention Day
  • Definition 3: LTV Curve
  • Definition 4: LTV_N
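
The relationship between the LTV Curve and LTV_N described in Figure 1 can be shown with a short sketch. It assumes the curve is stored as a list with one value per retention day, starting from the first retention day; the indexing convention, the function name `ltv_n`, and the numbers are illustrative assumptions rather than the paper's definitions.

```python
from itertools import accumulate
from typing import List, Sequence


def ltv_n(curve: Sequence[float], n: int) -> float:
    """Cumulative sum of the LTV curve over the first n retention days."""
    if not 1 <= n <= len(curve):
        raise ValueError("n must be between 1 and the curve length")
    return sum(curve[:n])


def ltv_all(curve: Sequence[float]) -> List[float]:
    """LTV_1, LTV_2, ..., LTV_len(curve) as a running total."""
    return list(accumulate(curve))


# Toy LTV curve for one channel and one activation date (made-up values).
curve = [3.0, 1.5, 1.0, 0.5]

print("LTV_2:", ltv_n(curve, 2))
print("running totals:", ltv_all(curve))
```

Because LTV_N is a running total of the curve, an error on any single retention day carries into every later LTV_N, which is why the curve and its aggregate are evaluated separately.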