Training and Evaluating Causal Forecasting Models for Time-Series

Thomas Crasson, Yacine Nabet, Mathias Lécuyer

TL;DR

The orthogonal statistical learning framework is extended to train causal time-series models that generalize better when forecasting the effect of actions outside of their training distribution.

Abstract

Deep learning time-series models are often used to make forecasts that inform downstream decisions. Since these decisions can differ from those in the training set, there is an implicit requirement that time-series models will generalize outside of their training distribution. Despite this core requirement, time-series models are typically trained and evaluated on in-distribution predictive tasks. We extend the orthogonal statistical learning framework to train causal time-series models that generalize better when forecasting the effect of actions outside of their training distribution. To evaluate these models, we leverage Regression Discontinuity Designs popular in economics to construct a test set of causal treatment effects.

Paper Structure

This paper contains 27 sections, 5 theorems, 28 equations, 7 figures, 4 tables, 2 algorithms.

Key Result

Proposition 1

Under the unconfoundedness assumption, we can identify the Conditional Average Treatment Effect (CATE).
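
For reference, the textbook identification result for a binary treatment takes the form below. This is a sketch of the standard statement, not necessarily the paper's exact proposition: the symbols $Y$ (outcome), $T$ (treatment), $X$ (covariates) and the potential outcomes $Y(t)$ are notation assumed here, and the result also relies on the usual overlap and consistency conditions.

```latex
\tau(x) \;=\; \mathbb{E}\left[\, Y(1) - Y(0) \mid X = x \,\right]
        \;=\; \mathbb{E}\left[\, Y \mid T = 1,\, X = x \,\right]
        \;-\; \mathbb{E}\left[\, Y \mid T = 0,\, X = x \,\right]
```

The second equality is where unconfoundedness is used: conditional on $X$, treatment assignment carries no information about the potential outcomes, so observed conditional means can stand in for counterfactual ones.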

Figures (7)

  • Figure 1: Forecasts from a LightGBM and a TFT model. (a),(b): Daily demand forecast change under price increase for one train. The blue line is the ground truth; the orange line is the model's forecast for observed prices; the green curve is the forecast under a 30% price increase. Both models predict an increase in demand under increased prices. (c): CATE distribution over the test set, for a change from the observed price to the next possible price, normalized by the price change.
  • Figure 2: CATE estimates (distance between the two triangles) at three $t^n_i$ values on a train time series $n$, without weekday correction (a), and with correction (b). In this example, the RDD framework effectively captures the decline in demand under price increases. The fit is better after correction, so CATE estimates are likely more accurate. Prices and demand are scaled to $[0, 1]$.
  • Figure 3: (a) Daily demand forecast change under price increase for one train, causal TFT. Analogous to Fig. 1(a)-(b). (b): CATE distribution over the test set, for a change from the observed price to the next possible price, normalized by the price change. Analogous to Fig. 1(c).
  • Figure 4: $\theta_0$ values for various time steps for a railway time series
  • Figure 5: Example of $\theta$ vector values for the one-hot encoding on the passenger rail dataset.
  • ...and 2 more figures
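
The Figure 2 caption describes reading a CATE estimate off the gap between two fitted values at a price-change time. A minimal sketch of that idea is a sharp regression discontinuity estimate with one local linear fit on each side of the cutoff. Everything named here (`rdd_effect`, the bandwidth, the synthetic demand series) is a hypothetical illustration; the paper's weekday correction and its exact estimator are not reproduced.

```python
import numpy as np

def rdd_effect(t, y, cutoff, bandwidth):
    """Sharp RDD sketch: estimated jump in y at `cutoff` from two local linear fits."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    left = (t >= cutoff - bandwidth) & (t < cutoff)
    right = (t >= cutoff) & (t <= cutoff + bandwidth)
    # Fit y ~ intercept + slope * (t - cutoff) separately on each side.
    # np.polyfit returns coefficients from highest degree to lowest.
    _, intercept_left = np.polyfit(t[left] - cutoff, y[left], deg=1)
    _, intercept_right = np.polyfit(t[right] - cutoff, y[right], deg=1)
    # Each intercept is the fitted value at the cutoff; their gap is the effect estimate.
    return intercept_right - intercept_left

# Synthetic (assumed) demand series: mild upward trend, with a drop when the price rises at t = 20.
rng = np.random.default_rng(0)
t = np.arange(0, 40, dtype=float)
true_jump = -0.2
y = 0.5 + 0.005 * t + true_jump * (t >= 20) + rng.normal(0.0, 0.01, size=t.size)

estimate = rdd_effect(t, y, cutoff=20.0, bandwidth=10.0)
print(f"estimated jump at cutoff: {estimate:.3f}")
```

The bandwidth trades bias for variance: a narrow window keeps the linear approximation plausible but leaves few points for each fit.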

Theorems & Definitions (8)

  • Proposition 1: CATE Identifiability
  • Proposition 2: Orthogonal Learning for Binary Treatment Effect
  • Proposition 3: Orthogonal Learning for Categorical Treatment Effects
  • proof
  • Proposition 4: CATE Identifiability from RDD
  • Proposition 5: RDD for point CATE
  • proof
  • Definition 1: Directional Derivative
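
Propositions 2 and 3 concern orthogonal learning of treatment effects. As a rough illustration of the general idea only, the sketch below uses a generic residual-on-residual estimator with cross-fitting for a constant effect in a partially linear model with a continuous treatment. This is an assumed simplification: the paper's propositions cover binary and categorical treatments, and its models are deep time-series networks rather than the linear nuisance fits used here. All names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef

def orthogonal_effect(X, T, Y, n_folds=2, seed=0):
    """Residual-on-residual estimate of a constant treatment effect, with cross-fitting."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(Y)), n_folds)
    T_res = np.empty(len(Y))
    Y_res = np.empty(len(Y))
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Nuisance models E[T | X] and E[Y | X] are fit on the other folds only.
        T_res[test_idx] = T[test_idx] - fit_linear(X[train_idx], T[train_idx])(X[test_idx])
        Y_res[test_idx] = Y[test_idx] - fit_linear(X[train_idx], Y[train_idx])(X[test_idx])
    # Regress outcome residuals on treatment residuals (no intercept).
    return float(T_res @ Y_res / (T_res @ T_res))

# Synthetic (assumed) data: X confounds price T and demand Y; the true effect of T is -0.5.
rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 1))
T = 0.8 * X[:, 0] + rng.normal(0.0, 0.5, size=n)
Y = -0.5 * T + 1.0 * X[:, 0] + rng.normal(0.0, 0.1, size=n)

naive = float(np.polyfit(T, Y, deg=1)[0])   # slope of Y on T, ignoring X
theta_hat = orthogonal_effect(X, T, Y)
print(f"naive slope: {naive:.3f}, orthogonal estimate: {theta_hat:.3f}")
```

In this synthetic setup the naive slope comes out positive even though the true effect is negative, which mirrors the behaviour described for the non-causal models in Figure 1; removing the part of treatment and outcome explained by the confounder recovers the correct sign.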