In-Context Fine-Tuning for Time-Series Foundation Models

Abhimanyu Das, Matthew Faw, Rajat Sen, Yichen Zhou

TL;DR

This work designs a pretrained time-series foundation model that can be prompted with multiple time-series examples in order to forecast a target time-series into the future. It shows that using in-context examples at inference time yields much better performance on popular forecasting benchmarks than supervised deep learning methods, statistical models, and other time-series foundation models.

Abstract

Motivated by the recent success of time-series foundation models for zero-shot forecasting, we present a methodology for $\textit{in-context fine-tuning}$ of a time-series foundation model. In particular, we design a pretrained foundation model that can be prompted (at inference time) with multiple time-series examples, in order to forecast a target time-series into the future. Our foundation model is specifically trained to utilize examples from multiple related time-series in its context window (in addition to the history of the target time-series) to help it adapt to the specific distribution of the target domain at inference time. We show that such a foundation model that uses in-context examples at inference time can obtain much better performance on popular forecasting benchmarks compared to supervised deep learning methods, statistical models, as well as other time-series foundation models. Interestingly, our in-context fine-tuning approach even rivals the performance of a foundation model that is explicitly fine-tuned on the target domain.

Paper Structure

This paper contains 27 sections, 9 equations, 8 figures, 10 tables.

Figures (8)

  • Figure 1: Analogous to few-shot prompting of a foundation LLM (left), we train a time-series foundation model to support few-shot prompting with an arbitrary number of related in-context time-series examples (right). The dashed box encloses the full context window/prompt.
  • Figure 2: An example prediction task $(\{\mathbf{y}_{1 : T_{1}}^{(1)},\mathbf{y}_{1 : T_{2}}^{(2)}, \mathbf{y}_{1 : T_{3}}^{(3)}, \mathbf{y}_{1 : L}\}, \mathbf{y}_{L+1 : L+H})$. The three black dashed lines (separators) separate the three in-context examples $\{\mathbf{y}_{1 : T_{i}}^{(i)}\}_{i\in[3]}$ and the history $\mathbf{y}_{1 : L}$. The goal is to predict the horizon $\mathbf{y}_{L+1 : L+H}$ of the history $\mathbf{y}_{1 : L}$.
  • Figure 3: A prediction task with two forms of concatenation: in one panel (linear trends), we concatenate with separators, and in the other panel (triangle wave), we concatenate without separators. Concatenating in-context examples together without separators can confuse the model: multiple linear trends look like a triangular wave if concatenated naïvely.
  • Figure 4: Our decoder-only architecture for time-series prediction with in-context examples.
  • Figure 5: In (a), we report the geometric mean of scaled MAE for Monash datasets. We include all official Monash baselines as well as TimesFM-ICF and TimesFM (base). TimesFM (base) yields a 7% improvement over the next best baseline. We also report one standard error, similar to Das et al. (2023). In (b), we report the average MAE for four datasets: ETTh1, ETTh2, ETTm1 and ETTm2. Similar to prior work such as Nie et al. (2022), the numbers are reported for rolling validation over the test split, which makes up the last fifth of the time points in each dataset. We also report one standard error. TimesFM-ICF yields a marked improvement of at least 25% over other baselines.
  • ...and 3 more figures
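
The role of separators described in the captions of Figures 2 and 3 can be illustrated with a small sketch. It is not the paper's implementation: the NaN sentinel `SEP` and the helper names `build_context` and `split_context` are hypothetical, chosen only to show how a context window might be assembled from in-context examples plus the target history, and why dropping the separators loses the boundaries between series.

```python
import numpy as np

# Hypothetical separator sentinel. This is an assumption for illustration,
# not the encoding used by the model in the paper.
SEP = np.nan


def build_context(examples, history, use_separators=True):
    """Concatenate in-context examples and the target history into one array.

    With use_separators=True, one SEP value follows every in-context example,
    so three examples plus a history yield three separators (as in Figure 2).
    """
    parts = []
    for ex in examples:
        parts.append(np.asarray(ex, dtype=float))
        if use_separators:
            parts.append(np.array([SEP]))
    parts.append(np.asarray(history, dtype=float))
    return np.concatenate(parts)


def split_context(context):
    """Recover the individual series by cutting at the separator positions."""
    sep_idx = np.flatnonzero(np.isnan(context))
    values = np.delete(context, sep_idx)
    # Each removed separator shifts the later cut points left by one.
    cut_points = sep_idx - np.arange(len(sep_idx))
    return np.split(values, cut_points)


# Three linear trends (rising, falling, rising) and a short target history.
rising = np.arange(5)            # 0, 1, 2, 3, 4
falling = rising[::-1]           # 4, 3, 2, 1, 0
examples = [rising, falling, rising]
history = [4, 3, 2]

with_sep = build_context(examples, history, use_separators=True)
naive = build_context(examples, history, use_separators=False)

# With separators, the four series can be told apart again.
print([len(s) for s in split_context(with_sep)])

# Without separators, the boundaries are gone: the values rise, fall, and
# rise again, which reads as a single triangle-like wave.
print(naive)
print(len(split_context(naive)))
```

In this toy setup the naive concatenation is indistinguishable from one series that oscillates, which mirrors the confusion the Figure 3 caption describes. A real model would need its own representation of the separator inside the context window; the sentinel here only stands in for that idea.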