
Adapting Time Series Foundation Models through Data Mixtures

Thomas L. Lee, Edoardo M. Ponti, Amos Storkey

TL;DR

MixFT re-divides the data using Bayesian mixtures into sets that best represent the sub-domains present in the data, and fine-tunes a separate module on each of these sets.

Abstract

Time series foundation models (TSFMs) have become increasingly popular for zero-shot forecasting. However, for a new time series domain not fully covered by the pretraining set, performance can suffer. Therefore, when a practitioner cares about a new domain and has access to a set of related datasets, the question arises: how best to fine-tune a TSFM to improve zero-shot forecasting? A typical approach to this type of problem is to fine-tune a LoRA module on all datasets or separately on each dataset. Tuning a separate module on each dataset allows the TSFM to specialise to different types of data distribution, by selecting differing combinations of per-dataset modules for different time series contexts. However, we find that using per-dataset modules might not be optimal, since a single time series dataset can contain data from several types of distribution, i.e. sub-domains. This can arise because the distribution shifts over time or because different dimensions of the time series follow different distributions. Hence, we propose MixFT, which re-divides the data using Bayesian mixtures into sets that best represent the sub-domains present in the data, and fine-tunes separately on each of these sets. This re-division of the data aims to make each set more homogeneous, leading to fine-tuned modules focused on specific sub-domains. Our experiments show that MixFT outperforms both per-dataset methods and fine-tuning a single module on all the data. This suggests that by re-partitioning the data to represent sub-domains we can better specialise TSFMs to improve zero-shot forecasting.

Paper Structure (27 sections, 8 equations, 13 figures, 7 tables, 2 algorithms)

Figures (13)

  • Figure 1: MixFT identifies the sub-domains present in the fine-tuning datasets and trains a separate LoRA module on each. This is unlike previous approaches, which either fine-tune separate LoRA modules along dataset boundaries (per-dataset methods) or train a single LoRA module on all the data (Shared) [ostapenko2024towards]. By training separate LoRA modules on sub-domains, MixFT aims for the training data of each LoRA module to be more homogeneous, leading to more specialised and consistent modules. This should also make it easier to identify which LoRA modules to use when zero-shot forecasting.
  • Figure 2: Overview of MixFT. The left side of the figure shows how MixFT fine-tunes LoRA modules for zero-shot forecasting. First, it identifies the sub-domains in the fine-tuning datasets. Then it re-divides the data by sub-domain. Last, it trains a separate LoRA module on the data of each sub-domain. This ensures that the trained LoRA modules specialise in forecasting sub-domains rather than datasets, unlike previous work [ostapenko2024towards]. The right side of the figure shows how MixFT constructs a forecast: it identifies which sub-domain a context belongs to and then uses that sub-domain's LoRA module for forecasting. This ensures the TSFM exploits the knowledge of that sub-domain provided by the fine-tuning data.
  • Figure 3: Mixture membership of the fine-tuning datasets for MixFT. The plot shows, for the first channel of each fine-tuning dataset, where the time series is assigned to the first mixture component (purple area) or the second component (yellow area). Some channels consist of just one mixture component (CloudD3 and M4-Weekly), some of mostly one component (BitBrains and M4-Hourly), and some exhibit periodic patterns (CloudD4 and BizITObs-Service). The plots suggest that MixFT's Bayesian mixture model finds reasonable patterns in the data. Also, because both sub-domains (mixture components) can occur within a single dataset, MixFT's data divisions cannot be found by per-dataset methods.
  • Figure 7: Mixture membership of the evaluation datasets for MixFT, when using Chronos Bolt. The plot shows, for the first channel of each evaluation dataset, where the time series is assigned to the first mixture component (purple area) or the second component (yellow area). As expected, the patterns of mixture membership are less consistent than in the fine-tuning data (see Figure 3). This demonstrates the difficulty of identifying sub-domains from zero-shot data. However, there are still patterns of usage; for example, BizITObs-Application shows a periodic pattern. This suggests that, while zero-shot mixture membership is hard to identify, MixFT still does a reasonable job, which is one of the reasons it performs well in our experiments.
  • ...and 8 more figures
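
The pipeline described in the Figure 2 caption (identify sub-domains, re-divide the data, then route a context to a sub-domain at forecast time) can be illustrated with a rough sketch. Everything here is an assumption made for illustration and not the paper's implementation: the window features (mean and log standard deviation), the plain EM Gaussian mixture standing in for the Bayesian mixture model, the deterministic initialisation, and the function names `window_features`, `fit_mixture`, `redivide` and `route`. The LoRA fine-tuning step itself is omitted.

```python
import numpy as np

def window_features(series, window):
    """Cut a 1-D series into non-overlapping windows; summarise each by (mean, log std)."""
    n = len(series) // window
    w = np.asarray(series[: n * window], dtype=float).reshape(n, window)
    return np.column_stack([w.mean(axis=1), np.log(w.std(axis=1) + 1e-6)])

def responsibilities(x, pi, mu, var):
    """Posterior probability of each mixture component for each row of x."""
    log_p = -0.5 * ((x[:, None, :] - mu) ** 2 / var + np.log(2 * np.pi * var)).sum(axis=2)
    log_p = log_p + np.log(pi)
    log_p -= log_p.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)

def fit_mixture(x, k=2, iters=50):
    """Plain EM for a diagonal Gaussian mixture (assumed stand-in for a Bayesian mixture)."""
    order = np.argsort(x[:, 0])
    # Deterministic init: spread the initial means over the range of the first feature.
    mu = x[order[np.linspace(0, len(x) - 1, k).astype(int)]].copy()
    var = np.tile(x.var(axis=0) + 1e-6, (k, 1))
    pi = np.full(k, 1.0 / k)
    for _ in range(iters):
        resp = responsibilities(x, pi, mu, var)
        nk = resp.sum(axis=0) + 1e-9
        pi = nk / nk.sum()
        mu = (resp.T @ x) / nk[:, None]
        var = np.maximum((resp.T @ x**2) / nk[:, None] - mu**2, 1e-6)
    return pi, mu, var

def redivide(datasets, window, k=2):
    """Pool windows from all datasets and regroup them by inferred sub-domain."""
    feats, owners = [], []
    for name, series in datasets.items():
        f = window_features(series, window)
        feats.append(f)
        owners += [(name, j) for j in range(len(f))]
    x = np.vstack(feats)
    params = fit_mixture(x, k)
    labels = responsibilities(x, *params).argmax(axis=1)
    subsets = {c: [owners[i] for i in np.flatnonzero(labels == c)] for c in range(k)}
    return params, labels, subsets  # one LoRA module would be trained per subset

def route(context, params, window):
    """Pick the sub-domain (and hence the module) for a new forecasting context."""
    f = window_features(np.asarray(context)[-window:], window)
    return int(responsibilities(f, *params).argmax(axis=1)[0])

# Toy data: each dataset mixes a "calm" and a "busy" regime, so dataset
# boundaries do not coincide with sub-domain boundaries.
rng = np.random.default_rng(0)
calm = lambda n: rng.normal(0.0, 0.1, n)
busy = lambda n: rng.normal(5.0, 2.0, n)
datasets = {
    "A": np.concatenate([calm(240), busy(240)]),
    "B": np.concatenate([busy(240), calm(240)]),
}
params, labels, subsets = redivide(datasets, window=24)
calm_component = route(calm(48), params, window=24)
busy_component = route(busy(48), params, window=24)
```

In this toy setting each re-divided subset draws windows from both datasets, which is the situation the Figure 3 caption describes: a split that follows sub-domains rather than dataset boundaries.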