Generalizable autoregressive modeling of time series through functional narratives

Ran Liu, Wenrui Ma, Ellen Zippi, Hadi Pouransari, Jingyun Xiao, Chris Sandino, Behrooz Mahasseni, Juri Minxha, Erdrin Azemi, Eva L. Dyer, Ali Moin

TL;DR

This work builds an alternative sequence for time series by constructing degradation operators of different intensity in the functional space, creating augmented variants of the original sample that are abstracted or simplified to different degrees. Modeling sequences of temporal functions rather than sequences of time periods leads to a 26% performance improvement in synthetic feature regression experiments.

Abstract

Time series data are inherently functions of time, yet current transformers often learn time series by modeling them as mere concatenations of time periods, overlooking their functional properties. In this work, we propose a novel objective for transformers that learn time series by re-interpreting them as temporal functions. We build an alternative sequence of time series by constructing degradation operators of different intensity in the functional space, creating augmented variants of the original sample that are abstracted or simplified to different degrees. Based on the newly generated sequence, we train an autoregressive transformer that progressively recovers the original sample from the most simplified variant. Analogous to the next-word prediction task in language, which learns narratives by connecting different words, our autoregressive transformer aims to learn the Narratives of Time Series (NoTS) by connecting different functions in time. Theoretically, we justify the construction of the alternative sequence through its advantages in approximating functions. When learning time series data with transformers, constructing sequences of temporal functions allows for a broader class of approximable functions (e.g., differentiation) compared to sequences of time periods, leading to a 26% performance improvement in synthetic feature regression experiments. Experimentally, we validate NoTS on 3 different tasks across 22 real-world datasets, where we show that NoTS significantly outperforms other pre-training methods by up to 6%. Additionally, applying NoTS on top of existing transformer architectures consistently boosts performance. Our results demonstrate the potential of NoTS as a general-purpose dynamic learner, offering a viable alternative for developing foundation models for time series analysis.
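To make the idea of degradation operators of different intensity concrete, the following is a minimal sketch in Python. It assumes a centered moving average as the degradation operator, which is only one possible choice and is not confirmed by the abstract; the function names `moving_average` and `functional_narrative` and the window widths are hypothetical.

```python
import numpy as np


def moving_average(x, width):
    """Smooth a 1-D signal with a centered moving average of odd `width`.

    Assumption: a moving average stands in for the degradation operator;
    the abstract does not specify which operator is used.
    """
    if width <= 1:
        return x.copy()
    pad = width // 2
    # Edge padding keeps the output the same length as the input.
    padded = np.pad(x, pad, mode="edge")
    kernel = np.ones(width) / width
    return np.convolve(padded, kernel, mode="valid")


def functional_narrative(x, widths=(33, 9, 3, 1)):
    """Stack variants ordered from most simplified to the original sample.

    A larger window means stronger degradation; width 1 returns the
    original signal unchanged.
    """
    return np.stack([moving_average(x, w) for w in widths])


# Toy sample: a slow sine wave with additive noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 256)
sample = np.sin(2 * np.pi * 3 * t) + 0.3 * rng.standard_normal(t.size)

narrative = functional_narrative(sample)
print(narrative.shape)  # one row per degradation level
```

Each row of `narrative` is a progressively less simplified version of `sample`, so reading the rows in order moves from the coarsest variant to the raw signal. An autoregressive model trained on such a sequence would be asked to predict each row from the rows before it.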

Paper Structure

This paper contains 61 sections, 16 equations, 5 figures, 10 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview. (A) Given a sample of time series, one can build different sequences from the original sample by treating it as either a concatenation of time periods or a composition of temporal functions. (B) In the former case, it is common to emulate the next-word prediction task in language by predicting the next time period with an autoregressive (AR) transformer. (C) Alternatively, by applying degradation operators of varying intensity, we can craft augmented variants of samples that are progressively simplified, allowing a next-function prediction task. The AR transformer is trained on the alternative sequence to learn the relationship across the sequence of functions and gradually recover the variance within the original samples.
  • Figure 2: Narrative of Time Series (NoTS). (A) To perform autoregressive pre-training of NoTS, we first generate a sequence of time series from the raw signal that progressively simplifies the sample. The generated signals are passed into an encoder and combined with position and resolution embeddings before being fed into the AR transformer, which is trained with a decoder to reconstruct the signal of the next resolution. The raw signal is passed directly into a latent consistency loss. (B) To apply a pre-trained model to real-world datasets, we construct a channel adaptor and a task adaptor that handle unseen channel graphs and new tasks, respectively. The channel adaptor consists of a multilayer perceptron that pre-processes channel maps, together with new additive channel embeddings. The task adaptor consists of newly initialized tokens that are prompted into the transformer following Jia et al. (2022). The produced task tokens and reconstructed samples are later used in multitask applications through this context-aware adaptation pipeline.
  • Figure 3: Visualizations of AR performance and loss. (A) We visualize the autoregressive inference process of NoTS on the synthetic dataset. From bottom to top, the signal variance is gradually recovered through the predictions of the AR transformer. (B) The token space is visualized through principal component analysis. When colored by degradation degree, tokens of the simplified signals gradually disperse to a larger region. When colored by relative group position, the distribution does not shift as much along the direction of another principal component. (C) A pilot study shows that training larger NoTS models leads to lower reconstruction loss on the test set, potentially following the power-law behaviour of AR models.
  • Figure 4: Additional data space visualizations.
  • Figure 5: Additional token space visualizations.
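The next-function prediction task described in Figures 1(C) and 2(A) can be sketched as a shift-by-one pairing over the sequence of variants, with a causal mask so that each step only sees coarser variants. This is a minimal illustration under stated assumptions: the toy variants, the helper names `next_function_pairs` and `causal_mask`, and the treatment of each variant as a single step are all hypothetical, and the encoder, embeddings, decoder, and latent consistency loss from the captions are omitted.

```python
import numpy as np


def next_function_pairs(narrative):
    """Split a (K, T) narrative into AR inputs and next-variant targets.

    Row k of the inputs is paired with row k + 1 of the narrative, so the
    final target is the least degraded variant.
    """
    narrative = np.asarray(narrative)
    return narrative[:-1], narrative[1:]


def causal_mask(num_steps):
    """Boolean mask where entry (i, j) is True if step i may attend to step j."""
    return np.tril(np.ones((num_steps, num_steps), dtype=bool))


# Toy narrative with three variants, ordered from most to least simplified.
t = np.linspace(0.0, 1.0, 64)
coarse = np.zeros_like(t)
medium = np.sin(2 * np.pi * t)
original = medium + 0.3 * np.sin(16 * np.pi * t)
narrative = np.stack([coarse, medium, original])

inputs, targets = next_function_pairs(narrative)
mask = causal_mask(inputs.shape[0])
print(inputs.shape, targets.shape, mask.shape)
```

Here `inputs[k]` would be fed to the model and `targets[k]` would be the reconstruction target for that step, while `mask` prevents a step from attending to less degraded variants that come later in the sequence.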