ContiFormer: Continuous-Time Transformer for Irregular Time Series Modeling

Yuqi Chen; Kan Ren; Yansen Wang; Yuchen Fang; Weiwei Sun; Dongsheng Li

TL;DR

ContiFormer introduces a continuous-time Transformer framework that models irregular time series by integrating Neural ODE-like dynamics into attention. By defining latent trajectories per observation and a continuous-time attention mechanism, it captures evolving input-output relationships while enabling parallel computation. The authors prove a universal approximation property showing ContiFormer encompasses vanilla Transformer variants and common irregular-time attention schemes. Empirical results across interpolation, classification, event sequence prediction, and regular forecasting show state-of-the-art or competitive performance, with a trade-off of higher computational cost due to continuous-time processing. This work advances flexible, high-fidelity modeling of continuous-time dynamics in irregular time series, with potential broad impact on domains with asynchronous data.
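The idea of attending over latent trajectories rather than fixed vectors can be illustrated with a small numerical sketch. It assumes, beyond what this summary states, that the score between a query trajectory and the i-th key trajectory is the inner product averaged over the interval between the observation time t_i and the evaluation time t, followed by the usual scaling and softmax. The function name `ct_attention_weights`, its parameters, and the toy trajectories are hypothetical; in the model the trajectories would come from learned ODE dynamics rather than closed-form lambdas.

```python
import numpy as np

def ct_attention_weights(query_fn, key_fns, t_obs, t, n_quad=64):
    """Attention weights at time t over observations made at times t_obs.

    Assumption (illustrative, not the authors' code): the score for key i is
    the time-average of q(tau) . k_i(tau) over the interval between t_i and t.
    """
    d = len(query_fn(t))
    scores = np.empty(len(key_fns))
    for i, (k_fn, t_i) in enumerate(zip(key_fns, t_obs)):
        if np.isclose(t, t_i):
            # Zero-length interval: fall back to the pointwise inner product.
            scores[i] = query_fn(t) @ k_fn(t)
        else:
            taus = np.linspace(t_i, t, n_quad)
            vals = np.array([query_fn(tau) @ k_fn(tau) for tau in taus])
            # Trapezoidal rule, then divide by the (signed) interval length.
            integral = np.sum((vals[1:] + vals[:-1]) * np.diff(taus)) / 2.0
            scores[i] = integral / (t - t_i)
    scores = scores / np.sqrt(d)            # scaled dot product
    weights = np.exp(scores - scores.max())  # numerically stable softmax
    return weights / weights.sum()

# Toy usage: two observations at irregular times with time-varying keys.
query = lambda tau: np.array([np.cos(tau), np.sin(tau)])
keys = [lambda tau: np.array([1.0, 0.0]) * np.exp(-tau),
        lambda tau: np.array([0.0, 1.0]) * (1.0 + tau)]
weights = ct_attention_weights(query, keys, t_obs=[0.0, 0.3], t=1.0)
```

Because each score depends only on its own key trajectory and the shared query trajectory, the loop over observations could be evaluated in parallel, which is the property the TL;DR alludes to.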

Abstract

Modeling continuous-time dynamics on irregular time series is critical to account for data evolution and correlations that occur continuously. Traditional methods, including recurrent neural networks and Transformer models, leverage inductive bias via powerful neural architectures to capture complex patterns. However, because they are discrete by design, they generalize poorly to continuous-time data paradigms. Though neural ordinary differential equations (Neural ODEs) and their variants have shown promising results on irregular time series, they often fail to capture the intricate correlations within these sequences. Concurrently modeling the relationships between input data points and capturing the dynamic changes of the continuous-time system is challenging yet much needed. To tackle this problem, we propose ContiFormer, which extends the relation modeling of the vanilla Transformer to the continuous-time domain and explicitly combines the continuous-dynamics modeling of Neural ODEs with the attention mechanism of Transformers. We mathematically characterize the expressive power of ContiFormer and show that, with suitably designed function hypotheses, many Transformer variants specialized in irregular time series modeling are covered as special cases of ContiFormer. A wide range of experiments on both synthetic and real-world datasets demonstrates the superior modeling capacity and prediction performance of ContiFormer on irregular time series data. The project link is https://seqml.github.io/contiformer/.

Paper Structure (74 sections, 4 theorems, 58 equations, 6 figures, 22 tables, 1 algorithm)

Introduction
Related Work
Time-Discretized Models.
Continuous-Time Models.
Method
Continuous-Time Attention Mechanism
Continuous Dynamics from Observations
Query Function
Scaled Dot Product
Expected Values
Multi-Head Attention
Continuous-Time Transformer
Sampling Process
Complexity Analysis
Representation Power of ContiFormer
...and 59 more sections

Key Result

Theorem 1

Given query ($Q$) and key ($K$) matrices such that $\|Q_i\|_2 < \infty$ and $\|Q_i\|_0 = d$ for $i \in [1, ..., N]$, and for certain attention matrices $\operatorname{Attn}(Q, K) \in \mathbb{R}^{N \times N}$ (see the appendix section on attention variants for more information), there always exists a family of continuously differentiable vector functions $\boldsymbol{k}_1(\cdot), \ldots, \boldsymbol{k}_N(\cdot)$ such that the resulting continuous-time attention satisfies $\widetilde{\operatorname{Attn}}(Q, K) = \operatorname{Attn}(Q, K)$.
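The simplest instance of this statement can be worked by hand. The block below assumes, as this summary does not give the definition, that the continuous-time score between a query trajectory $\boldsymbol{q}(\cdot)$ and the key trajectory $\boldsymbol{k}_i(\cdot)$ is their time-averaged inner product; the symbols $\alpha_i$, $t_i$, and $t_j$ are introduced here for illustration only. With constant trajectories, the score collapses to the ordinary dot product, so scaled dot-product attention is recovered; the theorem itself covers a broader class of attention matrices.

```latex
% Assumed form of the continuous-time score (illustrative):
\alpha_i(t) \;=\; \frac{1}{t - t_i} \int_{t_i}^{t}
    \boldsymbol{q}(\tau) \cdot \boldsymbol{k}_i(\tau) \, \mathrm{d}\tau

% Constant trajectories: q(\tau) \equiv Q_j and k_i(\tau) \equiv K_i
\alpha_i(t_j) \;=\; \frac{1}{t_j - t_i} \int_{t_i}^{t_j}
    Q_j \cdot K_i \, \mathrm{d}\tau
  \;=\; \frac{(t_j - t_i)\, Q_j \cdot K_i}{t_j - t_i}
  \;=\; Q_j \cdot K_i
```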

Figures (6)

Figure 1: Architecture of the ContiFormer layer. ContiFormer takes an irregular time series and its corresponding sampled time points as input. Queries, keys, and values are obtained in continuous-time form. The attention mechanism (CT-MHA) performs a scaled inner product in a continuous-time manner to capture the evolving relationship between observations, resulting in a complex continuous dynamic system. Feedforward and layer normalization are adopted, similar to the Transformer. Finally, a sampling trick is employed to make ContiFormer stackable. Note that the highlighted trajectories in purple indicate the part of functions that are involved in the calculation of the output.
Figure 2: Interpolation and extrapolation of spirals with irregularly sampled time points by Transformer, Neural ODE, and our model.
Figure 3: Visualization of attention scores on the UWaveGestureLibrary dataset. Colors indicate the attention scores for different instances at time $t=0$. Observations at time $t=0$ are observed, and the time interval is normalized to $[0, 1]$.
Figure 4: More visualization results with $\alpha=0.02$. Here, Latent ODE refers to Latent ODE w/ RNN Encoder (Chen et al., 2018).
Figure 5: Interpolation results on 2D spiral under different dropout rates.
...and 1 more figure

Theorems & Definitions (7)

Theorem 1: Universal Attention Approximation Theorem
Lemma 1: Existence of Continuously Differentiable Vector Function
proof
Lemma 2: Existence of $\boldsymbol{k}_i(t)$
proof
Theorem 2: Universal Attention Approximation Theorem
proof
