U-Former ODE: Fast Probabilistic Forecasting of Irregular Time Series

Ilya Kuleshov; Alexander Marusov; Alexey Zaytsev

TL;DR

UFO addresses probabilistic forecasting for irregular multivariate time series by integrating a time-parallel Neural CDE backbone with a U‑Net–style hierarchy and Transformer refiners. It introduces neural CDE-based resampling, kernel interpolation with regularization, SwiGLU vector fields, and patch-based regularization to achieve both global context and local temporal sensitivity. The approach delivers state-of-the-art performance on five benchmarks, with up to 15x faster inference than traditional Neural CDEs and robust performance on long-horizon, high-dimensional data. This framework enables scalable, accurate, and uncertainty-aware forecasting in domains with irregular sampling, such as healthcare and finance.
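As a rough illustration of what a SwiGLU vector field could look like, the NumPy sketch below gates one linear projection of the hidden state with the SiLU of another, then reshapes the result into the matrix that multiplies the control increment in a CDE. All names (`swiglu_vector_field`, `W_gate`, `W_up`, `W_down`) and all sizes are assumptions made for this example; the paper's actual parametrization may differ.

```python
import numpy as np

def silu(x):
    # SiLU / Swish activation: x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def swiglu_vector_field(z, W_gate, W_up, W_down, n_channels):
    """Map a hidden state z of shape (d,) to a (d, n_channels) matrix.

    Hypothetical parametrization: SwiGLU(z) = (SiLU(z W_gate) * (z W_up)) W_down.
    """
    hidden = silu(z @ W_gate) * (z @ W_up)   # gated features, shape (h,)
    out = hidden @ W_down                    # shape (d * n_channels,)
    return out.reshape(z.shape[0], n_channels)

rng = np.random.default_rng(0)
d, h, c = 4, 8, 3                            # illustrative sizes only
W_gate = rng.normal(size=(d, h)) * 0.1
W_up = rng.normal(size=(d, h)) * 0.1
W_down = rng.normal(size=(h, d * c)) * 0.1

z = rng.normal(size=d)                       # hidden state
dX = rng.normal(size=c)                      # increment of the control path
F = swiglu_vector_field(z, W_gate, W_up, W_down, c)
dz = F @ dX                                  # one CDE increment: dz = f(z) dX
```

The gate is multiplicative, so a zero hidden state maps to a zero matrix in this bias-free sketch.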

Abstract

Probabilistic forecasting of irregularly sampled time series is crucial in domains such as healthcare and finance, yet it remains a formidable challenge. Existing Neural Controlled Differential Equation (Neural CDE) approaches, while effective at modelling continuous dynamics, suffer from slow, inherently sequential computation, which restricts scalability and limits access to global context. We introduce UFO (U-Former ODE), a novel architecture that seamlessly integrates the parallelizable, multiscale feature extraction of U-Nets, the powerful global modelling of Transformers, and the continuous-time dynamics of Neural CDEs. By constructing a fully causal, parallelizable model, UFO achieves a global receptive field while retaining strong sensitivity to local temporal dynamics. Extensive experiments on five standard benchmarks -- covering both regularly and irregularly sampled time series -- demonstrate that UFO consistently outperforms ten state-of-the-art neural baselines in predictive accuracy. Moreover, UFO delivers up to 15$\times$ faster inference compared to conventional Neural CDEs, with consistently strong performance on long and highly multivariate sequences.
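To make the "inherently sequential" cost concrete, here is a minimal explicit-Euler sketch of a generic Neural CDE solve, $d\boldsymbol{z} = f(\boldsymbol{z})\,d\boldsymbol{X}$, over irregular timestamps. It is not the UFO model: the `tanh` vector field, the weight matrix `W`, and the convention of appending time as an extra path channel are assumptions chosen for illustration.

```python
import numpy as np

def euler_cde(z0, times, values, vector_field):
    """Explicit-Euler solve of dz = f(z) dX along a piecewise-linear path X."""
    # Append time as a channel so the path carries the irregular gaps.
    path = np.concatenate([times[:, None], values], axis=1)  # (T, 1 + c)
    z = z0.copy()
    states = [z.copy()]
    # Each step needs the previous state, so the loop cannot be parallelized.
    for k in range(len(times) - 1):
        dX = path[k + 1] - path[k]
        z = z + vector_field(z) @ dX
        states.append(z.copy())
    return np.stack(states)

rng = np.random.default_rng(0)
d, c = 4, 2                                  # illustrative sizes only
W = rng.normal(size=(d, d * (c + 1))) * 0.1  # hypothetical weights

def vector_field(z):
    # Toy vector field returning a (d, 1 + c) matrix.
    return np.tanh(z @ W).reshape(d, c + 1)

times = np.array([0.0, 0.3, 0.35, 1.2, 2.0])  # irregular timestamps
values = rng.normal(size=(len(times), c))
z0 = np.full(d, 0.1)
states = euler_cde(z0, times, values, vector_field)
```

The number of dependent steps grows linearly with the sequence length, which is the bottleneck a time-parallel design aims to avoid.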

Paper Structure (65 sections, 6 theorems, 20 equations, 8 figures, 5 tables)

Introduction
Problem Statement
Method
Method Structure
Layer types
Encoder
Decoder
Architecture Details
Neural CDE Resampling
Interpolation
Vector Field
Transformer Refining
Theory
CDE Rescaling Lipschitzness
Patching Regularization
...and 50 more sections

Key Result

theorem 1

The map $t \mapsto \boldsymbol{\Phi}(t \cdot w)$ is Lipschitz continuous, and its constant is the effective Lipschitz constant per unit of the new (coarser) time.
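The scaling behind this statement can be checked in one line. As a sketch, assume $\boldsymbol{\Phi}$ is Lipschitz with some constant $L$ and that $w > 0$ is the rescaling factor; the symbol $L$ and the positivity of $w$ are assumptions introduced here for illustration, not notation taken from the paper.

```latex
\left\| \boldsymbol{\Phi}(t_1 \cdot w) - \boldsymbol{\Phi}(t_2 \cdot w) \right\|
  \le L \, \lvert t_1 w - t_2 w \rvert
  = (L w) \, \lvert t_1 - t_2 \rvert .
```

Under these assumptions, one unit of the coarser time spans $w$ units of the original time, so the Lipschitz constant of the rescaled map is at most $L w$.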

Figures (8)

Figure 1: Our UFO method combines three leading time series architectures, each bringing its own strengths and compensating for the others' weaknesses (as illustrated by the arrows). Gray ellipses show the existing works closest to the intersection; however, none match it perfectly (see Section \ref{sec:review} for details).
Figure 2: The proposed UFO hierarchical architecture.
Figure 3: The Neural CDE Downsampling (left) and Upsampling (right) procedures. Half-transparent domes represent the (un)patching procedure; the wavy arrows represent Neural CDE integration \ref{eq:ncde_int}, which corresponds to \ref{eq:ncde_downsampling} for downsampling and to \ref{eq:ncde_upsampling} for upsampling.
Figure 4: Inference time for all considered models (log scale), measured over 6 batches of 32 samples and then averaged to obtain the time per sample within a batch. Classic models are in blue, Neural CDE models are in orange. The corresponding time in seconds is given in a gray rectangle above each bar.
Figure 5: Sensitivity analysis w.r.t. input observation number. Due to limited precision, Deep AR gradients are exactly zero at earlier positions, so we cannot visualize them in log-scale.
...and 3 more figures

Theorems & Definitions (6)

theorem 1
corollary 1
lemma 1: Patching Regularization
Theorem: Informal version, from kuleshov2024denots
Theorem: \ref{th:traj_lipsch}
Lemma: \ref{lemma:irreg}
