Probabilistic Forecasting of Irregular Time Series via Conditional Flows
Vijaya Krishna Yalavarthi, Randolf Scholz, Stefan Born, Lars Schmidt-Thieme
TL;DR
This work tackles probabilistic forecasting for irregularly sampled multivariate time series (IMTS) with missing values by learning the conditional joint distribution $p(y\mid x^{\text{obs}}, x^{\text{qry}})$ with a normalizing-flow model, ProFITi. ProFITi combines a sorted invertible triangular attention layer (SITA) with an invertible activation function (Shiesh) inside an invariant conditional normalizing flow, which yields permutation-invariant joint densities over query sets of variable length. An encoder (GraFITi) supplies equivariant conditioning information, and the model is trained by minimizing the normalized joint negative log-likelihood (njNLL). On four real-world datasets, ProFITi achieves substantially better joint likelihoods than the baselines, indicating that it captures dependencies across channels and time points as well as non-Gaussian uncertainty in IMTS data. This matters in domains such as healthcare and climate science, where forecasts of irregularly sampled multivariate signals must come with reliable uncertainty estimates.
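For orientation, the density that a conditional normalizing flow assigns to the queried values can be written with the standard change-of-variables formula. The symbols $f$ (the invertible map, conditioned on $x^{\text{obs}}$ and $x^{\text{qry}}$), $p_Z$ (the base density) and $K$ (the number of queried values) are notation assumed here for illustration, and dividing by $K$ is one natural reading of "normalized"; neither the notation nor that reading is spelled out in this summary.

$$
p(y \mid x^{\text{obs}}, x^{\text{qry}}) = p_Z\!\left(f^{-1}(y;\, x^{\text{obs}}, x^{\text{qry}})\right)\,\left|\det \frac{\partial f^{-1}(y;\, x^{\text{obs}}, x^{\text{qry}})}{\partial y}\right|,
\qquad
\text{njNLL} = -\frac{1}{K}\,\log p(y \mid x^{\text{obs}}, x^{\text{qry}})
$$

Under this reading, the per-value normalization keeps likelihoods comparable across query sets of different length.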
Abstract
Probabilistic forecasting of irregularly sampled multivariate time series with missing values is an important problem in many fields, including health care, astronomy, and climate. State-of-the-art methods for the task estimate only marginal distributions of observations in single channels and at single timepoints, assuming a fixed-shape parametric distribution. In this work, we propose a novel model, ProFITi, for probabilistic forecasting of irregularly sampled time series with missing values using conditional normalizing flows. The model learns joint distributions over the future values of the time series conditioned on past observations and queried channels and times, without assuming any fixed shape of the underlying distribution. As model components, we introduce a novel invertible triangular attention layer and an invertible non-linear activation function on and onto the whole real line. We conduct extensive experiments on four datasets and demonstrate that the proposed model provides $4$ times higher likelihood than the previously best model.
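To make the role of a triangular invertible layer concrete, the following minimal sketch evaluates a joint density through a single lower-triangular linear map with a standard normal base distribution. It is an illustration under stated assumptions, not ProFITi's attention layer: the function name `triangular_flow_njnll`, the parameters `raw_A` and `b`, and the normalization by the number of queried values `K` are all hypothetical choices made here. A purely linear map like this one can only produce a Gaussian joint density; leaving the Gaussian family requires a non-linear invertible activation between such layers.

```python
import numpy as np


def triangular_flow_njnll(y, raw_A, b):
    """Per-value negative log-likelihood of y under z = A y + b, z ~ N(0, I).

    A is lower triangular with a strictly positive diagonal, so the map is
    invertible and log|det A| is the sum of the log-diagonal entries.
    """
    K = y.shape[0]
    # Strict lower triangle of raw_A, plus a positive diagonal via exp.
    A = np.tril(raw_A, k=-1) + np.diag(np.exp(np.diag(raw_A)))
    z = A @ y + b
    # Log-density of z under the standard normal base distribution.
    log_base = -0.5 * np.sum(z ** 2) - 0.5 * K * np.log(2.0 * np.pi)
    # log|det A| = sum(log(exp(diag(raw_A)))) = sum(diag(raw_A)).
    log_det = np.sum(np.diag(raw_A))
    joint_nll = -(log_base + log_det)
    # Assumption: normalize by the number of queried values K.
    return joint_nll / K


# The same function handles query sets of different length K.
rng = np.random.default_rng(0)
for K in (3, 5):
    y = rng.normal(size=K)
    raw_A = rng.normal(size=(K, K))
    b = rng.normal(size=K)
    print(K, triangular_flow_njnll(y, raw_A, b))
```

The triangular structure is what keeps the determinant cheap: it costs a sum over `K` diagonal entries instead of a general determinant computation.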
