Perceiver-based CDF Modeling for Time Series Forecasting
Cat P. Le, Chris Cannella, Ali Hasan, Yuting Ng, Vahid Tarokh
TL;DR
This work introduces perceiver-CDF, a scalable framework for multimodal time-series forecasting that handles both missing data and irregular sampling. A perceiver-based encoder compresses high-dimensional inputs into a compact latent space and is coupled with a copula-based decoder, so the model captures conditional and joint distributions efficiently while achieving sub-quadratic complexity. Midpoint inference for local attention and an output-variance testing mechanism limit error propagation, which yields robust predictions. Across unimodal and multimodal benchmarks, perceiver-CDF outperforms state-of-the-art methods by about 20% while using less than half the computational resources, which highlights its practical potential for large-scale, heterogeneous time-series forecasting.
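The latent bottleneck described above can be illustrated with a minimal NumPy sketch of generic Perceiver-style cross-attention. This is not the authors' implementation: the single attention head, the random weights, and the sizes `N`, `M`, and `d` are illustrative assumptions. The point it shows is that a fixed array of `M` latent queries attends to all `N` input tokens, so the score matrix has `M × N` entries rather than the `N × N` entries of full self-attention.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def latent_cross_attention(latents, inputs, w_q, w_k, w_v):
    """Single-head cross-attention from M latent queries to N input tokens.

    latents: (M, d) latent array (learned in a real model; random here)
    inputs:  (N, d) input tokens, e.g. flattened multimodal observations
    Returns (attended, weights) with shapes (M, d_v) and (M, N).
    """
    q = latents @ w_q                          # (M, d_k) queries from latents
    k = inputs @ w_k                           # (N, d_k) keys from inputs
    v = inputs @ w_v                           # (N, d_v) values from inputs
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (M, N): linear in N for fixed M
    weights = softmax(scores, axis=-1)         # each latent's weights sum to 1
    return weights @ v, weights


rng = np.random.default_rng(0)
N, M, d = 4096, 64, 32  # hypothetical sizes: N input tokens, M latents, width d
inputs = rng.standard_normal((N, d))
latents = rng.standard_normal((M, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

attended, weights = latent_cross_attention(latents, inputs, w_q, w_k, w_v)
print(attended.shape, weights.shape)  # (64, 32) (64, 4096)
# Score-matrix entries: cross-attention (M*N) vs. full self-attention (N*N).
print(M * N, N * N)  # 262144 16777216
```

Because `M` is fixed, the cost of this step grows linearly in `N`, and any further processing in the latent space depends only on `M`. That is one way a model of this kind can stay sub-quadratic in the input length.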
Abstract
Transformers have demonstrated remarkable efficacy in time series forecasting. However, their heavy reliance on self-attention, whose cost grows quadratically with sequence length, demands significant computational resources and limits their practical applicability across diverse tasks, especially multimodal ones. In this work, we propose a new architecture, called perceiver-CDF, for modeling the cumulative distribution functions (CDFs) of time series data. Our approach combines the perceiver architecture with a copula-based attention mechanism tailored to multimodal time series prediction. The perceiver efficiently transforms high-dimensional, multimodal data into a compact latent space, significantly reducing computational demands. A copula-based attention mechanism then constructs the joint distribution of the missing data for prediction. We further propose an output variance testing mechanism that mitigates error propagation during prediction. To improve efficiency and reduce complexity, we introduce midpoint inference for the local attention mechanism, which lets the model capture dependencies among nearby imputed samples without attending to all previous samples. Experiments on unimodal and multimodal benchmarks consistently show a 20% improvement over state-of-the-art methods while using less than half of the computational resources.
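The abstract does not spell out how midpoint inference works, so the following pure-Python sketch encodes one plausible reading as an explicit assumption. Each gap between two known samples is filled by first predicting its midpoint, then recursively the midpoints of the two halves, so every prediction conditions only on its two bracketing neighbors rather than on all previous samples. The names `midpoint_order`, `impute_gap`, and `average` are hypothetical; a learned model would presumably supply the conditional prediction in place of the simple average used here.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple


def midpoint_order(lo: int, hi: int) -> List[Tuple[int, int, int]]:
    """Order in which to fill indices strictly between lo and hi.

    lo and hi are indices of known samples bracketing a gap. Each entry is
    (target, left, right): the target is the midpoint of (left, right) and is
    predicted from those two nearby points only, not from all earlier samples.
    """
    order = []
    queue = deque([(lo, hi)])
    while queue:
        left, right = queue.popleft()  # breadth-first: coarse midpoints first
        if right - left < 2:
            continue  # no missing index between left and right
        mid = (left + right) // 2
        order.append((mid, left, right))
        queue.append((left, mid))
        queue.append((mid, right))
    return order


def average(left: float, right: float) -> float:
    """Placeholder conditional predictor (hypothetical): mean of the neighbors."""
    return 0.5 * (left + right)


def impute_gap(series: List[Optional[float]], lo: int, hi: int,
               predict: Callable[[float, float], float] = average) -> List[Optional[float]]:
    """Fill series[lo+1:hi] in midpoint order using only local neighbors."""
    out = list(series)
    for mid, left, right in midpoint_order(lo, hi):
        # Both neighbors are observed endpoints or already-imputed midpoints.
        out[mid] = predict(out[left], out[right])
    return out


print([m for m, _, _ in midpoint_order(0, 8)])  # [4, 2, 6, 1, 3, 5, 7]
print(impute_gap([0.0, None, None, None, 8.0], 0, 4))  # [0.0, 2.0, 4.0, 6.0, 8.0]
```

The breadth-first order places coarse midpoints before fine ones, and each prediction looks only at the two endpoints of its current sub-gap, so the context per step stays local no matter how long the history is.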
