AltTS: A Dual-Path Framework with Alternating Optimization for Multivariate Time Series Forecasting
Zhihang Yuan, Zhiyuan Liu, Mahesh K. Marina
TL;DR
ALTTS addresses gradient entanglement in multivariate time series forecasting by decoupling autoregressive (AR) dynamics from cross-relation (CR) interactions in a dual-path framework. The AR path is a linear per-series predictor with RevIN. The CR path is a Transformer with Cross-Relation Self-Attention (CRSA), which explicitly models cross-variable dependencies and uses diagonal masking to prevent AR leakage. The two paths are coordinated through alternating optimization (AO): AR and CR parameters are updated in turn, each with its own optimizer, to reduce gradient noise and interference. Empirically, ALTTS achieves competitive or state-of-the-art results on seven long-term time series forecasting (LTSF) benchmarks, with the largest gains at long horizons, and ablations confirm that both AR/CR decoupling and AO are critical to stabilizing training and improving accuracy. The work treats the training schedule as a design variable, suggesting that optimization-driven design choices can drive progress as effectively as more complex architectures.
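To make the diagonal-masking idea concrete, here is a minimal pure-Python sketch of attention across variables in which each variable is barred from attending to itself. It is an illustration under stated assumptions, not the CRSA layer itself: the function names `masked_cross_attention` and `mix`, the score matrix, and the value vectors are all hypothetical, and a real implementation would derive scores from learned query/key projections.

```python
import math

def masked_cross_attention(scores):
    """Row-wise softmax over variable-to-variable scores with the diagonal masked.

    scores[i][j] is a hypothetical relevance score of variable j for variable i.
    Setting the diagonal to -inf gives each variable zero weight on itself, so
    within-series (AR) information cannot flow through this path.
    Assumes at least two variables, otherwise a row would be fully masked.
    """
    n = len(scores)
    weights = []
    for i in range(n):
        row = [(-math.inf if i == j else scores[i][j]) for j in range(n)]
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

def mix(weights, values):
    """Combine per-variable value vectors using the attention weights."""
    dim = len(values[0])
    return [
        [sum(w[j] * values[j][k] for j in range(len(values))) for k in range(dim)]
        for w in weights
    ]

# Illustrative inputs: large diagonal scores mimic strong self-similarity,
# which the mask discards.
scores = [[5.0, 1.0, 0.0],
          [0.5, 9.0, 0.5],
          [2.0, 2.0, 7.0]]
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

weights = masked_cross_attention(scores)
context = mix(weights, values)
```

Even though the diagonal scores are the largest in every row, the resulting weights place all mass on the other variables, which is the sense in which the mask blocks AR leakage in this toy setting.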
Abstract
Multivariate time series forecasting involves two qualitatively distinct factors: (i) stable within-series autoregressive (AR) dynamics, and (ii) intermittent cross-dimension interactions that can become spurious over long horizons. We argue that fitting a single model to capture both effects creates an optimization conflict: the high-variance updates needed for cross-dimension modeling can corrupt the gradients that support autoregression, resulting in brittle training and degraded long-horizon accuracy. To address this, we propose ALTTS, a dual-path framework that explicitly decouples autoregression and cross-relation (CR) modeling. In ALTTS, the AR path is instantiated with a linear predictor, while the CR path uses a Transformer equipped with Cross-Relation Self-Attention (CRSA); the two branches are coordinated via alternating optimization to isolate gradient noise and reduce cross-block interference. Extensive experiments on multiple benchmarks show that ALTTS consistently outperforms prior methods, with the most pronounced improvements on long-horizon forecasting. Overall, our results suggest that carefully designed optimization strategies, rather than ever more complex architectures, can be a key driver of progress in multivariate time series forecasting.
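The alternating schedule described above can be sketched on a toy problem. The example below is a hedged illustration, not the authors' training procedure: it assumes a two-parameter model with one AR weight and one CR weight, synthetic noise-free data, a strict one-step-each alternation, and a hypothetical `MomentumSGD` helper that stands in for the independent per-branch optimizers. The actual switching schedule and optimizers used by ALTTS are not specified in this section.

```python
import random

random.seed(0)

# Synthetic data (an assumption for illustration): the target depends strongly
# on the series' own input (AR) and weakly on another series (CR).
data = []
for _ in range(200):
    x_self, x_other = random.uniform(-1, 1), random.uniform(-1, 1)
    data.append((x_self, x_other, 0.9 * x_self + 0.2 * x_other))

def mse(params):
    """Mean squared error of the summed AR and CR predictions."""
    return sum(
        (params["ar"] * xs + params["cr"] * xo - y) ** 2 for xs, xo, y in data
    ) / len(data)

def grad(params, name):
    """Gradient of the MSE with respect to a single parameter block."""
    g = 0.0
    for xs, xo, y in data:
        err = params["ar"] * xs + params["cr"] * xo - y
        g += 2 * err * (xs if name == "ar" else xo)
    return g / len(data)

class MomentumSGD:
    """Minimal optimizer holding its own state; one instance per block."""

    def __init__(self, lr, beta=0.5):
        self.lr, self.beta, self.velocity = lr, beta, 0.0

    def step(self, value, g):
        self.velocity = self.beta * self.velocity + g
        return value - self.lr * self.velocity

params = {"ar": 0.0, "cr": 0.0}
optimizers = {"ar": MomentumSGD(lr=0.1), "cr": MomentumSGD(lr=0.05)}
initial_loss = mse(params)

for step in range(400):
    # Update one block per step; the other block stays frozen, so its
    # optimizer state is never touched by this step's gradient.
    block = "ar" if step % 2 == 0 else "cr"
    params[block] = optimizers[block].step(params[block], grad(params, block))

final_loss = mse(params)
```

Because each block has its own optimizer instance, the momentum accumulated from CR updates never mixes into the AR update, which is the isolation property the alternating scheme is meant to provide. In a deep-learning framework the same pattern would typically use two optimizer objects, each constructed over one branch's parameters.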
