Invariant Subspace Decomposition
Margherita Lazzaretto, Jonas Peters, Niklas Pfister
TL;DR
Invariant Subspace Decomposition (ISD) addresses non-stationary linear regression in which the conditional distribution of $Y_t$ given $X_t$ changes over time. It splits the parameter $\gamma_{0,t}$ into a time-invariant component $\beta^{\text{inv}}$ lying in an invariant subspace $\mathcal{S}^{\text{inv}}$ and a residual time-varying component $\delta^{\text{res}}_t$ lying in the complementary subspace $\mathcal{S}^{\text{res}}$, i.e., $\gamma_{0,t} = \beta^{\text{inv}} + \delta^{\text{res}}_t$, which enables both zero-shot and time-adaptation prediction. The invariant part is learned from historical data, using joint block diagonalization to identify $\mathcal{S}^{\text{inv}}$, while the residual part is estimated from a small set of adaptation data. The resulting finite-sample error bound scales with $\dim(\mathcal{S}^{\text{inv}})/n + \dim(\mathcal{S}^{\text{res}})/m$, where $n$ and $m$ are the numbers of historical and adaptation samples, respectively. Theoretical results show that ISD can outperform naive OLS and maximin approaches in non-stationary settings, and experiments on synthetic and real data show improved predictive accuracy in both zero-shot and time-adaptation tasks. The work lays groundwork for extending invariance-based time adaptation to nonlinear models and domain-specific applications.
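The two-stage idea (fit the invariant part on historical data, then fit only the residual part on adaptation data) can be illustrated with a small simulation. This is a minimal sketch under strong assumptions, not the authors' implementation: the orthonormal basis `U = [U_inv | U_res]` of $\mathcal{S}^{\text{inv}} \oplus \mathcal{S}^{\text{res}}$ is assumed known (ISD estimates it via joint block diagonalization, which is skipped here), the covariance of $X_t$ is constructed to be diagonal in that basis, and `simulate`, `drift`, and all numeric settings are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
p, k_inv = 5, 3  # number of covariates and assumed dim(S^inv)

# Hypothetical ground truth: an orthonormal basis U = [U_inv | U_res].
U, _ = np.linalg.qr(rng.standard_normal((p, p)))
U_inv, U_res = U[:, :k_inv], U[:, k_inv:]
beta_inv = U_inv @ np.array([1.0, -2.0, 0.5])  # time-invariant part, lies in S^inv


def simulate(n, delta_fn, t0):
    """Draw n time-indexed samples whose covariance is diagonal in the U basis."""
    t = np.arange(t0, t0 + n)
    scales = 1.0 + 0.5 * np.sin(t[:, None] / 50.0 + np.arange(p))  # time-varying std devs
    X = (rng.standard_normal((n, p)) * scales) @ U.T
    gamma = beta_inv + np.array([delta_fn(s) for s in t])  # gamma_t = beta_inv + delta_t
    Y = np.einsum("ij,ij->i", X, gamma) + 0.5 * rng.standard_normal(n)
    return X, Y


def drift(s):
    """Hypothetical residual part for the historical period: rotates inside S^res."""
    return U_res @ np.array([2.0 * np.sin(s / 100), 2.0 * np.cos(s / 100)])


X_hist, Y_hist = simulate(3000, drift, t0=0)
# New period: an unseen but constant residual part.
delta_new = U_res @ np.array([1.5, -1.0])
X_ad, Y_ad = simulate(100, lambda s: delta_new, t0=5000)  # small adaptation sample
X_te, Y_te = simulate(2000, lambda s: delta_new, t0=5100)  # evaluation sample

# Stage 1 (historical data): OLS of Y on the coordinates of X in S^inv.
b_hat, *_ = np.linalg.lstsq(X_hist @ U_inv, Y_hist, rcond=None)
beta_inv_hat = U_inv @ b_hat

# Stage 2 (adaptation data): OLS of what stage 1 leaves unexplained on the S^res coordinates.
d_hat, *_ = np.linalg.lstsq(X_ad @ U_res, Y_ad - X_ad @ beta_inv_hat, rcond=None)
delta_hat = U_res @ d_hat

mse_zero_shot = np.mean((Y_te - X_te @ beta_inv_hat) ** 2)
mse_adapted = np.mean((Y_te - X_te @ (beta_inv_hat + delta_hat)) ** 2)
print(f"zero-shot MSE {mse_zero_shot:.3f}, adapted MSE {mse_adapted:.3f}")
```

Stage 2 fits only $\dim(\mathcal{S}^{\text{res}}) = 2$ coefficients from the $m = 100$ adaptation samples rather than all $p = 5$, which is the intuition behind the $\dim(\mathcal{S}^{\text{res}})/m$ term in the bound.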
Abstract
We consider the task of predicting a response Y from a set of covariates X in settings where the conditional distribution of Y given X changes over time. For this to be feasible, assumptions on how the conditional distribution changes over time are required. Existing approaches assume, for example, that changes occur smoothly over time so that short-term prediction using only the recent past becomes feasible. To additionally exploit observations further in the past, we propose a novel invariance-based framework for linear conditionals, called Invariant Subspace Decomposition (ISD), that splits the conditional distribution into a time-invariant and a residual time-dependent component. As we show, this decomposition can be used for both zero-shot and time-adaptation prediction tasks, that is, settings where either no training data or only a small amount of it is available at the time points at which we want to predict Y. We propose a practical estimation procedure, which automatically infers the decomposition using tools from approximate joint matrix diagonalization. Furthermore, we provide finite-sample guarantees for the proposed estimator and demonstrate empirically that it indeed improves on approaches that do not use the additional invariant structure.
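To give a concrete sense of the "approximate joint matrix diagonalization" tool mentioned above, here is a sketch of orthogonal joint diagonalization by Jacobi (Givens) rotations, the classical scheme of Cardoso and Souloumiac. This is a swapped-in, generic technique rather than the authors' procedure: the TL;DR states that ISD uses joint *block* diagonalization, and the step that groups diagonalized coordinates into blocks is not shown. The input matrices are toy stand-ins (an assumption) for, e.g., covariance matrices estimated on different time windows, and `approx_joint_diag` is a hypothetical name.

```python
import numpy as np


def approx_joint_diag(mats, tol=1e-10, max_sweeps=200):
    """Orthogonal approximate joint diagonalization via Jacobi (Givens) rotations.

    mats: array of shape (K, p, p) holding K symmetric matrices.
    Returns an orthogonal V that reduces the summed squared off-diagonal
    entries of V.T @ mats[k] @ V over k, one (i, j) plane at a time.
    """
    A = np.array(mats, dtype=float)  # working copy, rotated in place of the input
    p = A.shape[1]
    V = np.eye(p)
    for _ in range(max_sweeps):
        rotated = False
        for i in range(p - 1):
            for j in range(i + 1, p):
                # Per-matrix 2-vectors g_k = (a_ii - a_jj, a_ij + a_ji), stacked over k.
                g_diff = A[:, i, i] - A[:, j, j]
                g_off = A[:, i, j] + A[:, j, i]
                g00, g11, g01 = g_diff @ g_diff, g_off @ g_off, g_diff @ g_off
                # Closed-form angle minimizing sum_k (rotated a_ij)^2 in this plane.
                theta = 0.25 * np.arctan2(2.0 * g01, g00 - g11)
                if abs(theta) > tol:
                    rotated = True
                    c, s = np.cos(theta), np.sin(theta)
                    R = np.eye(p)
                    R[i, i], R[i, j], R[j, i], R[j, j] = c, -s, s, c
                    A = R.T @ A @ R  # broadcasts over the K matrices
                    V = V @ R
        if not rotated:  # no plane needed a rotation during this sweep
            break
    return V


rng = np.random.default_rng(1)
p, K = 4, 6
# Toy matrices sharing a hidden orthogonal eigenbasis Q with varying eigenvalues.
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
covs = np.stack([Q @ np.diag(rng.uniform(0.5, 3.0, p)) @ Q.T for _ in range(K)])

V = approx_joint_diag(covs)
D = V.T @ covs @ V
off_norm = np.linalg.norm(D * (1 - np.eye(p)))
print(f"remaining off-diagonal norm: {off_norm:.2e}")
```

With noisy, estimated matrices an exact joint diagonalizer generally does not exist, which is why the criterion is only minimized approximately; the same rotation loop applies unchanged.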
