Out-of-Support Generalisation via Weight Space Sequence Modelling
Roussel Desmond Nzoyem
TL;DR
Out-of-support (OoS) generalisation remains a critical challenge for neural models. This work reframes OoS generalisation as a weight-space sequence forecasting problem: the input domain is partitioned into concentric rings around an anchor, and a linear recurrence $\theta_{t+1} = \phi \theta_t$ learns how the model weights evolve from one ring to the next. A stochastic extension models the weights as Gaussian, $\theta_t \sim \mathcal{N}(\mu_t, \mathrm{diag}(\sigma_t^2))$, samples them with the reparameterisation trick $\theta_t = \mu_t + \sigma_t \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, and applies a first-order linearisation to obtain a Gaussian predictive distribution with mean $\mu_y = f_{\mu_t}(x)$ and covariance $\Sigma_y = J\,\mathrm{diag}(\sigma_t^2)\,J^\top + \sigma_{\text{noise}}^2 I$, where $J = \partial f_\theta(x)/\partial \theta\,\big|_{\theta=\mu_t}$ is the Jacobian of the model output with respect to the weights. KL regularisation tempers the model's confidence on OoS inputs. Empirical results on a synthetic cosine task and real-world air quality data show that the resulting framework, WeightCaster, achieves competitive or superior OoS performance at very low parameter counts, while offering interpretable weight dynamics and efficient computation. These findings suggest a promising direction for reliable OoS generalisation that does not rely on explicit inductive biases, with practical impact in safety-critical domains; the authors provide code to facilitate reproducibility.
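The mechanics summarised above (a linear weight recurrence, reparameterised sampling, the linearised predictive Gaussian, and a KL penalty) can be illustrated with a small NumPy sketch. It is not the authors' released code, and several choices are assumptions made only for illustration: the toy model $f_\theta(x) = a\cos(wx + b)$ with $\theta = (a, w, b)$, the diagonal values of `Phi` (our matrix form of $\phi$), applying the recurrence to the mean $\mu_t$ while holding $\sigma_t$ fixed, and a standard normal prior in the KL term. All function names are hypothetical.

```python
import numpy as np


def rollout_means(mu0, Phi, num_steps):
    """Unroll mu_{t+1} = Phi @ mu_t; returns shape (num_steps + 1, P)."""
    traj = [np.asarray(mu0, dtype=float)]
    for _ in range(num_steps):
        traj.append(Phi @ traj[-1])
    return np.stack(traj)


def sample_weights(mu, sigma, rng):
    """Reparameterisation trick: theta = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(np.shape(mu))
    return mu + sigma * eps


def model(theta, x):
    """Hypothetical toy model f_theta(x) = a * cos(w * x + b), theta = (a, w, b)."""
    a, w, b = theta
    return a * np.cos(w * x + b)


def jacobian(theta, x):
    """Analytic Jacobian d f_theta(x) / d theta, shape (N, 3) for N inputs."""
    a, w, b = theta
    phase = w * x + b
    return np.stack(
        [np.cos(phase), -a * x * np.sin(phase), -a * np.sin(phase)], axis=-1
    )


def linearised_predictive(x, mu, sigma, noise_std):
    """Predictive Gaussian from a first-order expansion of f around theta = mu."""
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    J = jacobian(mu, x)                                   # (N, P)
    mean = model(mu, x)                                   # mu_y = f_mu(x)
    cov = J @ np.diag(sigma**2) @ J.T + noise_std**2 * np.eye(x.size)
    return mean, cov


def kl_to_standard_normal(mu, sigma):
    """KL(N(mu, diag(sigma^2)) || N(0, I)); the N(0, I) prior is an assumption."""
    return 0.5 * np.sum(sigma**2 + mu**2 - 1.0 - np.log(sigma**2))


# Usage: the amplitude shrinks by 10% per ring under a hypothetical diagonal Phi.
Phi = np.diag([0.9, 1.0, 1.0])
mus = rollout_means(np.array([1.0, 1.0, 0.0]), Phi, num_steps=3)
sigma = np.array([0.1, 0.0, 0.0])
mean, cov = linearised_predictive(np.array([0.0, np.pi]), mus[0], sigma, noise_std=0.05)
```

Because this toy model has a closed-form Jacobian, `jacobian` is written by hand; for a neural network one would typically obtain $J$ by automatic differentiation instead.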
Abstract
As breakthroughs in deep learning transform key industries, models are increasingly required to extrapolate to data points outside the range of the training set, a challenge we term out-of-support (OoS) generalisation. However, neural networks frequently fail catastrophically on OoS samples, yielding unrealistic yet overconfident predictions. We address this challenge by reformulating OoS generalisation as a sequence modelling task in the weight space, wherein the training set is partitioned into concentric shells, each corresponding to one discrete step of the sequence. Our WeightCaster framework yields plausible, interpretable, and uncertainty-aware predictions without requiring explicit inductive biases, while maintaining high computational efficiency. Empirical validation on a synthetic cosine dataset and real-world air quality sensor readings demonstrates performance competitive with or superior to the state of the art. By enhancing reliability beyond in-distribution scenarios, these results hold significant implications for the wider adoption of artificial intelligence in safety-critical applications.
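The partition of the input domain into concentric shells, each mapped to a sequence step, might look like the NumPy sketch below. The Euclidean distance, the user-supplied anchor, equal-width shells, and the function name `assign_shells` are all illustrative assumptions rather than details taken from WeightCaster.

```python
import numpy as np


def assign_shells(X, anchor, num_shells, r_max):
    """Assign each input to a concentric shell index around `anchor`.

    Assumptions (illustrative only): Euclidean distance and equal-width shells
    of width r_max / num_shells, where r_max is the largest training distance.
    In-support points get indices 0..num_shells-1; points beyond r_max get
    indices >= num_shells, i.e. later steps of the sequence.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                     # treat 1-D input as N scalar samples
        X = X[:, None]
    anchor = np.atleast_1d(np.asarray(anchor, dtype=float))
    dist = np.linalg.norm(X - anchor, axis=1)
    width = r_max / num_shells
    idx = np.floor(dist / width).astype(int)
    # Points exactly on the outer training boundary belong to the last shell.
    return np.where(dist <= r_max, np.minimum(idx, num_shells - 1), idx)


# Usage: 1-D training inputs on [-1, 1], anchored at the origin.
X_train = np.linspace(-1.0, 1.0, 9)
r_max = np.max(np.abs(X_train))
train_shells = assign_shells(X_train, anchor=0.0, num_shells=4, r_max=r_max)
oos_shells = assign_shells(np.array([1.3, -1.6]), anchor=0.0, num_shells=4, r_max=r_max)
```

Here the training inputs occupy shells 0 to 3, while the two out-of-support inputs land in shells 5 and 6. An OoS input in shell $t$ would then be predicted with the weights obtained by continuing the weight sequence to step $t$.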
