Out-of-Support Generalisation via Weight Space Sequence Modelling

Roussel Desmond Nzoyem

TL;DR

Out-of-support generalisation remains a critical challenge for neural models. This work reframes OoS generalisation as a weight-space sequence forecasting problem by partitioning the input domain into concentric rings around an anchor and learning a per-ring weight trajectory with a linear recurrence, encoded as $\theta_{t+1} = \phi \theta_t$. A stochastic extension models weights as Gaussian $\theta_t \sim \mathcal{N}(\mu_t, \mathrm{diag}(\sigma_t^2))$, uses the reparameterisation trick $\theta_t = \mu_t + \sigma_t \odot \epsilon$, and employs a first-order linearisation to obtain a predictive Gaussian with mean $\mu_y = f_{\mu_t}(x)$ and covariance $\Sigma_y = J\mathrm{diag}(\sigma_t^2)J^T + \sigma_{\text{noise}}^2 I$, together with KL regularisation to temper OoS confidence. Empirical results on a synthetic cosine task and real-world air quality data show WeightCaster achieving competitive or superior OoS performance at very low parameter counts, while offering interpretable weight dynamics and efficient computation. These findings suggest a promising direction for reliable, bias-agnostic OoS generalisation with practical impact in safety-critical domains, and the authors provide code to facilitate reproducibility.
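The stochastic step summarised above can be illustrated with a minimal sketch. It assumes the two-parameter linear model $\hat{y} = \theta^1 \cdot x + \theta^2$ from the Figure 1 caption, for which the Jacobian with respect to the weights is simply $[x, 1]$ and the first-order linearisation is exact. The function names `sample_weights` and `predictive_gaussian` are hypothetical and not taken from the authors' code.

```python
import numpy as np


def sample_weights(mu, sigma, rng):
    """Reparameterisation trick: theta = mu + sigma * eps, with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps


def predictive_gaussian(x, mu, sigma, sigma_noise):
    """Predictive mean and covariance for y = theta1 * x + theta2.

    x           : (n,) inputs
    mu, sigma   : (2,) mean and standard deviation of the Gaussian weights
    sigma_noise : scalar observation-noise standard deviation (assumed known here)
    """
    x = np.asarray(x, dtype=float)
    # Jacobian of the model output w.r.t. the weights, one row per input: [x, 1]
    J = np.stack([x, np.ones_like(x)], axis=1)
    mean = J @ mu  # mu_y = f_{mu}(x)
    # Sigma_y = J diag(sigma^2) J^T + sigma_noise^2 I
    cov = J @ np.diag(sigma**2) @ J.T + sigma_noise**2 * np.eye(len(x))
    return mean, cov


mu = np.array([2.0, 1.0])
sigma = np.array([0.1, 0.2])
mean, cov = predictive_gaussian([0.0, 1.0], mu, sigma, sigma_noise=0.05)
theta = sample_weights(mu, sigma, np.random.default_rng(0))
```

For a nonlinear model the Jacobian would be evaluated at $\mu_t$ (for example by automatic differentiation), and the resulting Gaussian would only be an approximation.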

Abstract

As breakthroughs in deep learning transform key industries, models are increasingly required to extrapolate to datapoints found outside the range of the training set, a challenge we term out-of-support (OoS) generalisation. However, neural networks frequently exhibit catastrophic failure on OoS samples, yielding unrealistic but overconfident predictions. We address this challenge by reformulating the OoS generalisation problem as a sequence modelling task in the weight space, wherein the training set is partitioned into concentric shells corresponding to discrete sequential steps. Our WeightCaster framework yields plausible, interpretable, and uncertainty-aware predictions without necessitating explicit inductive biases, all the while maintaining high computational efficiency. Empirical validation on a synthetic cosine dataset and real-world air quality sensor readings demonstrates performance competitive with or superior to the state-of-the-art. By enhancing reliability beyond in-distribution scenarios, these results hold significant implications for the wider adoption of artificial intelligence in safety-critical applications.

Paper Structure (16 sections, 6 equations, 2 figures, 1 table, 1 algorithm)

Introduction
Problem Setting
Related Work
Method
Domain Decomposition
Weight Space Sequence Modelling
Stochastic Framework for Regression
Reparameterisation trick.
Marginalisation via linearisation.
Loss function regularisation.
Main Results
Experimental Setup
Cosine Dataset.
AirQuality Dataset.
Discussion
...and 1 more sections

Figures (2)

Figure 1: Illustration of the two main steps of the WeightCaster framework for sinusoidal extrapolation. (a) First, an anchor point is chosen and the input domain $\mathbb{R}^1$ is decomposed into $T=10$ "rings", here clearly delineated as intervals. Within each ring, we consider a simple linear model $\hat{y} = \theta^1 \cdot x + \theta^2$, where $\theta = [ \theta^1, \theta^2 ]^T$ contains the slope and intercept to anchor, respectively. (b) Optimal weights $\{\theta_t\}_{t=1}^{T_{\text{tr}}}$ for the data in each ring are subsequently computed by fitting a weight space sequence model, one ring corresponding to one time step. Suitable weights for OoS datapoints are obtained by rolling out the sequence model for time steps $t \geq T_{\text{tr}}$ (in this example, $T_{\text{tr}}=9$).
Figure 2: Performance of WeightCaster compared to baselines. (Top Row) Extrapolation on the Cosine wave experiment. (Bottom Row) OoS generalisation on the AirQuality sensor dataset. Shaded areas represent the pointwise $2\sigma$ uncertainty estimates.
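
The two steps illustrated in Figure 1 can be sketched as follows. This is a simplified two-stage illustration under stated assumptions, not the authors' training procedure: rings are assumed to have equal width, the per-ring weights are assumed to be already available, and the recurrence matrix $\phi$ in $\theta_{t+1} = \phi \theta_t$ is estimated by ordinary least squares. The helper names `ring_index`, `fit_linear_recurrence`, and `rollout` are hypothetical.

```python
import numpy as np


def ring_index(x, anchor, width):
    """0-based ring number of each point, using equal-width shells around the anchor."""
    x = np.asarray(x, dtype=float)
    x = x.reshape(x.shape[0], -1)  # accept (n,) or (n, d) inputs
    dist = np.linalg.norm(x - anchor, axis=1)
    return np.floor(dist / width).astype(int)


def fit_linear_recurrence(thetas):
    """Least-squares estimate of phi such that thetas[t + 1] ~= phi @ thetas[t].

    thetas : (T, p) array holding one weight vector per ring (time step).
    """
    # Row form: thetas[1:] = thetas[:-1] @ phi.T, so lstsq returns phi.T
    phi_t, *_ = np.linalg.lstsq(thetas[:-1], thetas[1:], rcond=None)
    return phi_t.T


def rollout(phi, theta_last, steps):
    """Forecast weights for rings beyond the training support."""
    out, theta = [], theta_last
    for _ in range(steps):
        theta = phi @ theta
        out.append(theta)
    return np.stack(out)


# Toy weight trajectory generated from a known, hypothetical phi (damped rotation).
a = 0.3
phi_true = 0.9 * np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
thetas = [np.array([1.0, 0.5])]
for _ in range(8):  # 9 training rings in total, as in the Figure 1 example
    thetas.append(phi_true @ thetas[-1])
thetas = np.stack(thetas)

phi_hat = fit_linear_recurrence(thetas)
future = rollout(phi_hat, thetas[-1], steps=3)  # weights for three OoS rings
rings = ring_index([0.1, 1.5, -2.2], anchor=0.0, width=1.0)
```

Each forecast weight vector in `future` would then parameterise the linear model used for inputs falling in the corresponding out-of-support ring.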
