Low-Rank Filtering and Smoothing for Sequential Deep Learning
Joanna Sliwa, Frank Schneider, Nathanael Bosch, Agustinus Kristiadi, Philipp Hennig
TL;DR
The paper reframes sequential deep learning as Bayesian filtering and smoothing in a state-space model whose state is the network's weights. This enables principled encoding of task relationships and backward knowledge transfer. It introduces LR-LGF, which approximates the precision matrix of a Laplace approximation as diagonal plus low-rank, built on the generalized Gauss–Newton (GGN) matrix, so that filtering and smoothing remain efficient for deep networks. Smoothing lets each task-specific model benefit from data seen later without accessing that data, which is valuable for privacy-critical applications. Experiments on CAMELYON and MNIST show reduced forgetting and competitive performance, and offer guidance on choosing the low-rank budget and on how task relationships influence learning.
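To make the diagonal-plus-low-rank idea concrete, here is a minimal NumPy sketch under stated assumptions; it is not the authors' LR-LGF implementation. A toy Jacobian `J` and a positive per-output loss curvature `h` stand in for a network's GGN, the low-rank part keeps the top-`k` eigendirections of the GGN via a truncated SVD, and the diagonal part is an assumed prior precision `d`. The helper names `low_rank_ggn_factor` and `woodbury_solve` are hypothetical. The point is that covariance-vector products with `diag(d) + U U^T` cost O(p k^2) through the Woodbury identity, so the budget `k` trades approximation quality for memory and compute.

```python
import numpy as np

def low_rank_ggn_factor(J, h, k):
    """Return U (p x k) such that U @ U.T is the best rank-k approximation of
    the toy GGN G = J^T diag(h) J (a stand-in for a network's GGN)."""
    A = np.sqrt(h)[:, None] * J               # G = A^T A
    _, s, Vt = np.linalg.svd(A, full_matrices=False)
    k = min(k, s.size)
    return Vt[:k].T * s[:k]                   # top-k right singular vectors, scaled

def woodbury_solve(d, U, v):
    """Solve (diag(d) + U U^T) x = v via the Woodbury identity in O(p k^2)."""
    Dinv_v = v / d
    Dinv_U = U / d[:, None]
    cap = np.eye(U.shape[1]) + U.T @ Dinv_U   # k x k capacitance matrix
    return Dinv_v - Dinv_U @ np.linalg.solve(cap, U.T @ Dinv_v)

# Toy problem with hypothetical sizes: n outputs, p parameters.
rng = np.random.default_rng(0)
n, p = 40, 12
J = rng.standard_normal((n, p))
h = rng.uniform(0.5, 2.0, size=n)             # positive loss curvature per output
d = np.ones(p)                                # assumed diagonal (prior) precision
G = J.T @ (h[:, None] * J)
v = rng.standard_normal(p)

errors = []
for k in (2, 6, 12):                          # low-rank budget
    U = low_rank_ggn_factor(J, h, k)
    errors.append(np.linalg.norm(G - U @ U.T, 2))   # spectral error of the GGN part
    x = woodbury_solve(d, U, v)
    x_dense = np.linalg.solve(np.diag(d) + U @ U.T, v)
    print(k, errors[-1], np.max(np.abs(x - x_dense)))
```

The approximation error of the GGN part cannot grow as `k` increases and vanishes at full rank, while the Woodbury solve matches a dense solve at every budget without ever forming a p x p inverse.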
Abstract
Learning multiple tasks sequentially requires neural networks to balance retaining existing knowledge with remaining flexible enough to adapt to new tasks. Regularizing network parameters is a common approach, but it rarely incorporates prior knowledge about task relationships, and it lets information flow only forward, to future tasks. We propose a Bayesian framework that treats the network's parameters as the state space of a nonlinear Gaussian model, unlocking two key capabilities: (1) A principled way to encode domain knowledge about task relationships, allowing, e.g., control over which layers should adapt between tasks. (2) A novel application of Bayesian smoothing, allowing task-specific models to also incorporate knowledge from models learned later. This does not require direct access to the later tasks' data, which is crucial, e.g., for privacy-critical applications. These capabilities rely on efficient filtering and smoothing operations, for which we propose diagonal plus low-rank approximations of the precision matrix in the Laplace approximation (LR-LGF). Empirical results demonstrate the efficiency of LR-LGF and the benefits of the unlocked capabilities.
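The state-space view can be illustrated with a linear-Gaussian toy, where the Laplace approximation is exact and filtering and smoothing reduce to the Kalman filter and the Rauch–Tung–Striebel (RTS) smoother. This is a sketch under assumptions, not the paper's method: each task is a linear regression on the same weights, the transition is an assumed random walk with per-parameter process noise `q`, covariances are dense, and `kalman_update` and `filter_smooth` are hypothetical helper names. Setting `q` to zero for a group of parameters (a hypothetical "layer") encodes the prior belief that the group is shared across tasks, and the backward smoothing pass uses only stored Gaussian moments, never the task data.

```python
import numpy as np

def kalman_update(m_pred, P_pred, X, y, noise_var):
    """Condition N(m_pred, P_pred) on y = X theta + eps, eps ~ N(0, noise_var I)."""
    S = X @ P_pred @ X.T + noise_var * np.eye(len(y))
    K = np.linalg.solve(S, X @ P_pred).T      # gain P_pred X^T S^{-1}
    m = m_pred + K @ (y - X @ m_pred)
    P = P_pred - K @ X @ P_pred
    return m, 0.5 * (P + P.T)                 # re-symmetrize

def filter_smooth(tasks, m0, P0, q, noise_var):
    """Random-walk model over weights: theta_t = theta_{t-1} + w_t, w_t ~ N(0, diag(q))."""
    Q = np.diag(q)
    m, P = m0, P0
    ms_p, Ps_p, ms_f, Ps_f = [], [], [], []
    for X, y in tasks:                        # forward pass: filtering
        m_pred, P_pred = m, P + Q             # predict: weights drift by process noise
        m, P = kalman_update(m_pred, P_pred, X, y, noise_var)
        ms_p.append(m_pred); Ps_p.append(P_pred)
        ms_f.append(m); Ps_f.append(P)
    ms_s, Ps_s = [ms_f[-1]], [Ps_f[-1]]       # last task: smoothed = filtered
    for t in range(len(tasks) - 2, -1, -1):   # backward pass: moments only, no data
        Gt = np.linalg.solve(Ps_p[t + 1], Ps_f[t]).T   # P_t (P^-_{t+1})^{-1}
        m_s = ms_f[t] + Gt @ (ms_s[0] - ms_p[t + 1])
        P_s = Ps_f[t] + Gt @ (Ps_s[0] - Ps_p[t + 1]) @ Gt.T
        ms_s.insert(0, m_s); Ps_s.insert(0, 0.5 * (P_s + P_s.T))
    return ms_f, Ps_f, ms_s, Ps_s

# Toy setup: the first two weights are shared (q = 0), the last two may adapt.
rng = np.random.default_rng(1)
p, T, n = 4, 3, 20
q = np.array([0.0, 0.0, 0.5, 0.5])
theta = rng.standard_normal(p)
tasks = []
for t in range(T):
    theta = theta + rng.standard_normal(p) * np.sqrt(q)
    X = rng.standard_normal((n, p))
    tasks.append((X, X @ theta + 0.1 * rng.standard_normal(n)))

ms_f, Ps_f, ms_s, Ps_s = filter_smooth(tasks, np.zeros(p), np.eye(p), q, noise_var=0.01)
```

After smoothing, the shared weights have the same posterior mean for every task, and the first task's posterior covariance can only shrink, because later tasks' information flows back through the stored moments.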
