Sketched Lanczos uncertainty score: a low-memory summary of the Fisher information

Marco Miani; Lorenzo Beretta; Søren Hauberg

TL;DR

Sketched Lanczos Uncertainty (SLU) is developed: an architecture-agnostic uncertainty score that can be applied to pre-trained neural networks with minimal overhead and consistently outperforms existing methods in the low-memory regime.

Abstract

Current uncertainty quantification methods are expensive in both memory and compute, which hinders practical uptake. To counter this, we develop Sketched Lanczos Uncertainty (SLU): an architecture-agnostic uncertainty score that can be applied to pre-trained neural networks with minimal overhead. Importantly, the memory use of SLU grows only logarithmically with the number of model parameters. We combine Lanczos' algorithm with dimensionality reduction techniques to compute a sketch of the leading eigenvectors of a matrix. Applying this novel algorithm to the Fisher information matrix yields a cheap and reliable uncertainty score. Empirically, SLU yields well-calibrated uncertainties, reliably detects out-of-distribution examples, and consistently outperforms existing methods in the low-memory regime.
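To make the Lanczos ingredient concrete, here is a minimal sketch of plain Lanczos with full reorthogonalization. It is not the paper's sketched variant: it keeps all $k$ Lanczos vectors in memory ($p \times k$ floats). The function name `lanczos`, the toy matrix `A`, and the sizes `p = 300`, `k = 30` are assumptions chosen for illustration. The routine touches the matrix only through matrix-vector products.

```python
import numpy as np

def lanczos(matvec, p, k, rng):
    """Run k Lanczos steps on a symmetric operator given only as a matvec.

    Returns approximate leading eigenvalues (Ritz values, descending)
    and the matching approximate eigenvectors (Ritz vectors).
    Illustrative only: no handling of breakdown (zero residual).
    """
    V = np.zeros((p, k))      # Lanczos vectors, stored in full
    alpha = np.zeros(k)       # diagonal of the tridiagonal matrix T
    beta = np.zeros(k - 1)    # off-diagonal of T
    v = rng.standard_normal(p)
    V[:, 0] = v / np.linalg.norm(v)
    for j in range(k):
        w = matvec(V[:, j])
        alpha[j] = V[:, j] @ w
        # Full reorthogonalization against every stored vector.
        w -= V[:, : j + 1] @ (V[:, : j + 1].T @ w)
        if j + 1 < k:
            beta[j] = np.linalg.norm(w)
            V[:, j + 1] = w / beta[j]
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    evals, evecs = np.linalg.eigh(T)
    order = np.argsort(evals)[::-1]
    return evals[order], V @ evecs[:, order]

# Toy symmetric matrix with geometrically decaying eigenvalues (assumed).
rng = np.random.default_rng(0)
p, k = 300, 30
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
true_evals = 0.8 ** np.arange(p)
A = (Q * true_evals) @ Q.T

ritz_vals, ritz_vecs = lanczos(lambda x: A @ x, p, k, rng)
print(ritz_vals[:3], true_evals[:3])
```

The memory cost of storing `V` is what makes this version expensive for large $p$; the abstract's claim of logarithmic memory growth refers to the sketched method, which this toy example does not implement.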

Paper Structure (43 sections, 6 theorems, 17 equations, 9 figures, 5 tables, 2 algorithms)

This paper contains 43 sections, 6 theorems, 17 equations, 9 figures, 5 tables, 2 algorithms.

Introduction
Background
Uncertainty score
The Lanczos algorithm
The benefits of Lanczos.
The downsides of Lanczos.
Post-hoc orthogonalization Lanczos.
Sketching
Method
Sketched Lanczos
Sketched Lanczos Uncertainty score (SLU)
Computing the uncertainty score through sketching.
Approximation quality.
Related work
Laplace's approximation.
...and 28 more sections

Key Result

Theorem 2.2

For any $p \times k$ matrix $U$, SRFT is a $(1\pm \varepsilon)$-subspace embedding for the column space of $U$ with probability $1-\delta$ as long as the sketch size satisfies $s = \Omega((k + \log p) \varepsilon^{-2} \log(k / \delta))$.
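A small numerical check can illustrate what the theorem promises. The sketch below is an assumption-laden toy, not the paper's implementation: it builds an SRFT-style map as random sign flips, a unitary FFT, and uniform row subsampling rescaled by $\sqrt{p/s}$, then measures the observed distortion on a random $k$-dimensional subspace. The function name `srft_sketch` and the sizes `p`, `k`, `s` are hypothetical. Because $U$ has orthonormal columns, the sketch preserves all norms in the column space up to $1 \pm \varepsilon$ exactly when every singular value of $SU$ lies in $[1-\varepsilon, 1+\varepsilon]$.

```python
import numpy as np

def srft_sketch(U, s, rng):
    """Apply an SRFT-style sketch to the rows of U (shape p x k).

    Steps: random sign flips, unitary FFT along the row axis,
    then keep s uniformly sampled rows, rescaled by sqrt(p / s).
    """
    p = U.shape[0]
    signs = rng.choice([-1.0, 1.0], size=p)
    mixed = np.fft.fft(signs[:, None] * U, axis=0, norm="ortho")
    rows = rng.choice(p, size=s, replace=False)
    return np.sqrt(p / s) * mixed[rows]

rng = np.random.default_rng(0)
p, k, s = 4096, 10, 400   # illustrative sizes, s much smaller than p

# Orthonormal basis U of a random k-dimensional subspace of R^p.
U, _ = np.linalg.qr(rng.standard_normal((p, k)))

SU = srft_sketch(U, s, rng)
sing = np.linalg.svd(SU, compute_uv=False)

# Observed distortion: how far the singular values of SU are from 1.
eps_observed = max(sing.max() - 1.0, 1.0 - sing.min())
print(f"sketched shape: {SU.shape}, observed distortion: {eps_observed:.3f}")
```

Increasing `s` should shrink the observed distortion, in line with the $\varepsilon^{-2}$ dependence in the bound.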

Figures (9)

Figure 1: OoD detection performance ($\swarrow$) on a ResNet.
Figure 2: Exponential decay of the GGN eigenvalues. Average and standard deviation over 5 seeds. Details are in \ref{sec:spectral_property}.
Figure 3: Comparison of sketch sizes $s$ for LeNet ($p=40$K) on FashionMNIST vs MNIST (left), ResNet ($p=300$K) on CIFAR-10 vs CIFAR-corrupted with defocus blur (center), and VisualAttentionNet ($p=4$M) on CelebA vs Food101 (right). The lower the ratio $s/p$, the stronger the memory efficiency.
Figure 4: AUROC scores of Sketched Lanczos Uncertainty vs baselines with memory budget $3p$. SLU outperforms the baselines on several choices of ID (\ref{fig:ceoa}, \ref{fig:ceob}, \ref{fig:ceoc}, \ref{fig:ceod}, \ref{fig:ceoe}) and OoD (x-axis) dataset pairs. Dashed lines are for improved visualization only; see \ref{tab:results} for values and standard deviations. Plots \ref{fig:ceoa}, \ref{fig:ceob}, \ref{fig:ceoc}, \ref{fig:ceod}, \ref{fig:ceoe} are averaged over 10, 10, 5, 3, and 1 independently trained models, respectively.
Figure 5: We study the GGN of a LeNet model with $44{,}000$ parameters trained on MNIST. We run $40$ iterations of high-memory Lanczos and low-memory Lanczos. Let $H = [H_1 | \dots | H_{40}]$ and $\Lambda_H$ be the eigenvectors and eigenvalues computed by the high-memory algorithm, and $L = [L_1 | \dots | L_{40}]$ and $\Lambda_L$ those computed by the low-memory algorithm. We sort both sets of eigenvectors in decreasing order of corresponding eigenvalues. In position $(i, j)$ we plot $\langle H_i, L_j \rangle$. Multiple eigenvectors $L_j$ visibly correspond to the same eigenvector $H_i$.
...and 4 more figures

Theorems & Definitions (10)

Definition 2.1: Subspace embedding
Theorem 2.2: Essentially, Theorem 7 in woodruff2014sketching
Lemma 3.0: Sketching low-rank matrices
Lemma 3.0: Orthogonalizing the sketch
Lemma A.0: Sketching low-rank matrices
proof
Lemma A.0: Orthogonalizing the sketch
proof
Lemma A.0: Orthogonalizing the sketch, for matrix queries
proof
