Learning Representations on the Unit Sphere: Investigating Angular Gaussian and von Mises-Fisher Distributions for Online Continual Learning

Nicolas Michel; Giovanni Chierchia; Romain Negrel; Jean-François Bercher

TL;DR

This work tackles online continual learning by learning representations on the unit sphere via maximum a posteriori estimation. It compares two distributions on the sphere, the angular Gaussian and the von Mises-Fisher, and pushes representations toward fixed class directions, which makes training resilient to data drift without requiring negative samples or knowledge of task boundaries. The method combines memory-based replay, multi-view batches, and Guillotine Regularization, and it achieves strong results on standard benchmarks, especially under blurry task boundaries. The approach is computationally efficient, scales with memory size, and opens avenues for applying hyperspherical losses to broader online learning settings.

Abstract

We use the maximum a posteriori estimation principle for learning representations distributed on the unit sphere. We propose to use the angular Gaussian distribution, which corresponds to a Gaussian projected on the unit sphere, and derive the associated loss function. We also consider the von Mises-Fisher distribution, which is a Gaussian conditioned on the unit sphere. The learned representations are pushed toward fixed directions, which are the prior means of the Gaussians, allowing for a learning strategy that is resilient to data drift. This makes it suitable for online continual learning, which is the problem of training neural networks on a continuous data stream, where multiple classification tasks are presented sequentially so that data from past tasks are no longer accessible, and data from the current task can be seen only once. To address this challenging scenario, we propose a memory-based representation learning technique equipped with our new loss functions. Our approach does not require negative data or knowledge of task boundaries and performs well with smaller batch sizes while being computationally efficient. We demonstrate with extensive experiments that the proposed method outperforms the current state-of-the-art methods on both standard evaluation scenarios and realistic scenarios with blurry task boundaries. For reproducibility, we use the same training pipeline for every compared method and share the code at https://github.com/Nicolas1203/ocl-fd.
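To make the idea of pushing representations toward fixed directions concrete, here is a minimal sketch, not the authors' implementation. It relies only on the standard fact that the von Mises-Fisher density on the unit sphere is proportional to $\exp(\kappa \mu^T z)$, so with a fixed mean direction $\mu$ the negative log-likelihood is $-\kappa \mu^T z$ up to a constant that does not depend on $z$. The function names, the default `kappa=1.0`, and the choice of a standard basis vector as the class direction are assumptions made for illustration.

```python
import math


def normalize(v):
    """Project a vector onto the unit sphere (L2 normalization)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in v]


def vmf_fixed_direction_loss(embedding, class_index, kappa=1.0):
    """Negative vMF log-likelihood, up to an additive constant.

    Assumption: the class mean is the standard basis vector e_{class_index},
    so mu^T z reduces to a single coordinate of the normalized embedding.
    The vMF normalizing constant depends only on kappa and the dimension,
    so it is dropped here.
    """
    z = normalize(embedding)
    return -kappa * z[class_index]


# An embedding aligned with its class direction gets the lowest loss.
aligned = vmf_fixed_direction_loss([0.0, 2.0, 0.0], class_index=1, kappa=3.0)
orthogonal = vmf_fixed_direction_loss([1.0, 0.0, 0.0], class_index=1, kappa=3.0)
```

Because the target directions never move, the loss for one class involves no other class and no negative samples, which is consistent with the drift resilience described in the abstract.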

Paper Structure (49 sections, 3 theorems, 24 equations, 4 figures, 6 tables, 2 algorithms)

Introduction
Related Work
Representation Learning
Class Incremental Learning (CIL)
Online Continual Learning
Replay-Based Methods
Fixed Directions
Learning on the Unit Sphere
Proposed Approach
Representation Learning With Maximum a Posteriori Estimation
Saw Distributions on the Unit Sphere
Fixed Directions for Continual Learning
Loss Expression
Implementation Details
Multi-View Batch
...and 34 more sections

Key Result

Proposition 1

With $\lambda = (\mu^T\Sigma^{-1} \mu)^{\frac{1}{2}}$ and $\alpha = \frac{u^T\Sigma^{-1}\mu}{u^T\Sigma^{-1} u}$, the probability density of the normalized Gaussian vector is [...] with [...] and can be computed as [...] with $I_1 = \Phi(\alpha)$ and $I_2 = \phi(\alpha) + \alpha \Phi(\alpha)$, where $\phi(\cdot)$ and $\Phi(\cdot)$ are respectively the standard normal probability density function and cumulative distribution function.
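The two base terms $I_1$ and $I_2$ named in Proposition 1 can be evaluated with the standard library alone. The sketch below covers only those terms plus $\lambda$ and $\alpha$ as written above; the full density is not reproduced because its expression is missing here. Restricting to $\Sigma = I$ is an assumption made to keep the example free of matrix code, and all function names are hypothetical.

```python
import math


def std_normal_pdf(x):
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x):
    """Standard normal cumulative distribution Phi(x), via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def base_terms(alpha):
    """Return (I_1, I_2) with I_1 = Phi(alpha), I_2 = phi(alpha) + alpha * Phi(alpha)."""
    i1 = std_normal_cdf(alpha)
    i2 = std_normal_pdf(alpha) + alpha * std_normal_cdf(alpha)
    return i1, i2


def lambda_alpha_isotropic(mu, u):
    """lambda and alpha from the proposition, assuming Sigma is the identity."""
    lam = math.sqrt(sum(m * m for m in mu))
    alpha = sum(a * b for a, b in zip(u, mu)) / sum(a * a for a in u)
    return lam, alpha


lam, alpha = lambda_alpha_isotropic(mu=[3.0, 4.0], u=[1.0, 0.0])
i1, i2 = base_terms(alpha)
```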

Figures (4)

Figure 1: Training with fixed directions overview. Each class is assigned to a fixed vector of the standard basis. When changing task $\mathcal{T}$, new classes are encountered and mapped to remaining standard basis vectors. Best viewed in color.
Figure 2: Visualisation of class proportions in the incoming batch during training. The left side shows data drift with clear boundaries while the right side shows data drift with blurry boundaries for $\sigma=1500$ with 3 tasks, 10,000 images per task. $C_i$ corresponds to the classes of task $\mathcal{T}_i$ with $i \in [1,3]$.
Figure 3: Final average accuracy (%) for $\kappa^2 \in [0.02,20]$ on CIFAR-100 with $M=5k$.
Figure 4: Time consumption in minutes for every trained method on CIFAR-100 with $M=5k$ and 10 tasks.
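The class-to-direction mapping described in the Figure 1 caption can be sketched as a small registry that hands out standard basis vectors as new classes appear in the stream. This is an illustrative sketch under assumptions: the class name `FixedDirectionAssigner`, the first-come assignment order, and the error raised when the basis is exhausted are not taken from the paper.

```python
class FixedDirectionAssigner:
    """Map each newly seen class label to the next unused standard basis vector."""

    def __init__(self, dim):
        self.dim = dim      # embedding dimension, an upper bound on the number of classes
        self._index = {}    # label -> basis index, fixed once assigned

    def direction(self, label):
        """Return the one-hot direction for a label, assigning one if it is new."""
        if label not in self._index:
            if len(self._index) >= self.dim:
                raise ValueError("no standard basis vector left for a new class")
            self._index[label] = len(self._index)
        one_hot = [0.0] * self.dim
        one_hot[self._index[label]] = 1.0
        return one_hot


assigner = FixedDirectionAssigner(dim=3)
cat_dir = assigner.direction("cat")        # first class seen
dog_dir = assigner.direction("dog")        # second class, possibly from a later task
cat_dir_again = assigner.direction("cat")  # unchanged: directions stay fixed
```

Since assignment is triggered by the label itself rather than by a task index, a registry like this needs no knowledge of task boundaries.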

Theorems & Definitions (6)

Proposition 1
proof
Proposition 2
proof
Proposition 3
proof
