HINT: Hypernetwork Approach to Training Weight Interval Regions in Continual Learning

Patryk Krukowski, Anna Bielawska, Kamil Książek, Paweł Wawrzyński, Paweł Batorski, Przemysław Spurek

TL;DR

This paper tackles catastrophic forgetting in continual learning by introducing HINT, which confines learning to low-dimensional interval embeddings and uses a hypernetwork to map these intervals into interval weights for a target network. The core idea is to propagate interval embeddings through an IBP-based hypernetwork so that the interval weights for each task $t$ are produced as $[\underline{\theta}_t, \bar{\theta}_t]$; the intersection of the embedding intervals across tasks yields a universal embedding, enabling a single weight set for all tasks. The method provides theoretical non-forgetting guarantees, given a non-empty intersection and a regularization term that preserves prior task mappings. Empirically, HINT outperforms the InterContiNet baseline on several benchmarks and achieves competitive or state-of-the-art results across TIL, DIL, and CIL settings, with reduced memory since a universal embedding suffices for inference.
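The propagation step described above can be illustrated with standard interval bound propagation (IBP). The following is a minimal sketch, assuming a small ReLU MLP as the hypernetwork; the layer sizes, the random weights, and the helper names `ibp_linear`, `ibp_relu`, `hypernet_interval`, and `hypernet_point` are hypothetical and not taken from the paper.

```python
import numpy as np

def ibp_linear(lower, upper, W, b):
    """Propagate the box [lower, upper] through the affine map x -> W @ x + b."""
    center = (lower + upper) / 2.0
    radius = (upper - lower) / 2.0
    new_center = W @ center + b
    new_radius = np.abs(W) @ radius  # |W| maps input radii to output radii
    return new_center - new_radius, new_center + new_radius

def ibp_relu(lower, upper):
    """ReLU is monotone, so it can be applied to both interval ends."""
    return np.maximum(lower, 0.0), np.maximum(upper, 0.0)

def hypernet_interval(e_lower, e_upper, params):
    """Map an interval embedding to interval target weights (illustrative MLP)."""
    lower, upper = e_lower, e_upper
    for i, (W, b) in enumerate(params):
        lower, upper = ibp_linear(lower, upper, W, b)
        if i < len(params) - 1:  # no activation after the last layer
            lower, upper = ibp_relu(lower, upper)
    return lower, upper

def hypernet_point(e, params):
    """Ordinary forward pass for a single (point) embedding."""
    x = e
    for i, (W, b) in enumerate(params):
        x = W @ x + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(0)
M, hidden, n_target = 4, 8, 6  # assumed sizes: embedding, hidden layer, target weights
params = [
    (rng.normal(size=(hidden, M)), rng.normal(size=hidden)),
    (rng.normal(size=(n_target, hidden)), rng.normal(size=n_target)),
]
e_center = rng.normal(size=M)  # embedding center for one task
eps = 0.1 * np.ones(M)         # interval radius around the center
theta_lower, theta_upper = hypernet_interval(e_center - eps, e_center + eps, params)
```

Every point embedding inside the input box is mapped by `hypernet_point` to weights inside `[theta_lower, theta_upper]`, which is the soundness property that interval-based guarantees rely on. The bounds are generally loose, and they widen with the embedding radius and with network depth.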

Abstract

Recently, a new Continual Learning (CL) paradigm was presented to control catastrophic forgetting, called Interval Continual Learning (InterContiNet), which relies on enforcing interval constraints on the neural network parameter space. Unfortunately, InterContiNet training is challenging because the high dimensionality of the weight space makes intervals difficult to manage. To address this issue, we introduce HINT, a technique that employs interval arithmetic within the embedding space and utilizes a hypernetwork to map these intervals to the target network parameter space. We train interval embeddings for consecutive tasks and train a hypernetwork to transform these embeddings into weights of the target network. An embedding for a given task is trained along with the hypernetwork, while the response of the target network for the previous task embeddings is preserved. Interval arithmetic thus operates in a more manageable, lower-dimensional embedding space rather than directly on intervals in a high-dimensional weight space. Our model allows faster and more efficient training. Furthermore, HINT maintains the guarantee of not forgetting. At the end of training, we can choose one universal embedding to produce a single network dedicated to all tasks. In such a framework, the hypernetwork is used only for training, and at inference we can use a single set of weights. HINT obtains significantly better results than InterContiNet and gives SOTA results on several benchmarks.
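Preserving the target network's response for earlier task embeddings is commonly done in hypernetwork-based continual learning with an output-preservation penalty. The following is a minimal sketch, assuming point embeddings instead of intervals, a hypothetical one-layer hypernetwork, and a squared-error penalty; `hypernet` and `preservation_penalty` are illustrative names, and the exact regularizer used in the paper may differ.

```python
import numpy as np

def hypernet(e, W, b):
    """Hypothetical one-layer hypernetwork: embedding -> target weights."""
    return np.tanh(W @ e + b)

def preservation_penalty(W, b, old_embeddings, stored_outputs):
    """Mean squared change in hypernetwork outputs for earlier task embeddings."""
    sq_errors = [
        np.sum((hypernet(e, W, b) - out) ** 2)
        for e, out in zip(old_embeddings, stored_outputs)
    ]
    return float(np.mean(sq_errors))

rng = np.random.default_rng(1)
M, n_target = 3, 5  # assumed embedding and target-weight dimensions
W_old = rng.normal(size=(n_target, M))
b_old = rng.normal(size=n_target)

# Embeddings of two earlier tasks and the weights they produced
# before training on the new task starts.
old_embeddings = [rng.normal(size=M) for _ in range(2)]
stored_outputs = [hypernet(e, W_old, b_old) for e in old_embeddings]

# Unchanged hypernetwork: nothing has drifted, so the penalty is zero.
penalty_unchanged = preservation_penalty(W_old, b_old, old_embeddings, stored_outputs)

# After a hypothetical update on a new task, the drift is penalized.
W_new = W_old + 0.05 * rng.normal(size=W_old.shape)
penalty_drifted = preservation_penalty(W_new, b_old, old_embeddings, stored_outputs)
```

In training, a penalty of this kind would be added to the new task's loss with a weighting coefficient, so that hypernetwork updates that change earlier tasks' weights are discouraged.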

Paper Structure (44 sections, 5 theorems, 48 equations, 13 figures, 5 tables, 2 algorithms)

This paper contains 44 sections, 5 theorems, 48 equations, 13 figures, 5 tables, 2 algorithms.

Key Result

Lemma 3.1

Let $(e_1, e_2, \ldots, e_T)$ be embedding centers, $T$ be the number of CL tasks, $\gamma > 0$ be a perturbation value, $\mathcal{H}(\cdot; \eta)$ be a hypernetwork with weights $\eta$, and $M$ be a natural number representing the dimensionality of the embedding space. [...] Then the embedding intervals $[\underline{e}_t, \bar{e}_t]$, $t = 1, \ldots, T$, have a non-empty intersection.
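The non-empty intersection in this lemma can be illustrated numerically. The following is a minimal sketch, assuming axis-aligned embedding intervals given by hand-picked lower and upper bounds; the helper `intersect_intervals` and the choice of the midpoint as the universal embedding are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def intersect_intervals(lowers, uppers):
    """Intersect axis-aligned boxes; rows are tasks, columns are coordinates.

    Returns (lower, upper) of the common box, or None if it is empty.
    """
    lower = np.max(lowers, axis=0)  # tightest lower bound per coordinate
    upper = np.min(uppers, axis=0)  # tightest upper bound per coordinate
    if np.any(lower > upper):
        return None
    return lower, upper

# Three tasks, two embedding coordinates (made-up numbers).
lowers = np.array([[0.0, -1.0], [0.2, -0.5], [-0.3, -0.8]])
uppers = np.array([[1.0, 0.5], [1.2, 0.4], [0.6, 0.3]])

common = intersect_intervals(lowers, uppers)
if common is not None:
    common_lower, common_upper = common
    # Any point of the common box lies in every task's interval;
    # the midpoint is one convenient choice.
    universal_embedding = (common_lower + common_upper) / 2.0
```

Here the common box is $[0.2, 0.6] \times [-0.5, 0.3]$, so the midpoint $(0.4, -0.1)$ lies inside all three task intervals. If any coordinate had a lower bound above its upper bound, the function would return `None` and no universal embedding would exist.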

Figures (13)

  • Figure 1: HINT uses interval arithmetic in the input to the hypernetwork. After propagating the intervals through the hypernetwork, we obtain the intervals on the target network layers. The intersection of all intervals produces universal embeddings dedicated to all tasks. Our model gives theoretical guarantees for not forgetting.
  • Figure 2: Embedding intervals for Split CIFAR-100, 5 tasks with 20 classes each, using the $\cos\left(\cdot\right)$ nesting method. The first ten embedding coordinates are shown.
  • Figure 3: Mean test accuracy for consecutive CL tasks, averaged over 2 runs, for different interval size settings of HINT on 10 tasks of the Permuted MNIST-10 dataset.
  • Figure 4: Histograms are calculated for the Split MNIST and Split CIFAR-100 datasets using the MLP and ResNet-18 architectures, respectively.
  • Figure 5: The first ten intervals around task embeddings for Permuted MNIST-10 and Split MNIST.
  • ...and 8 more figures

Theorems & Definitions (10)

  • Lemma 3.1
  • Theorem 3.2
  • Proposition 1
  • Proof of Proposition 1
  • Proof of Lemma 3.1
  • Proof of Theorem 3.2
  • Proposition 2
  • Proof
  • Proposition 3
  • Proof