FIRE: Frobenius-Isometry Reinitialization for Balancing the Stability-Plasticity Tradeoff

Isaac Han, Sangyeon Park, Seungwon Oh, Donghu Kim, Hojoon Lee, Kyung-Joong Kim

TL;DR

FIRE addresses the longstanding stability–plasticity tradeoff in continual learning by formulating reinitialization as a constrained optimization that minimizes the distance to past weights while enforcing isotropy to preserve plasticity. It introduces two differentiable metrics, Squared Frobenius Error (SFE) and Deviation from Isometry (DfI), and derives a principled projection onto an isotropic manifold using an orthogonal Procrustes view, approximated efficiently with a Newton–Schulz iteration. The method is validated across continual visual learning, continual pretraining of LLMs, and reinforcement learning, consistently beating naive training and standard reinitialization baselines with modest overhead. FIRE demonstrates that explicit control of the stability–plasticity tradeoff yields robust, transfer-friendly representations in nonstationary environments, with practical applicability across vision, language, and control domains.

Abstract

Deep neural networks trained on nonstationary data must balance stability (i.e., retaining prior knowledge) and plasticity (i.e., adapting to new tasks). Standard reinitialization methods, which reinitialize weights toward their original values, are widely used but difficult to tune: conservative reinitializations fail to restore plasticity, while aggressive ones erase useful knowledge. We propose FIRE, a principled reinitialization method that explicitly balances the stability-plasticity tradeoff. FIRE quantifies stability through Squared Frobenius Error (SFE), measuring proximity to past weights, and plasticity through Deviation from Isometry (DfI), reflecting weight isotropy. The reinitialization point is obtained by solving a constrained optimization problem, minimizing SFE subject to DfI being zero, which is efficiently approximated by Newton-Schulz iteration. FIRE is evaluated on continual visual learning (CIFAR-10 with ResNet-18), language modeling (OpenWebText with GPT-0.1B), and reinforcement learning (HumanoidBench with SAC and Atari games with DQN). Across all domains, FIRE consistently outperforms both naive training without intervention and standard reinitialization methods, demonstrating effective balancing of the stability-plasticity tradeoff.
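
The constrained problem described above (minimize SFE subject to DfI being zero) can be illustrated for a single weight matrix with a minimal NumPy sketch. Several details here are assumptions, not taken from the paper: DfI is assumed to be $\|W^\top W - I\|_F^2$ (using the smaller Gram matrix), the cubic Newton-Schulz update and its Frobenius-norm prescaling are one common variant, and the function names are hypothetical. Any layer-dependent scaling the authors may apply is omitted.

```python
import numpy as np


def sfe(w_new, w_old):
    """Squared Frobenius Error: squared Frobenius distance to the past weights."""
    return float(np.sum((w_new - w_old) ** 2))


def dfi(w):
    """Deviation from Isometry, ASSUMED here to be ||G - I||_F^2,
    where G is the smaller of the two Gram matrices of w."""
    rows, cols = w.shape
    gram = w.T @ w if rows >= cols else w @ w.T
    return float(np.sum((gram - np.eye(gram.shape[0])) ** 2))


def newton_schulz_orthogonalize(w, num_iters=30):
    """Approximate the orthogonal polar factor of w without an SVD.

    Dividing by the Frobenius norm puts every singular value in (0, 1],
    inside the convergence region of the cubic iteration below.
    The iteration count is an illustrative choice.
    """
    x = w / np.linalg.norm(w)
    for _ in range(num_iters):
        x = 1.5 * x - 0.5 * x @ x.T @ x
    return x


def polar_factor_svd(w):
    """Exact orthogonal Procrustes solution: argmin ||Q - w||_F with Q^T Q = I."""
    u, _, vt = np.linalg.svd(w, full_matrices=False)
    return u @ vt


# Hypothetical "past weights" for one layer (tall matrix, full column rank).
rng = np.random.default_rng(0)
w_old = rng.standard_normal((16, 4))

w_fire = newton_schulz_orthogonalize(w_old)   # iterative approximation
w_exact = polar_factor_svd(w_old)             # closed-form reference

print("DfI before:", dfi(w_old))
print("DfI after: ", dfi(w_fire))
print("SFE to past weights:", sfe(w_fire, w_old))
```

The SVD route gives the exact minimizer and serves only as a reference; the iterative route uses nothing but matrix multiplications, which is presumably why it is attractive for large layers.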

Paper Structure

This paper contains 37 sections, 12 theorems, 74 equations, 8 figures, 10 tables, 1 algorithm.

Key Result

Theorem 1

Let $\Theta=\{W^1,\dots,W^L\}$ and $\widetilde{\Theta}=\{\widetilde{W}^1,\dots,\widetilde{W}^L\}$ be the parameters of two depth-$L$ feedforward networks with elementwise activations $\sigma_\ell$ (Lipschitz constants $L_{\sigma_\ell}$). For an input batch $Z\in\mathbb{R}^{n\times d_0}$, we denote t… In particular, if each activation is $1$-Lipschitz ($L_{\sigma_k}\le 1$) and each spectral norm is …

Figures (8)

  • Figure 1: Illustration of FIRE. Solving a constrained optimization problem, FIRE places weights at the intersection of high-stability and high-plasticity manifolds.
  • Figure 2: Continual visual learning results. Warm-start setting (a): training begins with only 10% of the data before continuing on the full dataset. Continual setting (b): the dataset is revealed in ten stages, expanding from 10% to 100% in 10% increments. Class-incremental setting (c): new classes are introduced over 20 phases, with an equal number of classes added at each phase.
  • Figure 3: Continual pretraining of GPT-0.1B. Models are first pretrained on WikiText-103 and then continually trained on a new dataset consisting of a mixture of OpenWebText and WikiText-103. From left to right, results correspond to models initialized from the best checkpoint during pretraining, from 30k pretraining iterations, and from 60k pretraining iterations.
  • Figure 4: Reinforcement learning results. Discrete control with DQN on three Atari environments that suffer from severe plasticity loss (a) and continuous control with SAC on three HumanoidBench tasks (b). The black dashed line indicates the point at which reinitialization is applied.
  • Figure 5: Ablation study results. Final performance of FIRE with varying numbers of iterations of the Newton-Schulz algorithm (a). Comparison of FIRE and baselines in terms of loss curvature (maximum eigenvalue of the Hessian), plasticity (DfI), and stability (normalized SFE) (b).
  • ...and 3 more figures
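
The data schedules named in the Figure 2 caption can be written down as a small sketch. The stage and phase counts come from the caption; the function names, the 50,000-sample dataset size, and the 100-class label set are hypothetical choices for illustration, and how samples are selected within each stage is not specified here.

```python
def warm_start_sizes(num_samples, warm_fraction_percent=10):
    """Warm-start setting: train on a 10% subset, then on the full dataset."""
    return [num_samples * warm_fraction_percent // 100, num_samples]


def continual_sizes(num_samples, num_stages=10):
    """Continual setting: the visible dataset grows from 10% to 100%
    in equal increments over num_stages stages."""
    return [num_samples * (stage + 1) // num_stages for stage in range(num_stages)]


def class_incremental_phases(num_classes, num_phases=20):
    """Class-incremental setting: an equal number of new classes per phase."""
    if num_classes % num_phases != 0:
        raise ValueError("num_classes must be divisible by num_phases")
    per_phase = num_classes // num_phases
    return [
        list(range(phase * per_phase, (phase + 1) * per_phase))
        for phase in range(num_phases)
    ]


# Illustrative sizes only; the dataset size and class count are assumptions.
print(warm_start_sizes(50_000))
print(continual_sizes(50_000))
print(class_incremental_phases(100)[:2])
```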

Theorems & Definitions (21)

  • Theorem 1: SFE bounds output feature covariance between two deep neural networks
  • Theorem 2: Hessian spectral norm bounded by layerwise DfIs
  • Theorem 3: DfI controls effective rank
  • Theorem 4: Minimizing DfI increases neuron activity score
  • proof
  • Lemma 1: DfI controls the spectral norm
  • proof
  • Lemma 2: Covariance/spectral growth through layers
  • proof
  • Lemma 3: Block-Jacobian bound
  • ...and 11 more