Revisiting Clustering of Neural Bandits: Selective Reinitialization for Mitigating Loss of Plasticity

Zhiyuan Su, Sunhao Dai, Xiao Zhang

TL;DR

This work targets the loss of plasticity in Clustering of Neural Bandits (CNB) by introducing Selective Reinitialization (SeRe), which identifies and refreshes only low-utility neural units to preserve adaptability in non-stationary environments. A change-detection mechanism dynamically tunes reinitialization frequency, enabling CNB to track evolving user preferences while retaining useful knowledge. The authors prove a sublinear regret bound in piecewise-stationary settings and validate the approach across six real-world recommendation datasets, showing meaningful regret reductions with minimal runtime overhead. The results demonstrate SeRe as a practical, scalable enhancement for neural bandit systems operating in dynamic, large-scale streaming contexts.

Abstract

Clustering of Bandits (CB) methods enhance sequential decision-making by grouping bandits into clusters based on similarity and incorporating cluster-level contextual information, demonstrating effectiveness and adaptability in applications like personalized streaming recommendations. However, when CB algorithms are extended to their neural version (commonly referred to as Clustering of Neural Bandits, or CNB), they suffer from loss of plasticity, where neural network parameters become rigid and less adaptable over time, limiting their ability to adapt to non-stationary environments (e.g., dynamic user preferences in recommendation). To address this challenge, we propose Selective Reinitialization (SeRe), a novel bandit learning framework that dynamically preserves the adaptability of CNB algorithms in evolving environments. SeRe leverages a contribution utility metric to identify and selectively reset underutilized units, mitigating loss of plasticity while maintaining stable knowledge retention. Furthermore, when SeRe is combined with CNB algorithms, an adaptive change detection mechanism adjusts the reinitialization frequency according to the degree of non-stationarity, ensuring effective adaptation without unnecessary resets. Theoretically, we prove that SeRe enables sublinear cumulative regret in piecewise-stationary environments, outperforming traditional CNB approaches in long-term performance. Extensive experiments on six real-world recommendation datasets demonstrate that SeRe-enhanced CNB algorithms effectively mitigate the loss of plasticity and achieve lower regret, improving adaptability and robustness in dynamic settings.

Paper Structure

This paper contains 23 sections, 4 theorems, 15 equations, 7 figures, 2 tables, 2 algorithms.

Key Result

theorem 1

For SeRe-enhanced CNB algorithms, in the piecewise-stationary setting with $S$ pieces, the cumulative dynamic regret over $T$ rounds satisfies an upper bound that depends on $S$ and $T$. In particular, if the number of pieces satisfies $S = o(T)$ (i.e., $S$ grows slower than $T$), then the overall regret is sublinear in $T$.
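
For reference, the sublinearity claim can be spelled out as follows. The notation is assumed here for illustration and may differ from the paper's: $R(T)$ denotes the cumulative dynamic regret, $r_t^{*}$ the best achievable expected reward at round $t$ (which may change from piece to piece), and $r_t$ the expected reward of the arm actually chosen.

```latex
R(T) = \sum_{t=1}^{T} \left( r_t^{*} - r_t \right),
\qquad
R(T) = o(T) \iff \lim_{T \to \infty} \frac{R(T)}{T} = 0 .
```

In words, sublinear regret means the average per-round regret vanishes as the horizon grows.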

Figures (7)

  • Figure 1: Schematic illustration of clustering. For each item, users are grouped into clusters based on similarity in preferences or behaviors. The figure reflects how clusters adapt to specific items, illustrating item-varying user segmentation.
  • Figure 2: Loss of plasticity in existing CNB algorithms. (1) The left panel: the "-N" suffix indicates the neural version of the method, and "w/P" (i.e. "with Perturbations") means that periodic perturbations are added to the user features. Five experiments were performed for each setting: the middle line represents the average curve and the shaded area represents the 95% confidence interval. (2) The right panel: this box plot illustrates the $\ell_2$-norm of the difference in the last layer's parameters, computed from samples taken every 25 rounds over 10,000 rounds on the MovieLens dataset.
  • Figure 3: SeRe workflow at layer $l$: (1) Update contribution utility $u_{l,i}$ and age $\textit{age}_{l,i}$. (2) Increment counter $c_l$ based on matured units $s_m$. (3) If $c_l \geq 1$, then the unit with the lowest utility and the weights associated with it are reinitialized and the metric is updated.
  • Figure 4: Regret comparison between CNB algorithms and SeRe-enhanced CNB algorithms on three online recommendation datasets: the "-N" suffix indicates the neural version of the method, and " + SeRe" means this method is combined with SeRe. Five experiments were performed for each setting: the middle line represents the average curve and the shaded area represents the 95% confidence interval.
  • Figure 5: Sensitivity and Plasticity Analysis. (1) The left panel: comparison of regret curves of MCNB and MCNB + SeRe under different $\eta$. (2) The right panel: this box plot illustrates the $\ell_2$-norm of the difference in the last layer's parameters, computed from samples taken every 25 rounds over 10,000 rounds on the KuaiRec dataset.
  • ...and 2 more figures
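
The three-step workflow in the Figure 3 caption can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `SeReLayer`, the utility formula (a running average of activation magnitude times outgoing weight magnitude, in the style of continual backpropagation), the maturity threshold, the fixed `replacement_rate`, and the reinitialization scheme are all hypothetical choices that this page does not specify. In the paper the reinitialization frequency is adjusted by a change-detection mechanism, which is not modeled here.

```python
import math
import random


class SeReLayer:
    """Bookkeeping for selective reinitialization of one hidden layer (illustrative)."""

    def __init__(self, w_in, w_out, replacement_rate=0.01, maturity=5,
                 decay=0.9, seed=0):
        self.w_in = w_in      # w_in[i][k]: weight from input k into unit i
        self.w_out = w_out    # w_out[j][i]: weight from unit i to output j
        self.rho = replacement_rate  # assumed: fraction of matured units replaced per step
        self.maturity = maturity     # assumed: steps before a unit may be reset
        self.decay = decay           # assumed: decay of the running utility average
        n_units = len(w_in)
        self.utility = [0.0] * n_units
        self.age = [0] * n_units
        self.counter = 0.0
        self.rng = random.Random(seed)

    def step(self, activations):
        """Run one update; return the index of the reset unit, or None."""
        n_units = len(self.w_in)

        # (1) Update the contribution utility u_{l,i} and the age of every unit.
        for i in range(n_units):
            out_mass = sum(abs(row[i]) for row in self.w_out)
            contribution = abs(activations[i]) * out_mass
            self.utility[i] = (self.decay * self.utility[i]
                               + (1.0 - self.decay) * contribution)
            self.age[i] += 1

        # (2) Increment the counter c_l in proportion to the number of matured units.
        mature = [i for i in range(n_units) if self.age[i] > self.maturity]
        self.counter += self.rho * len(mature)

        # (3) If c_l >= 1, reinitialize the matured unit with the lowest utility.
        if self.counter < 1.0 or not mature:
            return None
        unit = min(mature, key=lambda i: self.utility[i])
        d_in = len(self.w_in[unit])
        bound = 1.0 / math.sqrt(d_in)
        # Fresh incoming weights; outgoing weights are zeroed so the reset
        # does not perturb the network's current output (an assumed choice).
        self.w_in[unit] = [self.rng.uniform(-bound, bound) for _ in range(d_in)]
        for row in self.w_out:
            row[unit] = 0.0
        self.utility[unit] = 0.0
        self.age[unit] = 0
        self.counter -= 1.0
        return unit
```

Calling `step` once per round with the layer's current activations keeps high-utility units untouched while a unit that never fires is the first to be refreshed once it has matured. A change detector could be wired in by raising `rho` when a shift is flagged and lowering it otherwise.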

Theorems & Definitions (6)

  • definition 1: $(\epsilon_1, \epsilon_2)$-User Cluster
  • theorem 1: Regret Upper Bound
  • Remark 1
  • lemma 1: Stationary Regret Bound [ban2024meta]
  • lemma 2: Selective Reinitialization Preserves Freshness
  • lemma 3: UCB Bound for Freshly Initialized Parameters