Revisiting Clustering of Neural Bandits: Selective Reinitialization for Mitigating Loss of Plasticity

Zhiyuan Su; Sunhao Dai; Xiao Zhang

TL;DR

This work targets the loss of plasticity in Clustering of Neural Bandits (CNB) by introducing Selective Reinitialization (SeRe), which identifies and refreshes only low-utility neural units to preserve adaptability in non-stationary environments. A change-detection mechanism dynamically tunes reinitialization frequency, enabling CNB to track evolving user preferences while retaining useful knowledge. The authors prove a sublinear regret bound in piecewise-stationary settings and validate the approach across six real-world recommendation datasets, showing meaningful regret reductions with minimal runtime overhead. The results demonstrate SeRe as a practical, scalable enhancement for neural bandit systems operating in dynamic, large-scale streaming contexts.
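For readers less familiar with the terminology, "sublinear regret" can be stated as follows. The notation here ($R_T$, $r_t^{*}$, $r_t$) is assumed for illustration and may differ from the paper's own symbols.

$$
R_T = \sum_{t=1}^{T} \left( r_t^{*} - r_t \right), \qquad \lim_{T \to \infty} \frac{R_T}{T} = 0
$$

Here $r_t^{*}$ denotes the expected reward of the best arm at round $t$ and $r_t$ the expected reward of the arm actually chosen. In a piecewise-stationary environment the best arm may change from one stationary segment to the next, which is why $r_t^{*}$ carries a time index.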

Abstract

Clustering of Bandits (CB) methods enhance sequential decision-making by grouping bandits into clusters based on similarity and incorporating cluster-level contextual information, and they have proven effective and adaptable in applications such as personalized streaming recommendation. However, when CB algorithms are extended to their neural version (commonly referred to as Clustering of Neural Bandits, or CNB), they suffer from loss of plasticity: neural network parameters become rigid and less adaptable over time, which limits the algorithms' ability to adapt to non-stationary environments (e.g., dynamic user preferences in recommendation). To address this challenge, we propose Selective Reinitialization (SeRe), a novel bandit learning framework that dynamically preserves the adaptability of CNB algorithms in evolving environments. SeRe leverages a contribution utility metric to identify and selectively reset underutilized units, mitigating loss of plasticity while maintaining stable knowledge retention. Furthermore, when SeRe is combined with CNB algorithms, an adaptive change detection mechanism adjusts the reinitialization frequency according to the degree of non-stationarity, ensuring effective adaptation without unnecessary resets. Theoretically, we prove that SeRe enables sublinear cumulative regret in piecewise-stationary environments, outperforming traditional CNB approaches in long-term performance. Extensive experiments on six real-world recommendation datasets demonstrate that SeRe-enhanced CNB algorithms effectively mitigate the loss of plasticity and achieve lower regret, improving adaptability and robustness in dynamic settings.
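The two mechanisms described in the abstract, utility-based selective reset and a change-dependent reset rate, can be illustrated with a minimal NumPy sketch for a single hidden layer. This is not the authors' implementation. The utility formula (running average of activation magnitude times outgoing weight magnitude), the function names, and the error-ratio rule in `reinit_fraction` are all assumptions chosen for illustration, since the abstract does not specify them.

```python
import numpy as np


def contribution_utility(hidden, w_out, prev_utility, decay=0.99):
    """Assumed utility: running average of |activation| * |outgoing weight| per unit.

    hidden: (batch, n_hidden) activations, w_out: (n_hidden, n_out) weights.
    """
    instant = np.abs(hidden).mean(axis=0) * np.abs(w_out).sum(axis=1)
    return decay * prev_utility + (1.0 - decay) * instant


def reinit_fraction(recent_err, baseline_err, base_frac=0.01, max_frac=0.2):
    """Hypothetical stand-in for change detection.

    Resets a larger share of units when recent error rises above the baseline.
    """
    ratio = recent_err / max(baseline_err, 1e-12)
    return float(np.clip(base_frac * ratio, 0.0, max_frac))


def selective_reinit(w_in, w_out, utility, frac, rng):
    """Reset the `frac` lowest-utility hidden units in place; return their indices.

    w_in: (n_in, n_hidden) incoming weights, w_out: (n_hidden, n_out) outgoing weights.
    """
    n_in, n_hidden = w_in.shape
    k = int(np.floor(frac * n_hidden))
    if k == 0:
        return np.array([], dtype=int)
    idx = np.argsort(utility)[:k]
    # Fresh incoming weights restore the unit's ability to learn new features.
    w_in[:, idx] = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, k))
    # Zero outgoing weights so the reset does not perturb current predictions.
    w_out[idx, :] = 0.0
    return idx
```

In a bandit loop, one would update the utility after each forward pass, compute a reset fraction from recent prediction errors, and call `selective_reinit` on each cluster's network. All other units keep their weights, which is how the sketch reflects the abstract's goal of retaining useful knowledge.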
