SHIELD: Secure Hypernetworks for Incremental Expansion Learning Defense
Patryk Krukowski, Łukasz Gorczyca, Piotr Helm, Kamil Książek, Przemysław Spurek
TL;DR
We address the challenge of robust continual learning by proposing SHIELD, a framework that unifies certifiable adversarial robustness with sequential task adaptation. SHIELD uses a hypernetwork to generate task-specific target models from compact embeddings and employs Interval Bound Propagation to provide formal robustness guarantees on interval inputs, complemented by Interval MixUp to tighten bounds and smooth decision boundaries. The core contributions are the SHIELD architecture, the Interval MixUp training strategy, and theoretical robustness guarantees with empirical validation across diverse benchmarks, including class-incremental learning (CIL) scenarios and TinyImageNet. SHIELD demonstrates state-of-the-art or competitive robust performance while maintaining scalability and privacy by avoiding replay buffers and full model copies, marking a significant step toward practical robust lifelong learning under adversarial threats.
Abstract
Continual learning under adversarial conditions remains an open problem, as existing methods often compromise robustness, scalability, or both. We propose a novel framework that integrates Interval Bound Propagation (IBP) with a hypernetwork-based architecture to enable certifiably robust continual learning across sequential tasks. Our method, SHIELD, generates task-specific model parameters via a shared hypernetwork conditioned solely on compact task embeddings, eliminating the need for replay buffers or full model copies and keeping the approach efficient over time. To further enhance robustness, we introduce Interval MixUp, a novel training strategy that blends virtual examples represented as $\ell_{\infty}$ balls centered around MixUp points. Leveraging interval arithmetic, this technique provides certified robustness guarantees while mitigating the wrapping effect, resulting in smoother decision boundaries. We evaluate SHIELD under strong white-box adversarial attacks, including PGD and AutoAttack, across multiple benchmarks. It consistently outperforms existing robust continual learning methods, achieving state-of-the-art average accuracy while maintaining both scalability and certification. These results represent a significant step toward practical and theoretically grounded continual learning in adversarial settings.
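The three ingredients named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the function names (`hypernetwork`, `unpack`, `interval_linear`, `interval_relu`, `interval_mixup`), the linear form of the hypernetwork, the single-layer target model, and the fixed radius `eps` are all assumptions made for illustration. The sketch generates target weights from a task embedding, builds an $\ell_{\infty}$ ball around a MixUp point, and propagates that ball through the generated layer with interval arithmetic.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def hypernetwork(embedding: Vector, hyper_weight: Matrix, hyper_bias: Vector) -> Vector:
    """Toy linear hypernetwork (assumption): flat target parameters = H e + c."""
    return [
        sum(h * e for h, e in zip(row, embedding)) + c
        for row, c in zip(hyper_weight, hyper_bias)
    ]


def unpack(flat: Vector, n_out: int, n_in: int) -> Tuple[Matrix, Vector]:
    """Split a flat parameter vector into one layer's weight matrix and bias."""
    weight = [flat[i * n_in:(i + 1) * n_in] for i in range(n_out)]
    bias = flat[n_out * n_in:n_out * n_in + n_out]
    return weight, bias


def interval_linear(lower: Vector, upper: Vector,
                    weight: Matrix, bias: Vector) -> Tuple[Vector, Vector]:
    """Propagate the box [lower, upper] through y = W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weight, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            if w >= 0:
                # A non-negative weight keeps the ordering of the bounds.
                lo += w * l
                hi += w * u
            else:
                # A negative weight swaps which input bound is extreme.
                lo += w * u
                hi += w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def interval_relu(lower: Vector, upper: Vector) -> Tuple[Vector, Vector]:
    """ReLU is monotone, so it can be applied to each bound separately."""
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]


def interval_mixup(x_a: Vector, x_b: Vector, lam: float,
                   eps: float) -> Tuple[Vector, Vector]:
    """l_inf ball of radius eps (fixed here by assumption) around a MixUp point."""
    center = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    return [c - eps for c in center], [c + eps for c in center]


# Hypothetical toy setup: a 2-dimensional task embedding generates a 2x2 layer.
task_embedding = [1.0, 0.0]
hyper_weight = [[1.0, 0.3], [-1.0, 0.2], [0.5, -0.4],
                [0.5, 0.1], [0.0, 0.7], [-1.0, 0.0]]
hyper_bias = [0.0] * 6

weight, bias = unpack(hypernetwork(task_embedding, hyper_weight, hyper_bias), 2, 2)
box_lo, box_hi = interval_mixup([0.0, 1.0], [1.0, 0.0], lam=0.25, eps=0.1)
out_lo, out_hi = interval_relu(*interval_linear(box_lo, box_hi, weight, bias))
```

Every input inside the box `[box_lo, box_hi]` is mapped to an output inside `[out_lo, out_hi]`, which is the property a certified training loss would build on. Only `hyper_weight`, `hyper_bias`, and one embedding per task need to be stored, which is how a hypernetwork avoids keeping a full model copy for each task.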
