Subtract the Corruption: Training-Data-Free Corrective Machine Unlearning using Task Arithmetic
Mostafa Mozafari, Farooq Ahmad Wani, Maria Sofia Bucarelli, Fabrizio Silvestri
TL;DR
This paper tackles post-training corruption by formalizing source-free corrective machine unlearning (CMU), where the original training data are unavailable. It introduces CUTS, a lightweight method that treats corruption as a distinct task and estimates a corruption vector in weight space using a small proxy dataset; the correction subtracts a calibrated multiple of this vector from the corrupted model's weights. The approach is demonstrated on label-noise and backdoor corruptions, across various architectures and real-world data, showing substantial recovery of utility and robust backdoor removal with minimal data. The work highlights the role of pre-trained priors in disentangling clean and corruption directions, and suggests zero-shot proxy generation and layer-wise edits as practical avenues for future work.
Abstract
Corrupted training data are ubiquitous. Corrective Machine Unlearning (CMU) seeks to remove the influence of such corruption post-training. Prior CMU typically assumes access to identified corrupted training samples (a "forget set"). However, in many real-world scenarios the training data are no longer accessible. We formalize source-free CMU, where the original training data are unavailable and, consequently, no forget set of identified corrupted training samples can be specified. Instead, we assume a small proxy (surrogate) set of corrupted samples that reflect the suspected corruption type without needing to be the original training samples. In this stricter setting, methods that rely on a forget set are ineffective or narrow in scope. We introduce Corrective Unlearning in Task Space (CUTS), a lightweight weight-space correction method guided by the proxy set using task arithmetic principles. CUTS treats the clean signal and the corruption signal as distinct tasks. Specifically, we briefly fine-tune the corrupted model on the proxy set to amplify the corruption mechanism in weight space, compute the difference between the corrupted and fine-tuned weights as a proxy task vector, and subtract a calibrated multiple of this vector to cancel the corruption. Without access to clean data or a forget set, CUTS recovers a large fraction of the lost utility under label noise and, for backdoor triggers, nearly eliminates the attack with minimal damage to utility, outperforming state-of-the-art specialized CMU methods in the source-free setting.
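The weight-space step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the proxy task vector is taken as fine-tuned weights minus corrupted weights (the usual task-arithmetic convention; the abstract does not fix the sign), represents weights as plain dictionaries of floats, and uses hypothetical names (`proxy_task_vector`, `subtract_corruption`, `alpha`). The brief fine-tuning on the proxy set and the calibration of `alpha` are not shown.

```python
from typing import Dict, List

# Hypothetical flat representation: parameter name -> list of values.
Weights = Dict[str, List[float]]


def proxy_task_vector(corrupted: Weights, finetuned: Weights) -> Weights:
    """Per-parameter difference, assumed here to be fine-tuned minus corrupted."""
    return {
        name: [f - c for f, c in zip(finetuned[name], corrupted[name])]
        for name in corrupted
    }


def subtract_corruption(corrupted: Weights, tau: Weights, alpha: float) -> Weights:
    """Return corrupted - alpha * tau without modifying the inputs."""
    return {
        name: [c - alpha * t for c, t in zip(corrupted[name], tau[name])]
        for name in corrupted
    }


# Toy numbers standing in for real model weights (illustrative only).
theta_corrupted = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
# Stand-in for the weights after brief fine-tuning on the proxy set.
theta_finetuned = {"layer.weight": [1.5, 2.0], "layer.bias": [0.0]}

tau = proxy_task_vector(theta_corrupted, theta_finetuned)
# alpha is the "calibrated multiple"; the value 2.0 is an arbitrary placeholder.
theta_corrected = subtract_corruption(theta_corrupted, tau, alpha=2.0)
```

In this sketch, parameters that fine-tuning on the proxy set leaves unchanged get a zero entry in `tau` and are therefore untouched by the correction; only directions amplified by the proxy fine-tuning are moved, in the opposite direction.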
