Tackling Federated Unlearning as a Parameter Estimation Problem

Antonio Balordi, Lorenzo Manini, Fabio Stella, Alessio Merlo

TL;DR

This work addresses data forgetting in Federated Learning (FL) by recasting information leakage as a parameter-estimation task. It introduces a principled Federated Unlearning (FU) framework that uses Hessian-based second-order information to compute a per-parameter Target Information Score (TIS), resets the most informative parameters, and then applies minimal retraining. Experiments on MNIST, Fashion-MNIST, and CIFAR-10 in IID and non-IID settings show that the Reset method reaches privacy levels close to those of full retraining while preserving performance (Normalized Test Accuracy, NTA, of roughly 0.7–0.94) and improving efficiency over full retraining (RTR of roughly 0.29–0.3); it also neutralizes backdoors. The approach is data- and architecture-agnostic and does not require the server to access raw client data, offering a practical path to implementing the Right to Be Forgotten in federated systems. Future work will explore richer second-order information and real-world scalability.

Abstract

Privacy regulations require the erasure of data from deep learning models. This is a significant challenge that is amplified in Federated Learning, where data remains on clients, making full retraining or coordinated updates often infeasible. This work introduces an efficient Federated Unlearning framework based on information theory, modeling leakage as a parameter estimation problem. Our method uses second-order Hessian information to identify and selectively reset only the parameters most sensitive to the data being forgotten, followed by minimal federated retraining. This model-agnostic approach supports categorical and client unlearning without requiring server access to raw client data after initial information aggregation. Evaluations on benchmark datasets demonstrate strong privacy (MIA success near random, categorical knowledge erased) and high performance (Normalized Accuracy against re-trained benchmarks of $\approx$ 0.9), while aiming for increased efficiency over complete retraining. Furthermore, in a targeted backdoor attack scenario, our framework effectively neutralizes the malicious trigger, restoring model integrity. This offers a practical solution for data forgetting in FL.
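
The select-and-reset step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the score used here (the ratio of forget-set to total diagonal second-order information) is only a stand-in for the Target Information Score, whose exact definition is not given in this summary. The function name `select_and_reset`, its arguments, and the `reset_fraction` parameter are all hypothetical.

```python
import numpy as np

def select_and_reset(params, info_forget, info_total, init_params,
                     reset_fraction=0.1, eps=1e-12):
    """Reset the parameters that carry the most information about the forget data.

    ASSUMPTION: score_i = info_forget_i / (info_total_i + eps) is a placeholder
    for the paper's per-parameter score, not its actual definition.
    `info_forget` and `info_total` are assumed to be non-negative diagonal
    second-order estimates with the same shape as `params`.
    """
    score = info_forget / (info_total + eps)
    # Number of parameters to reset (at least one).
    k = max(1, int(round(reset_fraction * params.size)))
    # Flat indices of the k largest scores.
    top = np.argpartition(score.ravel(), -k)[-k:]
    mask = np.zeros(params.size, dtype=bool)
    mask[top] = True
    mask = mask.reshape(params.shape)
    # Selected entries go back to their initial values; the rest are kept.
    new_params = np.where(mask, init_params, params)
    return new_params, mask

# Toy usage with made-up numbers: four parameters, reset the top 25%.
params = np.array([1.0, 2.0, 3.0, 4.0])
info_forget = np.array([0.1, 5.0, 0.2, 0.1])
info_total = np.array([1.0, 5.5, 1.0, 1.0])
init_params = np.zeros(4)
new_params, mask = select_and_reset(params, info_forget, info_total,
                                    init_params, reset_fraction=0.25)
```

In a federated setting, the reset model would then go through a small number of training rounds on the remaining clients, which corresponds to the "minimal federated retraining" step in the abstract.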

Paper Structure

This paper contains 30 sections, 34 equations, 3 figures, 5 tables, 2 algorithms.

Figures (3)

  • Figure 1: The privacy-performance trade-off across multiple datasets and data distribution settings. Our proposed method (Reset, blue curve) is compared against the Random Reset baseline (orange curve). The y-axis shows the NTA metric, and the x-axis shows the $NFS_{\text{MIA}}$ score. The ideal point, representing retraining from scratch, is the red 'x' at (1, 1).
  • Figure 2: Privacy-performance trade-off on CIFAR-10 with preferential setting after an epoch of retrain. Our method Retrained (blue curve) is compared with the Random Retrained baseline (orange curve).
  • Figure :