Exact Federated Continual Unlearning for Ridge Heads on Frozen Foundation Models

Yijun Quan, Wentai Wu, Giovanni Montana

Abstract

Foundation models are commonly deployed as frozen feature extractors with a small trainable head to adapt to private, user-generated data in federated settings. The "right to be forgotten" requires removing the influence of specific samples or users from the trained model on demand. Existing federated unlearning methods target general deep models and rely on approximate reconstruction or selective retraining, making exactness costly or elusive. We study this problem in a practically relevant but under-explored regime: a frozen foundation model with a ridge-regression head. The exact optimum depends on the data only through two additive sufficient statistics, which we turn into a communication protocol supporting an arbitrary stream of add and delete requests via fixed-size messages. The server maintains a head that is, in exact arithmetic, pointwise identical to centralized retraining after every request. We provide deterministic retrain-equivalence guarantees, order and partition invariance, two server-side variants, and a Bayesian certificate of zero KL divergence. Experiments on four benchmarks confirm the guarantees: both variants match centralized ridge retraining to within $10^{-9}$ relative Frobenius error and complete each request at orders-of-magnitude lower cost than federated retraining baselines.

Paper Structure (38 sections, 8 theorems, 19 equations, 3 figures, 6 tables, 2 algorithms)

Key Result

theorem 1

Let $(S_t,G_t)$ be the exact sufficient statistics of the global retained dataset $\mathcal{D}_t$. Then the server head $W_t=(S_t+\gamma\mathbf{I})^{-1}G_t$ equals the unique ridge-regression solution obtained by centralized retraining from scratch on $\mathcal{D}_t$.
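
The statement above can be checked numerically on synthetic data. The following is a minimal sketch, assuming the usual ridge statistics $S=\Phi^\top\Phi$ and $G=\Phi^\top Y$ for a feature matrix $\Phi$ and target matrix $Y$; the function names, dimensions, and the value of $\gamma$ are illustrative choices, not taken from the paper. The reference solution is computed by an independent route (augmented least squares) so that the comparison is not circular.

```python
import numpy as np

def sufficient_stats(features, targets):
    """Return (S, G) = (Phi^T Phi, Phi^T Y) for a batch of frozen features."""
    return features.T @ features, features.T @ targets

def ridge_head(S, G, gamma):
    """Solve the normal equations (S + gamma * I) W = G."""
    d = S.shape[0]
    return np.linalg.solve(S + gamma * np.eye(d), G)

def centralized_ridge(features, targets, gamma):
    """Reference: minimize ||Phi W - Y||_F^2 + gamma ||W||_F^2 directly,
    written as an augmented least-squares problem (no use of S or G)."""
    d = features.shape[1]
    A = np.vstack([features, np.sqrt(gamma) * np.eye(d)])
    B = np.vstack([targets, np.zeros((d, targets.shape[1]))])
    W, *_ = np.linalg.lstsq(A, B, rcond=None)
    return W

# Synthetic stand-ins for frozen features and targets (assumed shapes).
rng = np.random.default_rng(0)
Phi = rng.standard_normal((200, 16))
Y = rng.standard_normal((200, 3))
gamma = 0.1

S, G = sufficient_stats(Phi, Y)
W_stats = ridge_head(S, G, gamma)
W_ref = centralized_ridge(Phi, Y, gamma)

# Relative Frobenius error between the two heads.
rel_err = np.linalg.norm(W_stats - W_ref) / np.linalg.norm(W_ref)
print(f"relative Frobenius error: {rel_err:.2e}")
```

In floating-point arithmetic the two heads agree only up to rounding, which is why the comparison is reported as a relative error rather than as exact equality.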

Figures (3)

  • Figure 1: Protocol overview (Variant A). Clients compute frozen features locally and transmit only fixed-size sufficient-statistic messages $(S^\pm_{k,t}, G^\pm_{k,t})$. The server maintains a retained-statistics ledger ($S_t \leftarrow S_{t-1}{+}S_t^{+}{-}S_t^{-}$, $G_t \leftarrow G_{t-1}{+}G_t^{+}{-}G_t^{-}$) and recovers the exact ridge head by solving $(S_t{+}\gamma I)W_t = G_t$.
  • Figure 2: Test Accuracy and Cumulative Time for CIFAR-10 Chunked Deletions. Unlearning performance on CIFAR-10 with repeated deletions of $20\%$ of the total data per step. Deleted samples are randomly selected from all clients. Left: test accuracy tracking utility across deletion rounds. Right: cumulative unlearning time demonstrating system efficiency.
  • Figure 3: Wall-clock time for 200 single-sample unlearning requests, compared across methods.
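
The ledger update described in the Figure 1 caption can be sketched in a few lines. This is an illustrative sketch only: the class `StatsLedger`, the helper `client_message`, and all sizes are hypothetical names and values, and the per-batch statistics are assumed to be $\Phi^\top\Phi$ and $\Phi^\top Y$. It applies two add messages and one delete message, then compares the resulting head with retraining from scratch on the retained samples.

```python
import numpy as np

def client_message(features, targets):
    """Fixed-size message (S, G): S is d x d and G is d x c,
    independent of how many samples the batch contains."""
    return features.T @ features, features.T @ targets

class StatsLedger:
    """Hypothetical server-side ledger of retained sufficient statistics."""

    def __init__(self, dim, num_outputs, gamma):
        self.S = np.zeros((dim, dim))
        self.G = np.zeros((dim, num_outputs))
        self.gamma = gamma

    def add(self, message):
        S_plus, G_plus = message
        self.S += S_plus
        self.G += G_plus

    def delete(self, message):
        S_minus, G_minus = message
        self.S -= S_minus
        self.G -= G_minus

    def head(self):
        """Recover the ridge head by solving (S + gamma * I) W = G."""
        d = self.S.shape[0]
        return np.linalg.solve(self.S + self.gamma * np.eye(d), self.G)

rng = np.random.default_rng(1)
d, c, gamma = 8, 2, 0.5
Phi_a, Y_a = rng.standard_normal((50, d)), rng.standard_normal((50, c))
Phi_b, Y_b = rng.standard_normal((30, d)), rng.standard_normal((30, c))

ledger = StatsLedger(d, c, gamma)
ledger.add(client_message(Phi_a, Y_a))               # client A joins
ledger.add(client_message(Phi_b, Y_b))               # client B joins
ledger.delete(client_message(Phi_a[:10], Y_a[:10]))  # A forgets 10 samples

# Reference: retrain from scratch on the retained data only.
Phi_keep = np.vstack([Phi_a[10:], Phi_b])
Y_keep = np.vstack([Y_a[10:], Y_b])
W_retrain = np.linalg.solve(
    Phi_keep.T @ Phi_keep + gamma * np.eye(d), Phi_keep.T @ Y_keep
)
gap = np.linalg.norm(ledger.head() - W_retrain) / np.linalg.norm(W_retrain)
print(f"relative gap to retraining: {gap:.2e}")
```

Note that the delete message has the same size as an add message, so the communication cost per request does not depend on the number of samples involved.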

Theorems & Definitions (16)

  • theorem 1: Deterministic retrain equivalence
  • proof
  • theorem 2: Order and partition invariance
  • proof
  • theorem 3: Equivalence of Variant A and Variant B in exact arithmetic
  • proof
  • lemma 1: Downdate feasibility
  • proof
  • lemma 2: Second-order information is necessary for deterministic exactness
  • proof
  • ...and 6 more
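
Order and partition invariance (theorem 2 in the list above) follows from the additivity of the statistics, and a small numerical check makes this concrete. The sketch below assumes per-client statistics of the form $\Phi^\top\Phi$ and $\Phi^\top Y$; the helper names, the two partitions, and the arrival orders are arbitrary illustrative choices. Because floating-point summation is not associative, the comparison uses a tolerance, whereas the paper's guarantee is stated for exact arithmetic.

```python
import numpy as np

def stats(features, targets):
    """Per-client sufficient statistics (assumed form)."""
    return features.T @ features, features.T @ targets

def aggregate(messages):
    """Sum client messages in the order they are given."""
    S = sum(m[0] for m in messages)
    G = sum(m[1] for m in messages)
    return S, G

rng = np.random.default_rng(2)
n, d, c = 120, 6, 3
Phi = rng.standard_normal((n, d))
Y = rng.standard_normal((n, c))

# Statistics of the pooled dataset, computed in one shot.
S_all, G_all = stats(Phi, Y)

# Partition 1: three clients holding contiguous blocks of samples.
parts_1 = np.array_split(np.arange(n), 3)
# Partition 2: five clients holding a shuffled assignment of samples.
parts_2 = np.array_split(rng.permutation(n), 5)

def from_partition(parts, order):
    """Build one message per client, then aggregate in the given order."""
    msgs = [stats(Phi[idx], Y[idx]) for idx in parts]
    return aggregate([msgs[i] for i in order])

S1, G1 = from_partition(parts_1, [0, 1, 2])
S2, G2 = from_partition(parts_2, [4, 2, 0, 3, 1])

invariant = (
    np.allclose(S1, S_all) and np.allclose(G1, G_all)
    and np.allclose(S2, S_all) and np.allclose(G2, G_all)
)
print("statistics agree across partitions and orders:", invariant)
```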