Orthogonal Soft Pruning for Efficient Class Unlearning

Qinghui Gong, Xue Yang, Xiaohu Tang

TL;DR

FedOrtho tackles data unlearning in federated learning under non-IID data by orthogonalizing convolutional kernels to decouple semantic representations and then applying activation-guided one-shot soft pruning. The framework combines Federated Collaborative Orthogonal Training with activation difference statistics and adaptive pruning to erase forgotten data in a single forward pass, without retraining, while preserving retained knowledge via local-global alignment. The authors provide a dual-space theoretical justification showing that kernel orthogonality bounds cross-functional covariance, which leads to feature decoupling. They validate the approach with experiments on CIFAR-10/100 and TinyImageNet across ResNet and VGG architectures, achieving over 98% forgetting quality and subsecond erasure in centralized settings with minimal retention loss. In federated settings, FedOrtho reduces computation and communication costs by 2–3 orders of magnitude and lowers membership inference attack (MIA) risk, making it a practical solution for verifiable data unlearning in collaborative environments.

Abstract

Efficient and controllable data unlearning in federated learning remains challenging because of the trade-off between forgetting and retention performance, especially under non-independent and identically distributed (non-IID) settings, where deep feature entanglement exacerbates this dilemma. To address this challenge, we propose FedOrtho, a federated unlearning framework that combines orthogonalized deep convolutional kernels with an activation-driven, controllable one-shot soft pruning (OSP) mechanism. FedOrtho enforces kernel orthogonality and local-global alignment to decouple feature representations and mitigate client drift. This structural independence enables precise one-shot pruning of forgetting-related kernels while preserving retained knowledge. FedOrtho achieves state-of-the-art (SOTA) performance on CIFAR-10, CIFAR-100, and TinyImageNet with ResNet and VGG architectures, and the experiments verify that it supports class-, client-, and sample-level unlearning with over 98% forgetting quality. It reduces computational and communication costs by 2–3 orders of magnitude in federated settings and achieves subsecond-level erasure in centralized scenarios, while maintaining over 97% retention accuracy and mitigating membership inference risks.

Paper Structure

This paper contains 32 sections, 5 theorems, 33 equations, 8 figures, 8 tables.

Key Result

Theorem 1

Let the conv kernel matrix $\boldsymbol{W} \in \mathbb{R}^{C_{\text{out}} \times D}$ (where $C_{\text{out}}$ is the number of output channels) satisfy $\|\boldsymbol{W}\boldsymbol{W}^\top - \boldsymbol{I}\|_F \leq \varepsilon$ ($\boldsymbol{I}$ is the identity matrix, and $\varepsilon$ is the orthog… where $B,C>0$ are constants, $\eta$ is the patch whitening error, $\sigma$ is the sub-Gaussian para…
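The orthogonality condition in the theorem, $\|\boldsymbol{W}\boldsymbol{W}^\top - \boldsymbol{I}\|_F \leq \varepsilon$, can be checked numerically for a given layer. The following is a minimal sketch in plain Python, not the authors' code. It assumes each row of $\boldsymbol{W}$ is one flattened output kernel; the function name `orthogonality_gap` and the toy matrices are hypothetical and used only for illustration.

```python
import math

def orthogonality_gap(W):
    """Return ||W W^T - I||_F for a kernel matrix W given as a list of rows.

    Assumption: each row is one flattened output-channel kernel of length D.
    """
    c_out = len(W)
    total = 0.0
    for i in range(c_out):
        for j in range(c_out):
            # Entry (i, j) of the Gram matrix W W^T is the dot product of rows i and j.
            gram_ij = sum(a * b for a, b in zip(W[i], W[j]))
            target = 1.0 if i == j else 0.0
            total += (gram_ij - target) ** 2
    return math.sqrt(total)

# Toy example 1: two orthonormal kernels, so the gap is zero.
W_orth = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0]]

# Toy example 2: two identical unit-norm kernels (fully redundant).
# W W^T - I = [[0, 1], [1, 0]], whose Frobenius norm is sqrt(2).
W_dup = [[1.0, 0.0, 0.0],
         [1.0, 0.0, 0.0]]

print(orthogonality_gap(W_orth))
print(orthogonality_gap(W_dup))
```

A smaller gap means the kernels are closer to mutually orthogonal unit vectors; a layer satisfies the theorem's premise for any $\varepsilon$ at least as large as this value.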

Figures (8)

  • Figure 1: (a–b) Gram similarity heatmaps of layer4.2.conv2 show that orthogonal constraints significantly suppress the redundancy of conv kernels. (c) Feature coupling intensifies from shallow to deep layers.
  • Figure 2: Activation differences across 10 classes in ResNet-50 layer4.2.conv2 (Class 0 as the unlearning target). Panels (a) and (b) show the 3rd and 8th kernels, respectively; in each panel, the left plot is the non-orthogonal model and the right plot is the orthogonal model.
  • Figure 3: Illustration of the FedOrtho workflow. During collaborative training, clients train the model by introducing local orthogonal constraints and weight alignment. After an unlearning request is triggered, clients calculate local activation differences. The server aggregates differences from all clients and performs pruning to achieve knowledge unlearning without retraining the entire model.
  • Figure 4: Grad-CAM heatmaps of CIFAR-10 before and after unlearning.
  • Figure 5: Communication and computation costs of federated unlearning process for different methods on ResNet-50/CIFAR-100.
  • ...and 3 more figures
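The unlearning stage described in the Figure 3 caption (clients compute local activation differences, the server aggregates them and prunes) can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the paper's algorithm: the summary does not give the exact statistic, aggregation rule, or pruning rule, so the code assumes the activation difference is the per-kernel mean activation on forget data minus the mean on retained data, that the server takes an unweighted average over clients, and that soft pruning scales selected kernels by a factor instead of removing them. All function names, the `threshold`, and the `scale` parameter are hypothetical.

```python
def local_activation_difference(forget_acts, retain_acts):
    """Per-kernel mean activation on forget samples minus mean on retained samples.

    Each argument is a list of samples; each sample is a list with one
    activation value per kernel. (Assumed statistic, for illustration only.)
    """
    n_kernels = len(forget_acts[0])

    def mean(acts, k):
        return sum(sample[k] for sample in acts) / len(acts)

    return [mean(forget_acts, k) - mean(retain_acts, k) for k in range(n_kernels)]

def aggregate(client_diffs):
    """Server side: unweighted average of the clients' difference vectors (assumed rule)."""
    n_clients = len(client_diffs)
    return [sum(column) / n_clients for column in zip(*client_diffs)]

def soft_prune(kernels, diffs, threshold, scale=0.0):
    """Scale every kernel whose aggregated difference exceeds `threshold` by `scale`.

    scale=0.0 zeroes the kernel; 0 < scale < 1 only attenuates it.
    Returns new lists and leaves the input kernels unmodified.
    """
    return [
        [w * scale for w in kernel] if d > threshold else list(kernel)
        for kernel, d in zip(kernels, diffs)
    ]

# Toy run with two clients and two kernels.
diff_a = local_activation_difference(
    forget_acts=[[0.9, 0.1], [1.1, 0.1]],
    retain_acts=[[0.1, 0.5], [0.1, 0.7]],
)
diff_b = local_activation_difference(
    forget_acts=[[0.7, 0.2]],
    retain_acts=[[0.1, 0.4]],
)
global_diff = aggregate([diff_a, diff_b])

kernels = [[1.0, -2.0], [0.5, 0.5]]
pruned = soft_prune(kernels, global_diff, threshold=0.5, scale=0.1)
print(global_diff)
print(pruned)
```

In this toy run only the first kernel responds more strongly to the forget data than to the retained data, so it is the only one attenuated. No gradient step is involved, which is consistent with the summary's claim that erasure happens without retraining.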

Theorems & Definitions (12)

  • Definition 1: Conv Functional in Dual Space
  • Theorem 1: Feature Decoupling via Kernel Orthogonality
  • Definition 2: Convolutional Functional in Dual Space
  • Lemma 1: Near-Orthogonality of Kernel Vectors
  • proof
  • Theorem 2: Near-Orthogonality of Functional Covariance
  • proof
  • Definition 3: Class-Specific Functional Sets
  • Theorem 3: Data-Specific Role Separation
  • proof
  • ...and 2 more