Benchmarking Federated Machine Unlearning methods for Tabular Data

Chenguang Xiao, Abhirup Ghosh, Han Wu, Shuo Wang, Diederick van Thiel

TL;DR

The paper addresses the need for privacy-preserving learning in federated environments with tabular data by benchmarking machine unlearning methods in two forgetting scenarios: feature-level and instance-level. It adapts two model families (logistic regression, treated as a neural model, and random forest) to a FedAvg FL framework and compares retraining, fine-tuning, and three gradient-based unlearning methods across six datasets, including a private finance dataset. Fidelity is high across all methods; tree-based models offer strong certifiability (exact forgetting), while gradient-based methods are more computationally efficient, particularly the gradient difference method for instance-level (row) unlearning. The work provides design guidance for selecting federated machine unlearning (FMU) approaches in FL and establishes a foundation for further privacy-preserving ML research in tabular data settings.

Abstract

Machine unlearning, which enables a model to forget specific data upon request, is increasingly relevant in the era of privacy-centric machine learning, particularly within federated learning (FL) environments. This paper presents a pioneering study on benchmarking machine unlearning methods within a federated setting for tabular data, addressing the unique challenges posed by cross-silo FL where data privacy and communication efficiency are paramount. We explore unlearning at the feature and instance levels, employing two machine learning models: random forest and logistic regression. Our methodology benchmarks various unlearning algorithms, including fine-tuning and gradient-based approaches, across multiple datasets, with metrics focused on fidelity, certifiability, and computational efficiency. Experiments demonstrate that while fidelity remains high across methods, tree-based models excel in certifiability, ensuring exact unlearning, whereas gradient-based methods show improved computational efficiency. This study provides critical insights into the design and selection of unlearning algorithms tailored to the FL environment, offering a foundation for further research in privacy-preserving machine learning.
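To make the two forgetting scenarios concrete, the sketch below shows the retraining baseline for instance-level (row) and feature-level (column) forgetting with a logistic regression trained by FedAvg. It is an illustrative sketch, not the authors' code: the synthetic data, the helper names (`local_sgd`, `fedavg`, `forget_rows`, `forget_feature`), the hyperparameters, and the choice to forget a feature by zeroing its column are all assumptions made for this example.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """A few full-batch gradient steps of logistic regression on one client."""
    w = w.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))   # predicted probabilities
        w -= lr * X.T @ (p - y) / len(y)     # gradient of the mean log-loss
    return w

def fedavg(clients, dim, rounds=20):
    """FedAvg: average client models, weighted by local sample count."""
    w = np.zeros(dim)
    for _ in range(rounds):
        updates = [local_sgd(w, X, y) for X, y in clients]
        sizes = np.array([len(y) for _, y in clients], dtype=float)
        w = np.average(updates, axis=0, weights=sizes)
    return w

def forget_rows(client, row_idx):
    """Instance-level forgetting: drop the requested rows from one client."""
    X, y = client
    keep = np.setdiff1d(np.arange(len(y)), row_idx)
    return X[keep], y[keep]

def forget_feature(client, col):
    """Feature-level forgetting (assumed here as zeroing the column)."""
    X, y = client
    X = X.copy()
    X[:, col] = 0.0
    return X, y

# Hypothetical synthetic tabular data: 3 clients, 50 rows, 4 features each.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0, 1.0, 0.5])
clients = []
for _ in range(3):
    X = rng.normal(size=(50, 4))
    y = (X @ true_w > 0).astype(float)
    clients.append((X, y))

w_full = fedavg(clients, dim=4)

# Retraining baseline, instance level: client 0 forgets its first 10 rows.
clients_rows = [forget_rows(clients[0], np.arange(10))] + clients[1:]
w_rows = fedavg(clients_rows, dim=4)

# Retraining baseline, feature level: every client forgets feature 2.
clients_feat = [forget_feature(c, 2) for c in clients]
w_feat = fedavg(clients_feat, dim=4)
```

Retraining from scratch on the retained data forgets exactly but is the most expensive option, which is why the benchmark compares it against fine-tuning and gradient-based alternatives. In this sketch the retrained weight for the forgotten feature stays at its zero initialization, because a zeroed column contributes no gradient.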

Paper Structure

This paper contains 15 sections, 1 figure, 4 tables.

Figures (1)

  • Figure 1: F1 curve of the unlearning algorithms.