SoK: Unintended Interactions among Machine Learning Defenses and Risks

Vasisht Duddu; Sebastian Szyller; N. Asokan

TL;DR

The paper tackles the problem of unintended interactions between ML defenses and security, privacy, and fairness risks. It introduces a unified framework centered on overfitting and memorization as the core mediators, and maps how defenses and risks relate through controllable factors. It surveys existing literature, provides a guideline for conjecturing new interactions, and empirically validates two unexplored interactions, demonstrating practical implications for defense design and deployment. The work offers a structured pathway to anticipate cross-risk effects and minimize trade-offs in real-world ML systems.

Abstract

Machine learning (ML) models cannot neglect risks to security, privacy, and fairness. Several defenses have been proposed to mitigate such risks. When a defense is effective in mitigating one risk, it may correspond to increased or decreased susceptibility to other risks. Existing research lacks an effective framework to recognize and explain these unintended interactions. We present such a framework, based on the conjecture that overfitting and memorization underlie unintended interactions. We survey existing literature on unintended interactions, accommodating them within our framework. We use our framework to conjecture on two previously unexplored interactions, and empirically validate our conjectures.

Paper Structure (26 sections, 4 figures, 9 tables)

Introduction
Background
ML Classifiers
Risks to ML
Defenses for ML
Framework
Overfitting
Memorization
Overfitting and Memorization
Overfitting and Memorization
Understanding Unintended Interactions
Revisiting Defenses for ML
Revisiting Risks in ML
Surveying Unintended Interactions
Guideline: Exploring Unintended Interactions
...and 11 more sections

Figures (4)

Figure 1: Relationship between overfitting and memorization: Average $\mathtt{mem}$ across all data records in $\mathcal{D}_{tr}$ and overfitting ($\mathcal{G}_{err}$) are shown at the bottom. Circles indicate $\mathcal{D}_{tr}$ data records, and crosses indicate $\mathcal{D}_{te}$ data records.
Figure 2: Relationship between overfitting and memorization: Average $\mathtt{mem}$ across all data records in $\mathcal{D}_{tr}$ and overfitting ($\mathcal{G}_{err}$) are shown at the bottom. Circles indicate $\mathcal{D}_{tr}$ data records, and crosses indicate $\mathcal{D}_{te}$ data records.
Figure 3: Distinguishability across subgroups (\ref{obj2-subgroups}) decreases for $f^{fair}_{\theta}$.
Figure 4: Nature of Interaction. Accuracy of differentiating explanations from models trained on $\mathcal{D}_{tr}$ with $\alpha_1 = 0.5$ and $\alpha_2 \in \{0.1, \ldots, 0.9\}$.
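
Note on notation: the captions above use $\mathtt{mem}$ and $\mathcal{G}_{err}$ without defining them on this page. As a reading aid only, the two quantities can be sketched using common formalizations: a leave-one-out label-memorization score in the style of Feldman, and the gap between test and training loss. Both are assumptions here and are not confirmed by the text on this page. The training algorithm $\mathcal{A}$, the record $z_i = (x_i, y_i)$, and the loss $\ell$ are symbols introduced for illustration.

$$
\mathtt{mem}(\mathcal{A}, \mathcal{D}_{tr}, z_i) = \Pr_{f_{\theta} \leftarrow \mathcal{A}(\mathcal{D}_{tr})}\left[f_{\theta}(x_i) = y_i\right] - \Pr_{f_{\theta} \leftarrow \mathcal{A}(\mathcal{D}_{tr} \setminus \{z_i\})}\left[f_{\theta}(x_i) = y_i\right]
$$

$$
\mathcal{G}_{err} = \frac{1}{|\mathcal{D}_{te}|} \sum_{(x, y) \in \mathcal{D}_{te}} \ell(f_{\theta}(x), y) - \frac{1}{|\mathcal{D}_{tr}|} \sum_{(x, y) \in \mathcal{D}_{tr}} \ell(f_{\theta}(x), y)
$$

Under these assumed definitions, $\mathtt{mem}$ is a per-record quantity (how much a single training record changes the model's prediction on that record), while $\mathcal{G}_{err}$ is an aggregate over the whole dataset, which is why the captions report the average $\mathtt{mem}$ alongside $\mathcal{G}_{err}$.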
