Be Careful What You Smooth For: Label Smoothing Can Be a Privacy Shield but Also a Catalyst for Model Inversion Attacks

Lukas Struppek, Dominik Hintersdorf, Kristian Kersting

TL;DR

The paper investigates how label smoothing (LS) impacts privacy in deep classifiers under model inversion attacks (MIAs). By formalizing LS as $\mathbf{y}^{\text{LS}}=(1-\alpha)\mathbf{y}+\frac{\alpha}{C}\mathbf{1}$ with $\alpha\in(-\infty,1]$, it demonstrates that positive LS increases privacy leakage, especially with limited data, while negative LS mitigates MIAs and can outperform existing defenses in the utility-privacy trade-off. Through high-resolution face-recognition experiments using Plug & Play Attacks (PPA), embedding-space analyses, and ablations, the study shows that negative LS not only reduces leakage but also yields more robust defenses without major utility loss. This reveals a practical, parameterizable defense strategy against MIAs and highlights the broader need to account for regularization choices when evaluating model privacy in real-world deployments.
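The smoothing formula above can be sketched in a few lines. This is a minimal illustration assuming NumPy and a hypothetical 4-class problem. The function names `smooth_labels` and `cross_entropy` are made up for this example and are not taken from the authors' code. The two smoothing factors mirror the values quoted in the Figure 3 caption ($\alpha=0.1$ and $\alpha=-0.05$).

```python
import numpy as np


def smooth_labels(y_onehot, alpha):
    """Apply y_LS = (1 - alpha) * y + (alpha / C) * 1 for alpha in (-inf, 1]."""
    y = np.asarray(y_onehot, dtype=float)
    num_classes = y.shape[-1]  # C in the formula
    return (1.0 - alpha) * y + alpha / num_classes


def cross_entropy(logits, targets):
    """Cross-entropy between softmax(logits) and (possibly smoothed) targets."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return -(targets * log_probs).sum(axis=-1)


# Hypothetical example: one sample of class 2 out of C = 4 classes.
y = np.eye(4)[[2]]

pos = smooth_labels(y, alpha=0.1)    # [[ 0.025,   0.025,   0.925,   0.025 ]]
neg = smooth_labels(y, alpha=-0.05)  # [[-0.0125, -0.0125,  1.0375, -0.0125]]

# Both target vectors still sum to one; only the negative factor produces
# negative entries for the non-target classes.
loss_pos = cross_entropy(np.array([[0.0, 0.0, 3.0, 0.0]]), pos)
loss_neg = cross_entropy(np.array([[0.0, 0.0, 3.0, 0.0]]), neg)
```

With a negative factor, the target for the true class exceeds one and the remaining entries are negative, so the loss keeps decreasing as the probabilities of the other classes shrink. This sketch only shows how the targets are constructed; it does not reproduce the paper's training setup.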

Abstract

Label smoothing -- using softened labels instead of hard ones -- is a widely adopted regularization method for deep learning, showing diverse benefits such as enhanced generalization and calibration. Its implications for preserving model privacy, however, have remained unexplored. To fill this gap, we investigate the impact of label smoothing on model inversion attacks (MIAs), which aim to generate class-representative samples by exploiting the knowledge encoded in a classifier, thereby inferring sensitive information about its training data. Through extensive analyses, we uncover that traditional label smoothing fosters MIAs, thereby increasing a model's privacy leakage. Even more, we reveal that smoothing with negative factors counters this trend, impeding the extraction of class-related information and leading to privacy preservation, beating state-of-the-art defenses. This establishes a practical and powerful novel way for enhancing model resilience against MIAs.

Paper Structure (31 sections, 16 equations, 12 figures, 13 tables)

Figures (12)

  • Figure 1: Simple MIA on a 2D toy dataset with three classes. The background color indicates the models' prediction confidence, and the yellow lines show the intermediate optimization steps of the attack. The optimization starts from a random position, here a sample from the green circle class, and tries to reconstruct a sample from the orange pentagons class. The attack against the positive LS model constructs a sample very close to the targeted training data. In contrast, the attack against the negative LS model saturates close to the decision boundary and far away from the training data.
  • Figure 2: Attack results for FaceScrub models trained with varying numbers of training samples per class and with different smoothing factors. Results are stated as the relative improvement, denoted as advantage, compared to the model trained with hard labels. While positive LS has a larger impact in low-data regimes, negative LS acts as a stronger defense when the model is trained on more samples.
  • Figure 3: Attack samples from the FaceScrub models trained with 30 samples per class. Samples are not cherry-picked but show the most robust attack results based on PPA's selection procedure. The model trained with positive LS ($\alpha=0.1$) clearly reveals more visual characteristics of the target identities, whereas attacks on the negative LS model ($\alpha=-0.05$) generate misleading results.
  • Figure 4: Penultimate activations of training samples from 100 FaceNet classes (colors are reused). Compared to training with hard labels, training with positive LS clusters samples from the same class together. Smoothing the labels with a negative factor reverses this effect and instead places samples from different classes closer together to build a less clearly separated space.
  • Figure 5: Distribution of maximum-scaled $\ell_2$ feature distances between penultimate layer activations. The left-hand side of each plot depicts the average distance between each training sample and all other samples from the same class (intraclass) and other classes (interclass). The right-hand side of each plot shows the distances to the closest sample. Positive LS reduces the relative intraclass distances while increasing the distance to other samples, whereas negative LS partly reverts this effect and moves some samples from other classes closer to samples of a particular class.
  • ...and 7 more figures