Rank Matters: Understanding and Defending Model Inversion Attacks via Low-Rank Feature Filtering

Hongyao Yu, Yixiang Qiu, Hao Fang, Tianqu Zhuang, Bin Chen, Sijin Yu, Bin Wang, Shu-Tao Xia, Ke Xu

TL;DR

This work tackles privacy leakage from Model Inversion Attacks by introducing the Ideal Inversion Error (IIE) and linking leakage to the rank of intermediate representations. It proposes LoFt, a low-rank feature filtering defense that decomposes the classification head into two layers to enforce a reduced effective rank, coupled with a tanh activation to induce gradient vanishing and further thwart inversion. Theoretical analysis shows IIE scales inversely with rank, and extensive experiments across multiple datasets and architectures demonstrate LoFt achieving state-of-the-art defense performance, notably in high-resolution and high-capacity settings where prior defenses fail. The approach maintains strong task utility while significantly elevating privacy protection, supported by ablations and robustness evaluations.
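The two-layer head described above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the dimensions (512 features, 100 classes, rank 16), the random initialization, and the placement of the tanh between the two layers are all assumptions made for the example.

```python
import numpy as np

def init_low_rank_head(feature_dim, num_classes, rank, seed=0):
    """Create the two weight matrices of a factorized classification head.

    Hypothetical helper: shapes and scaling are illustrative choices.
    """
    rng = np.random.default_rng(seed)
    # First layer projects features down to a `rank`-dimensional bottleneck.
    down = rng.standard_normal((rank, feature_dim)) / np.sqrt(feature_dim)
    # Second layer maps the bottleneck to class logits.
    up = rng.standard_normal((num_classes, rank)) / np.sqrt(rank)
    return down, up

def low_rank_head(features, down, up):
    """Apply the factorized head to a batch of features.

    features: array of shape (batch, feature_dim)
    returns:  (logits, bottleneck) with shapes (batch, num_classes), (batch, rank)
    """
    # Assumption: tanh sits between the two layers; it saturates for large
    # inputs, which shrinks the gradients flowing back to the features.
    bottleneck = np.tanh(features @ down.T)
    logits = bottleneck @ up.T
    return logits, bottleneck

down, up = init_low_rank_head(feature_dim=512, num_classes=100, rank=16)
features = np.random.default_rng(1).standard_normal((8, 512))
logits, bottleneck = low_rank_head(features, down, up)

# The product of the two weight matrices can never exceed the bottleneck rank.
effective_rank = np.linalg.matrix_rank(up @ down)
print(logits.shape, bottleneck.shape, effective_rank)
```

Whatever the backbone produces, the logits here depend on the features only through a 16-dimensional bottleneck, which is the sense in which the head limits the rank available to an attacker.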

Abstract

Model Inversion Attacks (MIAs) pose a significant threat to data privacy by reconstructing sensitive training samples from the knowledge embedded in trained machine learning models. Despite recent progress in enhancing the effectiveness of MIAs across diverse settings, defense strategies have lagged behind, struggling to balance model utility with robustness against increasingly sophisticated attacks. In this work, we propose the ideal inversion error to measure privacy leakage, and our theoretical and empirical investigations reveal that higher-rank features are inherently more prone to privacy leakage. Motivated by this insight, we propose a lightweight and effective defense strategy based on low-rank feature filtering, which explicitly reduces the attack surface by constraining the dimension of intermediate representations. Extensive experiments across various model architectures and datasets demonstrate that our method consistently outperforms existing defenses, achieving state-of-the-art performance against a wide range of MIAs. Notably, our approach remains effective even in challenging regimes involving high-resolution data and high-capacity models, where prior defenses fail to provide adequate protection. The code is available at https://github.com/Chrisqcwx/LoFt .

Paper Structure (38 sections, 1 theorem, 13 equations, 9 figures, 19 tables)

This paper contains 38 sections, 1 theorem, 13 equations, 9 figures, 19 tables.

Key Result

Theorem 1

Under the above assumptions, the Ideal Inversion Error of the linear model $f_\theta(\bm{x})=\bm{Wx}$ is given by

Figures (9)

  • Figure 1: Model performance on the test dataset and attack accuracy with different defenses in high-resolution scenarios.
  • Figure 2: Overview of our LoFt defense strategy
  • Figure 3: Experimental results with different rank $r$ via SVD. The red lines show test accuracy and IF attack accuracy at different compressed ranks. The blue line shows the retention ratio of the eigenvalues.
  • Figure 4: Simple MIA on a 2D toy dataset with three classes against different model architectures.
  • Figure 5: Visual comparison of IF attacks against ResNet-$152$ under different defense strategies.
  • ...and 4 more figures
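
The rank compression via SVD mentioned in the Figure 3 caption can be illustrated with a generic rank-$r$ truncation. This is only a sketch under stated assumptions: the caption does not say which matrix is compressed or how the retention ratio is defined, so the function name `truncate_rank`, the random test matrix, and the definition of retention as the retained share of the singular-value sum are all hypothetical.

```python
import numpy as np

def truncate_rank(matrix, rank):
    """Return the best rank-`rank` approximation of `matrix` and a retention ratio.

    Assumption: retention is the sum of the kept singular values divided by
    the sum of all singular values; the paper may define it differently.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the top `rank` singular directions.
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
    retention = s[:rank].sum() / s.sum()
    return low_rank, retention

# Illustrative stand-in for a weight or feature matrix.
matrix = np.random.default_rng(0).standard_normal((100, 512))
compressed, retention = truncate_rank(matrix, rank=8)

print(np.linalg.matrix_rank(compressed), float(retention))
```

Sweeping `rank` over a range and recording the retention ratio alongside model accuracy would give curves of the kind the caption describes.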

Theorems & Definitions (5)

  • Definition 1: Ideal Inversion Attacker, IIA
  • Remark 1
  • Definition 2: Ideal Inversion Error, IIE
  • Remark 2
  • Theorem 1