Rank Matters: Understanding and Defending Model Inversion Attacks via Low-Rank Feature Filtering
Hongyao Yu, Yixiang Qiu, Hao Fang, Tianqu Zhuang, Bin Chen, Sijin Yu, Bin Wang, Shu-Tao Xia, Ke Xu
TL;DR
This work tackles privacy leakage from Model Inversion Attacks (MIAs) by introducing the Ideal Inversion Error (IIE) and linking leakage to the rank of intermediate representations. It proposes LoFt, a low-rank feature filtering defense that decomposes the classification head into two layers to enforce a reduced effective rank, coupled with a tanh activation to induce gradient vanishing and further thwart inversion. Theoretical analysis shows that IIE scales inversely with rank, meaning higher-rank features leak more, and extensive experiments across multiple datasets and architectures show that LoFt achieves state-of-the-art defense performance, notably in high-resolution and high-capacity settings where prior defenses fail. The approach maintains strong task utility while significantly improving privacy protection, as supported by ablations and robustness evaluations.
Abstract
Model Inversion Attacks (MIAs) pose a significant threat to data privacy by reconstructing sensitive training samples from the knowledge embedded in trained machine learning models. Despite recent progress in enhancing the effectiveness of MIAs across diverse settings, defense strategies have lagged behind, struggling to balance model utility with robustness against increasingly sophisticated attacks. In this work, we propose the ideal inversion error to measure privacy leakage, and our theoretical and empirical investigations reveal that higher-rank features are inherently more prone to privacy leakage. Motivated by this insight, we propose a lightweight and effective defense strategy based on low-rank feature filtering, which explicitly reduces the attack surface by constraining the dimension of intermediate representations. Extensive experiments across various model architectures and datasets demonstrate that our method consistently outperforms existing defenses, achieving state-of-the-art performance against a wide range of MIAs. Notably, our approach remains effective even in challenging regimes involving high-resolution data and high-capacity models, where prior defenses fail to provide adequate protection. The code is available at https://github.com/Chrisqcwx/LoFt.
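To make the low-rank filtering idea concrete, the following is a minimal NumPy sketch of a classification head split into two layers with a narrow bottleneck and a tanh in between. It is not the authors' implementation (see the linked repository for that). The class name `LowRankHead`, the placement of tanh between the two projections, the random initialization, and all dimensions are assumptions chosen for illustration only.

```python
import numpy as np


class LowRankHead:
    """Hypothetical two-layer classification head with a rank-r bottleneck."""

    def __init__(self, feat_dim, num_classes, rank, seed=0):
        rng = np.random.default_rng(seed)
        # Down-projection: feat_dim -> rank. This is the low-rank "filter".
        self.w_down = rng.standard_normal((feat_dim, rank)) / np.sqrt(feat_dim)
        # Up-projection: rank -> num_classes.
        self.w_up = rng.standard_normal((rank, num_classes)) / np.sqrt(rank)

    def bottleneck(self, feats):
        # tanh keeps the bottleneck bounded; its derivative (1 - tanh^2)
        # shrinks toward zero for large-magnitude inputs.
        # Placing tanh here is an assumption of this sketch.
        return np.tanh(feats @ self.w_down)

    def __call__(self, feats):
        # Logits are a linear map of a rank-r bottleneck.
        return self.bottleneck(feats) @ self.w_up


# Illustrative sizes only: 256 samples, 64-dim backbone features, 10 classes.
feats = np.random.default_rng(1).standard_normal((256, 64))
head = LowRankHead(feat_dim=64, num_classes=10, rank=4)
logits = head(feats)

print(logits.shape)                   # (256, 10)
print(np.linalg.matrix_rank(logits))  # at most 4, the bottleneck width
```

Because every logit vector is a linear function of a 4-dimensional bottleneck, the logit matrix cannot exceed rank 4 regardless of how many samples or classes there are. This is the sense in which the head constrains the dimension of what an attacker can observe in this sketch.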
