Model Inversion Robustness: Can Transfer Learning Help?

Sy-Tuyen Ho; Koh Jun Hao; Keshigeyan Chandrasegaran; Ngoc-Bao Nguyen; Ngai-Man Cheung

TL;DR

This work introduces TL-DMI, a simple transfer-learning-based defense against model inversion (MI). It limits private-data leakage by freezing the early layers and fine-tuning only the last few layers during private training, so that only $|\theta_C|$ parameters are updated on the private data. A novel Fisher Information (FI) analysis shows that early layers are more important for MI reconstruction, while later layers are more important for the classification task, which justifies the design. Empirical results across 20 MI setups, 9 architectures, and multiple attacks demonstrate state-of-the-art MI robustness with modest utility loss. TL-DMI can also be combined with existing defenses such as BiDO for further gains. The method is architecture-agnostic, easy to implement, and applicable to both CNNs and vision transformers. It offers a practical path toward privacy-preserving model deployment without the heavy utility trade-offs of regularization-based defenses.
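To make $|\theta_C|$ concrete, the sketch below splits a VGG16-style target into frozen early layers and fine-tuned last layers and counts the parameters on each side. The architecture is an assumption for illustration only. It uses 13 biased 3×3 conv layers with the standard VGG16 channel widths and no batch-norm, followed by a single linear head on a 512×2×2 = 2048-dimensional feature (i.e., assuming 64×64 inputs) with a hypothetical 1000 classes. The helper names `vgg16_layer_params` and `split_for_finetuning` are hypothetical. Under these assumptions the total happens to round to the 16.8M quoted for VGG16 in Figure 2, though the paper's exact architecture may differ. In PyTorch, the same split would typically be applied by calling `requires_grad_(False)` on the frozen modules before fine-tuning on the private data.

```python
# Hypothetical VGG16-style target (assumption): 13 conv layers with 3x3 kernels and
# biases, standard VGG16 channel widths, no batch-norm, plus one linear classifier.
CONV_CHANNELS = [
    (3, 64), (64, 64),
    (64, 128), (128, 128),
    (128, 256), (256, 256), (256, 256),
    (256, 512), (512, 512), (512, 512),
    (512, 512), (512, 512), (512, 512),
]


def vgg16_layer_params(feat_dim=2048, n_classes=1000):
    """Return [(layer_name, n_params), ...] ordered from input to output."""
    layers = [(f"conv{i}", 3 * 3 * c_in * c_out + c_out)  # weights + biases
              for i, (c_in, c_out) in enumerate(CONV_CHANNELS, start=1)]
    layers.append(("fc", feat_dim * n_classes + n_classes))
    return layers


def split_for_finetuning(layers, n_last):
    """Freeze all but the last `n_last` layers; report |theta_C| and |theta_T|."""
    if not 1 <= n_last <= len(layers):
        raise ValueError(f"n_last must be in [1, {len(layers)}], got {n_last}")
    frozen, trainable = layers[:-n_last], layers[-n_last:]
    theta_c = sum(n for _, n in trainable)  # parameters updated on D_priv
    theta_t = sum(n for _, n in layers)     # parameters of the whole target model
    return [name for name, _ in frozen], [name for name, _ in trainable], theta_c, theta_t


layers = vgg16_layer_params()
for n_last in (1, 4, len(layers)):
    _, trainable, theta_c, theta_t = split_for_finetuning(layers, n_last)
    print(f"fine-tune {trainable[0]}..{trainable[-1]}: "
          f"|theta_C| = {theta_c:,} of |theta_T| = {theta_t:,}")
```

Under these assumptions, fine-tuning only `fc` updates 2,049,000 of 16,763,688 parameters, while fine-tuning `conv11` through `fc` updates 9,128,424.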

Abstract

Model Inversion (MI) attacks aim to reconstruct private training data by abusing access to machine learning models. Contemporary MI attacks achieve high attack performance and pose serious threats to privacy. Meanwhile, all existing MI defense methods rely on regularization that directly conflicts with the training objective, which noticeably degrades model utility. In this work, we take a different perspective and propose a novel and simple Transfer Learning-based Defense against Model Inversion (TL-DMI) to obtain MI-robust models. In particular, by leveraging TL, we limit the number of layers that encode sensitive information from the private training dataset, thereby degrading the performance of MI attacks. We justify our method with an analysis based on Fisher Information. Our defense is remarkably simple to implement. Without bells and whistles, we show in extensive experiments that TL-DMI achieves state-of-the-art (SOTA) MI robustness. Our code, pre-trained models, demo and inverted data are available at: https://hosytuyen.github.io/projects/TL-DMI
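The core mechanism described above, restricting which layers ever receive gradients from the private data, can be illustrated with a toy NumPy example. A two-layer ReLU network stands in for the target model. `W1`/`b1` play the role of early layers holding (hypothetical) public pre-trained weights and stay frozen, while `W2`/`b2` play the role of $\theta_C$ and are the only parameters updated on a synthetic "private" set. All names, sizes, and data here are assumptions for illustration. This is a sketch of the freezing idea, not the authors' implementation, which fine-tunes deep CNNs and vision transformers.

```python
import numpy as np

rng = np.random.default_rng(0)


def forward(params, x):
    """Two-layer ReLU net: W1/b1 act as frozen early layers, W2/b2 as the fine-tuned head."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])
    logits = h @ params["W2"] + params["b2"]
    return h, logits


def softmax_xent(logits, y):
    """Mean cross-entropy and softmax probabilities (numerically stable)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(len(y)), y].mean()
    return loss, np.exp(log_probs)


def finetune_head_only(params, x, y, lr=0.05, steps=200):
    """Gradient descent on private data that updates ONLY theta_C = {W2, b2}."""
    params = {k: v.copy() for k, v in params.items()}  # leave the pre-trained copy intact
    for _ in range(steps):
        h, logits = forward(params, x)
        _, probs = softmax_xent(logits, y)
        dlogits = probs.copy()
        dlogits[np.arange(len(y)), y] -= 1.0
        dlogits /= len(y)  # gradient of the mean cross-entropy w.r.t. the logits
        params["W2"] -= lr * (h.T @ dlogits)
        params["b2"] -= lr * dlogits.sum(axis=0)
        # W1 and b1 get no update, so no private-data gradient is ever written into them.
    return params


# Hypothetical stand-ins: "pre-trained" weights and a toy "private" dataset.
pretrained = {
    "W1": rng.normal(scale=0.3, size=(8, 16)),
    "b1": np.zeros(16),
    "W2": np.zeros((16, 2)),
    "b2": np.zeros(2),
}
x_priv = rng.normal(size=(64, 8))
y_priv = (x_priv[:, 0] > 0).astype(int)

tuned = finetune_head_only(pretrained, x_priv, y_priv)
loss_before, _ = softmax_xent(forward(pretrained, x_priv)[1], y_priv)
loss_after, _ = softmax_xent(forward(tuned, x_priv)[1], y_priv)
print(f"frozen W1 unchanged: {np.array_equal(tuned['W1'], pretrained['W1'])}")
print(f"private-set loss: {loss_before:.3f} -> {loss_after:.3f}")
```

Because the head starts at zero, the initial loss is exactly log 2 for this two-class toy. Only the head's weights move as it falls.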

Paper Structure

This paper contains 33 sections, 4 equations, 10 figures, and 18 tables.

Introduction
Background
Transfer Learning-based Defense against Model Inversion (TL-DMI)
Exploring MI Robustness via Transfer Learning
Experimental Setup
Analysis of Layer Importance for Classification Task and MI Task
Empirical Validation
Comparison with SOTA MI Defense
Extended MI Robustness Evaluation
Conclusion
Additional Results
Additional result on BREPMI
Additional Empirical Validation on GMI
Additional result on LOMMA
Additional result on Stanford Cars dataset
...and 18 more sections

Figures (10)

Figure 1: (I) Our proposed Transfer Learning-based Defense against Model Inversion (TL-DMI) (Sec. "Transfer Learning-based Defense against Model Inversion (TL-DMI)"). Building on the standard TL framework of pre-training (on a public dataset) followed by fine-tuning (on a private dataset), we propose a simple and highly effective method to defend against MI attacks. Our idea is to limit fine-tuning on the private dataset to a specific number of layers, thereby restricting the encoding of private information to these layers only (pink). Specifically, we propose to fine-tune only the last several layers. (II) Analysis of layer importance for the classification task and the MI task (Sec. "Analysis of Layer Importance for Classification Task and MI Task"). For the first time, we analyze the importance of target model layers for MI. For a conventionally trained model, we apply Fisher Information (FI) and find that the first few layers are important for MI. Meanwhile, the FI analysis suggests that the last several layers are important for a specific classification task, consistent with the TL literature (Yosinski et al., 2014). This supports our hypothesis that preventing the first few layers from being fine-tuned on the private dataset could significantly degrade MI, while the impact on classification could be small. Overall, this leads to improved MI robustness. (III) Empirical validation (Sec. "Empirical Validation"). The sub-figures show that, at the same natural accuracy, lower MI attack accuracy can be achieved by reducing the number of parameters fine-tuned on the private dataset. (IV) Comparison with SOTA MI defense (Sec. "Comparison with SOTA MI Defense"). Without bells and whistles, our method achieves SOTA MI robustness. Images reconstructed by MI attacks against our model have inferior visual quality, and a user study confirms this finding. Extensive experiments can be found in Sec. "Extended MI Robustness Evaluation". Best viewed in color and zoomed in.
Figure 2: Empirical validation on VGG16 with GMI. Each line represents one training setup for $T$ with a different $|\theta_C|$, the number of parameters updated on $\mathcal{D}_{priv}$. Note that the entire target model has $|\theta_T| = 16.8$M parameters in this MI setup. To separate the influence of natural accuracy on MI attack accuracy, we run GMI attacks on several checkpoints of each training setup, spanning a wide range of natural accuracies; these appear as multiple data points on each line. For a given natural accuracy, attack accuracy clearly decreases as $|\theta_C|$ decreases, i.e., as fewer parameters are updated on $\mathcal{D}_{priv}$.
Figure 3: The effect of different $\mathcal{D}_{pretrain}$, i.e., ImageNet1K, Pubfig83, and Facescrub. We use $T$ = VGG16 and $\mathcal{D}_{priv}$ = CelebA. The results suggest that lower similarity between the pre-training and private dataset domains can improve defense effectiveness.
Figure 4: We follow the KEDMI-VGG16 and PPA-ResNet-18 setups of Fig. 1-II. Fine-tuning the first layers (green line), rather than the middle layers (orange line), increases MI attack accuracy, corroborating our analysis that the first layers are important for MI.
Figure 5: FI distributions across layers during all MI steps. We conduct the FI analysis on the main setup of Peng et al. (2022), where the MI attack is KEDMI (Chen et al., 2021), $T$ = VGG16, $\mathcal{D}_{priv}$ = CelebA, and $\mathcal{D}_{pub}$ = CelebA. In the main manuscript, we present the FI analysis at the last MI iteration, i.e., iteration 3000. This figure presents a more comprehensive FI analysis across multiple iterations. After the first few iterations, we consistently observe that the earlier layers are more important to the MI task.
...and 5 more figures
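The layer-importance analysis in the captions above (Figure 1-II and Figure 5) rests on per-layer Fisher Information. The sketch below computes a common approximation, the mean diagonal of the empirical Fisher (the average squared per-sample gradient), for each layer. It then tracks what share of the total FI the earliest layers hold across attack iterations. It assumes you already have per-sample gradients: from the classification loss when scoring the classification task, or from the attack's objective when scoring the MI task. The function names, the per-layer averaging, and the toy numbers are illustrative assumptions, not the paper's exact FI formulation.

```python
import numpy as np


def layerwise_fisher(per_sample_grads):
    """Mean diagonal empirical Fisher Information (FI) for each layer.

    per_sample_grads: dict mapping layer name -> array of shape (n_samples, *param_shape)
    holding per-sample gradients of the task's log-likelihood (or loss) for that layer.
    Insert layers in input -> output order; Python dicts preserve insertion order.
    """
    scores = {}
    for name, g in per_sample_grads.items():
        g = np.asarray(g, dtype=float)
        fisher_diag = np.mean(g ** 2, axis=0)     # E[grad^2]: one value per parameter
        scores[name] = float(fisher_diag.mean())  # average, so big layers aren't favoured by size alone
    return scores


def early_layer_share(fi_per_iteration, n_early):
    """Fraction of total FI held by the first `n_early` layers at each MI iteration.

    fi_per_iteration: array-like of shape (n_iterations, n_layers), layers ordered
    input -> output, e.g. per-layer FI recomputed at several attack checkpoints.
    """
    fi = np.asarray(fi_per_iteration, dtype=float)
    dist = fi / fi.sum(axis=1, keepdims=True)  # normalise each row into a distribution over layers
    return dist[:, :n_early].sum(axis=1)


# Toy gradients for two hypothetical layers (random numbers, not from any real model).
rng = np.random.default_rng(0)
scores = layerwise_fisher({
    "conv1": rng.normal(scale=2.0, size=(16, 27)),
    "fc": rng.normal(scale=0.5, size=(16, 40)),
})
print(scores)  # conv1 scores higher here only because its toy gradients are larger

# Per-layer FI at two hypothetical attack checkpoints (made-up numbers for illustration).
fi_over_time = [[1.0, 1.0, 1.0, 1.0],
                [3.0, 1.0, 0.5, 0.5]]
print(early_layer_share(fi_over_time, n_early=2))
```

Averaging within each layer is a design choice. Summing instead would make wide layers look important purely because they have more parameters.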
