Embedding Attack Project (Work Report)

Jiameng Pu; Zafar Takhirov

TL;DR

This work examines privacy leakage from embeddings in six classification models by evaluating membership inference attacks (MIAs) and property inference attacks (PIAs) that operate on intermediate representations. It compares loss-based and embedding-based neural network MIAs, showing that deeper embedding layers and greater overfitting increase leakage, while NLP models tend to be more resistant. The study also explores pseudo-label strategies and potential label-inference attacks, and highlights that reducing MIA success does not necessarily reduce PIA risk. Ongoing efforts include neighborhood-comparison attacks and evaluations of defenses such as Noisy Self-Distillation, underscoring the practical privacy challenges in embedding-rich ML deployments.

Abstract

This report summarizes all the membership inference attack (MIA) experiments of the Embedding Attack Project, covering threat models, experimental setup, results, findings, and discussion. Current results evaluate two main MIA strategies (loss-based and embedding-based) on six AI models ranging from computer vision to language modeling. Two experiments are still in progress: one on MIA defenses and one on neighborhood-comparison embedding attacks. The current work on MIA and property inference attacks (PIA) can be summarized in six conclusions: (1) the amount of overfitting is directly proportional to a model's vulnerability; (2) early embedding layers in the model are less susceptible to privacy leaks; (3) deeper model layers contain more membership information; (4) models are more vulnerable to MIA if both the embeddings and the corresponding training labels are compromised; (5) pseudo-labels can be used to increase MIA success; and (6) although MIA and PIA success rates are proportional, reducing MIA success does not necessarily reduce PIA success.
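To make the loss-based strategy concrete, here is a minimal sketch of a generic loss-threshold MIA. It assumes the attacker can obtain the target model's softmax outputs and the true labels. The function names, toy probabilities, and threshold value are all hypothetical and chosen for illustration; the attack evaluated in the report may be set up differently.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy loss of one predicted distribution against its true label."""
    # Clamp the probability to avoid log(0).
    return -math.log(max(probs[label], 1e-12))


def loss_threshold_mia(probs_batch, labels, threshold):
    """Guess 'member' (True) when the target model's loss falls below the threshold.

    Intuition: an overfitted model tends to have lower loss on its training
    examples than on unseen examples.
    """
    return [cross_entropy(p, y) < threshold for p, y in zip(probs_batch, labels)]


# Toy, made-up softmax outputs (not results from the report).
member_probs = [[0.97, 0.02, 0.01], [0.05, 0.90, 0.05]]
member_labels = [0, 1]
nonmember_probs = [[0.50, 0.30, 0.20], [0.40, 0.35, 0.25]]
nonmember_labels = [0, 2]

threshold = 0.3  # hypothetical; in practice it would be calibrated, e.g. on shadow-model losses

member_guesses = loss_threshold_mia(member_probs, member_labels, threshold)
nonmember_guesses = loss_threshold_mia(nonmember_probs, nonmember_labels, threshold)
```

In this toy setup, the confident predictions on the first two examples give low loss and are flagged as members, while the less confident predictions are not. The gap between the two groups of losses is what grows with overfitting.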

Paper Structure (33 sections, 1 equation, 4 figures, 11 tables, 1 algorithm)

Introduction
Defining Embedding Layers and Embeddings
The Importance of Investigating Privacy Leakage in Embedding Layers
Privacy Perspectives in this Investigation
Real-World Implications
Threat Models
Membership Inference Attack
Targeted Models
Attacker's Capabilities and Goals
Property Inference Attack
Targeted Models
Attacker's Capabilities and Goals
Experimental Setup
Models and Datasets
Image Classification Models
...and 18 more sections

Figures (4)

Figure 1: Demonstration of embeddings from shallow to deep within an ML model.
Figure 2: The attack strategy of three neural-network-based MIAs.
Figure 3: The attack strategy of neural-network-based PIAs.
Figure 4: Inverting lower layer representations and embedding of BERT.
