Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction

Shuchi Wu; Chuan Ma; Kang Wei; Xiaogang Xu; Ming Ding; Yuwen Qian; Tao Xiang

TL;DR

The paper addresses the problem of stealing pre-trained SSL encoders at minimal query cost by introducing RDA, which refines the target representation of each sample through sample-wise prototypes and trains a surrogate encoder with a two-part multi-relational extraction loss. The framework leverages a memory bank of prototypes $p_{x_i} = \frac{1}{n}\sum_{c=1}^{n} E_T(\boldsymbol{x}'_{i,t,c})$ and a loss $L = \lambda_1 L_D + \lambda_2 L_A$, where $L_D$ discriminates mismatched embedding-prototype pairs while $L_A$ aligns matched embeddings in amplitude and angle via logarithmic penalties. Empirical results show that RDA achieves state-of-the-art stealing efficacy across seven downstream datasets with substantially lower query costs (e.g., about 1% of Cont-Steal's) and strong robustness to defenses, including perturbations and watermarking, for both medium- and large-scale encoders. The work has implications for both adversarial use and defense development: it demonstrates practical risks in SSL service APIs and motivates defense strategies against query-based encroachment.
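The prototype formula above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a toy linear map with a `tanh` as a stand-in for the black-box target encoder $E_T$ and additive Gaussian noise as a stand-in for real data augmentations. The names `toy_target_encoder`, `toy_augment`, and `build_prototype_bank` are hypothetical. The only property it relies on is that each sample is queried $n$ times, once per augmented view, and the resulting embeddings are averaged into $p_{x_i}$.

```python
import numpy as np


def toy_target_encoder(x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the black-box target encoder E_T (one query per call)."""
    return np.tanh(x @ weight)


def toy_augment(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for a random data augmentation: additive Gaussian noise."""
    return x + 0.1 * rng.standard_normal(x.shape)


def build_prototype_bank(samples: np.ndarray, weight: np.ndarray, n: int,
                         rng: np.random.Generator) -> tuple[np.ndarray, int]:
    """Return (bank, queries): bank[i] is the mean target embedding over n views of sample i."""
    bank = []
    queries = 0
    for x in samples:
        views = [toy_augment(x, rng) for _ in range(n)]
        embeddings = np.stack([toy_target_encoder(v, weight) for v in views])
        queries += n  # each augmented view costs one query to the target encoder
        bank.append(embeddings.mean(axis=0))  # p_{x_i}: average over the n views
    return np.stack(bank), queries


rng = np.random.default_rng(0)
samples = rng.standard_normal((8, 32))  # 8 toy "images" flattened to 32 features
weight = rng.standard_normal((32, 16))  # toy encoder maps to 16-d embeddings
bank, queries = build_prototype_bank(samples, weight, n=10, rng=rng)
print(bank.shape, queries)  # (8, 16) 80
```

Because the bank is built once before steal-training, later epochs can read `bank[i]` instead of querying the target encoder again.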

Abstract

This paper introduces RDA, a pioneering approach designed to address two primary deficiencies prevalent in previous attempts to steal pre-trained encoders: (1) suboptimal performance attributed to biased optimization objectives, and (2) elevated query costs stemming from the end-to-end paradigm, which requires querying the target encoder every epoch. Specifically, we first Refine the target encoder's representations for each training sample, thereby establishing a less biased optimization objective before the steal-training phase. This is accomplished via a sample-wise prototype, which consolidates the target encoder's representations of a given sample's various augmented views. Because they demand exponentially fewer queries than the end-to-end approach, prototypes can be instantiated once to guide subsequent query-free training. To further improve efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs while Aligning matched ones in terms of both amplitude and angle. In this way, the trained surrogate encoder achieves state-of-the-art results across various downstream datasets with limited queries. Moreover, RDA is shown to be robust to multiple widely used defenses.
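A forward-pass sketch of the loss $L = \lambda_1 L_D + \lambda_2 L_A$ is given below. The exact forms of $L_D$ and $L_A$ are not stated in this summary, so the concrete terms are assumptions: $L_D$ is written as an InfoNCE-style cross-entropy over cosine similarities between each surrogate embedding and every prototype in the batch (mismatched pairs act as negatives), and $L_A$ combines a log-ratio amplitude penalty with an angle penalty $-\log\frac{1+\cos\theta}{2}$. The function name `multi_relational_loss`, the temperature `tau`, and the default weights are hypothetical. It uses NumPy and only computes loss values; actual training would need an autodiff framework.

```python
import numpy as np


def multi_relational_loss(z, p, lam1=1.0, lam2=1.0, tau=0.2, eps=1e-8):
    """Assumed sketch of L = lam1 * L_D + lam2 * L_A.

    z: (B, d) surrogate embeddings; p: (B, d) prototypes, where row i of p matches row i of z.
    Returns (total, loss_d, loss_a).
    """
    z_norm = np.linalg.norm(z, axis=1, keepdims=True)
    p_norm = np.linalg.norm(p, axis=1, keepdims=True)
    cos = (z / (z_norm + eps)) @ (p / (p_norm + eps)).T  # (B, B) cosine similarities

    # Discriminate (assumed InfoNCE form): matched pairs on the diagonal, mismatched pairs as negatives.
    logits = cos / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_d = -np.mean(np.diag(log_probs))

    # Align (assumed log penalties): amplitude via the log-ratio of norms, angle via -log((1 + cos) / 2).
    amp = np.abs(np.log((z_norm[:, 0] + eps) / (p_norm[:, 0] + eps)))
    ang = -np.log((1.0 + np.diag(cos)) / 2.0 + eps)
    loss_a = np.mean(amp + ang)

    return lam1 * loss_d + lam2 * loss_a, loss_d, loss_a


rng = np.random.default_rng(0)
p = rng.standard_normal((4, 16))
_, d_match, a_match = multi_relational_loss(p.copy(), p)
_, d_swap, _ = multi_relational_loss(p[::-1], p)  # every row paired with the wrong prototype
_, _, a_noisy = multi_relational_loss(p + 0.5 * rng.standard_normal(p.shape), p)
print(d_match < d_swap, a_match < a_noisy)  # True True
```

The discrimination term is lowest when each surrogate embedding is closest to its own prototype, while the alignment term vanishes only when an embedding matches its prototype in both norm and direction, which is why the two are combined.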

Paper Structure (42 sections, 14 equations, 12 figures, 15 tables, 1 algorithm)

Introduction
Related Work
Methodology
Threat Model
Sample-Wise Prototypes
Multi-Relational Extraction Loss
Experiments
Experimental Setup
Effectiveness of RDA
Stealing Medium-Scale Encoders.
Stealing Real-World Large-Scale Encoders.
Comparison with Existing Methods
Under the Same Surrogate Dataset Size.
Under the Same Query Cost.
Under the Same Time Cost.
...and 27 more sections

Figures (12)

Figure 1: Illustrations of four stealing methods against SSL. The dotted arrows and the text beside them indicate how each method optimizes the surrogate encoder. Surrogate encoder branches in (b)-(d) involve data augmentations for training. Both (c) and (d) augment each sample before querying the target encoder but adopt different schemes.
Figure 2: t-SNE of embeddings belonging to five different images generated by an encoder pre-trained on CIFAR10, with each image augmented into 500 patches and fed into the encoder. Each black marker represents the mean of the 500 embeddings of a certain image, i.e., its prototype. Among the embeddings of an image's various augmentations, some can be divergent or even biased. In contrast, each image's prototype is more distinguishable, i.e., less biased.
Figure 3: Performance comparisons between four stealing methods against SSL. The presented results are the mean values achieved by each method over seven different downstream classification tasks, with their corresponding query costs. Our proposed RDA can achieve SOTA results with the least query cost.
Figure 4: Pipeline of RDA. Prototype generation: augment one sample into $n$ patches and use them to query the target encoder ($E_T$). The mean of the $n$ patches' embeddings is defined as the prototype for this sample. Forward encoding: crop one image into $m$ patches and feed them to the surrogate encoder ($E_S$) to obtain their embeddings. Optimization: align embeddings from the surrogate encoder with their matched prototypes in both angle and amplitude while pushing away those belonging to different samples.
Figure 5: t-SNE of embeddings of 2,000 images sampled from CIFAR10 generated by the ImageNet encoder and CLIP.
...and 7 more figures
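The query savings shown in Figures 3 and 4 come from moving every target-encoder query into the one-time prototype-generation step. The arithmetic below is a sketch under a simplifying assumption: an end-to-end method is modeled as sending each view of each sample to the target encoder once per epoch. The function names and all numbers are hypothetical and purely illustrative, not the paper's settings or results.

```python
def end_to_end_queries(num_samples: int, epochs: int, views_per_epoch: int) -> int:
    """Assumed cost model: every view of every sample is sent to the target encoder in every epoch."""
    return num_samples * epochs * views_per_epoch


def prototype_queries(num_samples: int, n: int) -> int:
    """Each sample is augmented into n patches and queried once, before steal-training starts."""
    return num_samples * n


# Illustrative values only (hypothetical, not the paper's experimental settings).
N, EPOCHS, VIEWS, N_PATCHES = 5_000, 100, 2, 10
print(end_to_end_queries(N, EPOCHS, VIEWS))  # 1000000
print(prototype_queries(N, N_PATCHES))       # 50000
# The m crops fed to the surrogate encoder E_S during training cost no target queries.
```

Under this model the end-to-end cost grows with the number of training epochs, whereas the prototype cost is fixed once the bank is built.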