Refine, Discriminate and Align: Stealing Encoders via Sample-Wise Prototypes and Multi-Relational Extraction
Shuchi Wu, Chuan Ma, Kang Wei, Xiaogang Xu, Ming Ding, Yuwen Qian, Tao Xiang
TL;DR
The paper addresses the problem of stealing pre-trained SSL encoders at minimal query cost. It introduces RDA, which refines target representations per sample through sample-wise prototypes and trains a surrogate encoder with a two-part multi-relational extraction loss. The framework builds a memory bank of prototypes $p_{x_i} = \frac{1}{n}\sum_{c=1}^{n} E_T(\boldsymbol{x}'_{i,t,c})$, where $E_T$ is the target encoder and the average runs over $n$ views of sample $x_i$. It then minimizes $L = \lambda_1 L_D + \lambda_2 L_A$, where $L_D$ discriminates mismatched embedding-prototype pairs and $L_A$ aligns matched pairs in amplitude and angle via logarithmic penalties. Empirical results show that RDA achieves state-of-the-art stealing efficacy across seven downstream datasets at substantially lower query cost (e.g., about 1% of the queries used by Cont-Steal). It also remains robust to defenses, including perturbations and watermarking, for both medium- and large-scale encoders. The work highlights practical risks for SSL encoders exposed through service APIs and motivates defenses against query-based stealing.
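The prototype formula can be illustrated with a minimal NumPy sketch. It assumes a toy target encoder $E_T(v) = \tanh(vW)$ and Gaussian-noise "augmentations"; both are hypothetical stand-ins, and `augment` and `build_prototype_bank` are illustrative names, not code from the paper. The index $t$ in $\boldsymbol{x}'_{i,t,c}$ is not defined in this summary, so the sketch drops it and averages over views $c = 1, \dots, n$. It also counts target queries to show that the bank is a one-time cost.

```python
import numpy as np


def augment(x, rng):
    # Hypothetical augmentation: additive Gaussian noise stands in for the
    # transformations that would produce a sample's different views.
    return x + 0.1 * rng.standard_normal(x.shape)


def build_prototype_bank(samples, target_encoder, n_views, rng):
    """Query the target encoder on n_views views of each sample and average
    the results: row i of the bank is p_{x_i} = (1/n) * sum_c E_T(x'_{i,c}).
    Returns the (num_samples, dim) bank and the number of queries spent."""
    bank = []
    queries = 0
    for x in samples:
        views = [augment(x, rng) for _ in range(n_views)]
        embeddings = np.stack([target_encoder(v) for v in views])  # one query per view
        queries += n_views
        bank.append(embeddings.mean(axis=0))  # sample-wise prototype
    return np.stack(bank), queries


# Usage with a hypothetical target encoder E_T(v) = tanh(v @ W).
rng = np.random.default_rng(0)
samples = rng.standard_normal((100, 32))
W = rng.standard_normal((32, 16))
bank, queries = build_prototype_bank(samples, lambda v: np.tanh(v @ W), n_views=5, rng=rng)
print(bank.shape, queries)  # (100, 16) 500
```

Training can then read prototypes from `bank` without further queries. The query cost here is `num_samples * n_views` regardless of how many epochs the surrogate trains for. By contrast, an end-to-end attack that re-queries $E_T$ every epoch pays a cost of this order again on each epoch.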
Abstract
This paper introduces RDA, a pioneering approach designed to address two primary deficiencies of previous attempts to steal pre-trained encoders: (1) suboptimal performance caused by biased optimization objectives, and (2) high query costs stemming from the end-to-end paradigm, which requires querying the target encoder every epoch. Specifically, we first Refine the target encoder's representations for each training sample, establishing a less biased optimization objective before the steal-training phase. This is accomplished via a sample-wise prototype, which consolidates the target encoder's representations of a given sample's various perspectives. Because prototypes require exponentially fewer queries than the end-to-end approach, they can be instantiated once to guide subsequent query-free training. To further improve efficacy, we develop a multi-relational extraction loss that trains the surrogate encoder to Discriminate mismatched embedding-prototype pairs while Aligning matched ones in terms of both amplitude and angle. In this way, the trained surrogate encoder achieves state-of-the-art results on various downstream datasets with limited queries. Moreover, RDA is shown to be robust to multiple widely used defenses.
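The multi-relational extraction loss $L = \lambda_1 L_D + \lambda_2 L_A$ can be sketched as follows. The exact forms of $L_D$ and $L_A$ are not given here, so this NumPy sketch makes illustrative assumptions rather than following the paper's definitions. For $L_D$ it uses an InfoNCE-style cross-entropy over cosine similarities, where `tau` is a hypothetical temperature. For $L_A$ it uses two logarithmic penalties on matched pairs: $|\log(\lVert z_i\rVert / \lVert p_i\rVert)|$ for amplitude and $-\log((1 + \cos\theta_i)/2)$ for angle. Only loss values are computed; a real training loop would need an autodiff framework.

```python
import numpy as np


def _l2_normalize(a, eps=1e-12):
    # Scale each row to unit length (eps guards against zero rows).
    return a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)


def discrimination_loss(z, p, tau=0.1):
    # Assumed InfoNCE-style L_D: surrogate embedding z_i should score its own
    # prototype p_i above the mismatched prototypes p_j (j != i).
    logits = _l2_normalize(z) @ _l2_normalize(p).T / tau
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


def alignment_loss(z, p, eps=1e-8):
    # Assumed logarithmic L_A on matched pairs (z_i, p_i).
    norm_z = np.linalg.norm(z, axis=1)
    norm_p = np.linalg.norm(p, axis=1)
    # Amplitude penalty: zero when the two norms agree.
    amplitude = np.abs(np.log((norm_z + eps) / (norm_p + eps)))
    # Angle penalty: zero when z_i and p_i point in the same direction.
    cos = np.sum(_l2_normalize(z) * _l2_normalize(p), axis=1)
    angle = -np.log((1.0 + cos) / 2.0 + eps)
    return np.mean(amplitude + angle)


def multi_relational_loss(z, p, lam1=1.0, lam2=1.0, tau=0.1):
    # L = lambda_1 * L_D + lambda_2 * L_A (the weights here are placeholders).
    return lam1 * discrimination_loss(z, p, tau) + lam2 * alignment_loss(z, p)


rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))  # surrogate embeddings for a batch
print(multi_relational_loss(z, z), multi_relational_loss(z, z[::-1]))
```

When each embedding equals its own prototype, the loss is near zero. Pairing embeddings with the wrong prototypes (`z[::-1]`) raises both terms. This reflects the intended split: $L_D$ separates mismatched pairs, while $L_A$ pulls matched pairs together in both norm and direction.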
