DREAM: Domain-agnostic Reverse Engineering Attributes of Black-box Model

Rongqing Li, Jiaqi Yu, Changsheng Li, Wenhan Luo, Ye Yuan, Guoren Wang

TL;DR

This work addresses the practical challenge of reverse engineering attributes of black-box models without access to the target model's training data. It introduces DREAM, a framework that reframes attribute inference as an OOD generalization problem and leverages a multi-discriminator GAN to learn domain-invariant features from probability outputs, followed by a domain-agnostic reverse meta-model to predict attributes. Empirical results on PACS and MEDU modelsets show DREAM outperforms baselines across CNN and ViT attribute spaces, including domain-shift and larger attribute scenarios, and demonstrate the approach's potential for model extraction and security analyses. The study highlights both the feasibility of domain-agnostic reverse engineering in practical MLaaS settings and the need to consider defenses against attribute leakage.

Abstract

Deep learning models are usually black boxes when deployed on machine learning platforms. Prior works have shown that the attributes (e.g., the number of convolutional layers) of a target black-box model can be exposed through a sequence of queries. These works share a crucial limitation: they assume the training dataset of the target model is known beforehand and use this dataset to mount the model attribute attack. In practice, however, the training dataset of a target black-box model is difficult to access, so it is unclear whether the model's attributes can still be revealed in this setting. In this paper, we investigate a new problem of black-box reverse engineering that does not require the target model's training dataset to be available. We put forward DREAM, a general and principled framework that casts this problem as out-of-distribution (OOD) generalization. In this way, we can learn a domain-agnostic meta-model to infer the attributes of a target black-box model with unknown training data. This allows our method to be applied gracefully to an arbitrary domain for model attribute reverse engineering, with strong generalization ability. Extensive experimental results demonstrate the superiority of our proposed method over the baselines.

Paper Structure

This paper contains 22 sections, 4 equations, 7 figures, 11 tables.

Figures (7)

  • Figure 1: Previous work (left) assumes the dataset used to train the target black-box model is known beforehand, and requires using the same dataset to train the white-box models. Our DREAM framework (right) relaxes this condition: the training data of the target black-box model no longer needs to be available. Our idea is to cast black-box model attribute inference as an OOD learning problem.
  • Figure 2: The performance of KENNEN (Oh et al., 2018) on black-box models trained on the Cartoon, Sketch, and Photo datasets (Li et al., 2017). The white-box models are trained on Cartoon.
  • Figure 3: An illustration of our DREAM framework. In the left part, we train a large number of white-box models on datasets of different styles (cartoon, photo, and sketch) to construct a modelset. The models in the modelset cover numerous combinations of attributes. We then sample queries from each style of dataset and feed them into each white-box model to obtain the multi-domain model outputs $O$. In the right part, we propose a multi-discriminator GAN to learn domain-invariant features from the outputs of the white-box models. The domain-agnostic reverse meta-model is then trained on these domain-invariant features. During the inference stage, queries are sent to the black-box model to obtain its outputs. The Generator then produces domain-invariant features, which are input to the domain-agnostic meta-model to infer the attributes of the black-box model.
  • Figure 4: An example to illustrate the MDGAN.
  • Figure 5: t-SNE visualization of features from different domains produced by DREAM, MMD, MixStyle and SelfReg on the PACS modelset.
  • ...and 2 more figures
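
The inference stage described in the Figure 3 caption can be sketched as a simple data flow. This is a minimal illustration under stated assumptions, not the authors' implementation: the black-box stand-in, the linear generator, the attribute heads, the attribute names, and all sizes are hypothetical, and the weights are random rather than trained. It assumes the per-query probability vectors are concatenated into a single output vector $O$; the only figure taken from outside this sketch is that PACS has 7 object classes.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_QUERIES = 8    # assumed number of query images
NUM_CLASSES = 7    # PACS has 7 object classes
INPUT_DIM = 32     # assumed (flattened) query size, for illustration only
FEATURE_DIM = 16   # assumed size of the domain-invariant feature
# Hypothetical attributes, each mapped to its number of candidate values.
ATTRIBUTES = {"num_conv_layers": 3, "kernel_size": 2}


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def black_box(queries):
    """Stand-in for the deployed model: exposes class probabilities only."""
    weights = np.linspace(-1.0, 1.0, INPUT_DIM * NUM_CLASSES)
    return softmax(queries @ weights.reshape(INPUT_DIM, NUM_CLASSES))


def collect_outputs(model, queries):
    """Concatenate the probability vectors of all queries into one vector O."""
    return model(queries).reshape(-1)


# Untrained stand-ins for the Generator and the reverse meta-model heads.
generator_w = rng.normal(size=(NUM_QUERIES * NUM_CLASSES, FEATURE_DIM))
heads = {name: rng.normal(size=(FEATURE_DIM, n)) for name, n in ATTRIBUTES.items()}


def infer_attributes(model, queries):
    """Queries -> outputs O -> generator features -> one prediction per attribute."""
    outputs = collect_outputs(model, queries)
    features = np.tanh(outputs @ generator_w)
    return {name: int(np.argmax(features @ w)) for name, w in heads.items()}


queries = rng.normal(size=(NUM_QUERIES, INPUT_DIM))
prediction = infer_attributes(black_box, queries)
```

In the framework itself, the Generator would first be trained adversarially against the multiple discriminators on white-box outputs, and the meta-model would be trained on the resulting features; the sketch only shows how a black-box model's outputs pass through the two components at inference time.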