VidModEx: Interpretable and Efficient Black Box Model Extraction for High-Dimensional Spaces

Somnath Sendhil Kumar, Yuvaraj Govindarajulu, Pavan Kulkarni, Manojkumar Parmar

TL;DR

VidModEx addresses the scalability gap in black-box model extraction for high-dimensional inputs by integrating SHAP-based explanations into an energy-based GAN framework to guide synthetic data generation. The method employs a class-targeted generator, a substitute model, and a SHAP-valued discriminator to optimize samples that maximize the target class probability while controlling query cost. It demonstrates significant performance gains over prior baselines across image and video classification tasks, including hard-label, soft-label, and top-k scenarios, and remains robust under grey-box surrogate datasets. This approach advances practical black-box extraction capabilities and provides a reproducible pipeline for evaluating extraction in realistic MLaaS settings, with potential implications for security and AI governance.

Abstract

In the domain of black-box model extraction, conventional methods reliant on soft labels or surrogate datasets struggle with scaling to high-dimensional input spaces and managing the complexity of an extensive array of interrelated classes. In this work, we present a novel approach that utilizes SHAP (SHapley Additive exPlanations) to enhance synthetic data generation. SHAP quantifies the individual contributions of each input feature towards the victim model's output, facilitating the optimization of an energy-based GAN towards a desirable output. This method significantly boosts performance, achieving a 16.45% increase in the accuracy of image classification models and extending to video classification models with an average improvement of 26.11% and a maximum of 33.36% on challenging datasets such as UCF11, UCF101, Kinetics 400, Kinetics 600, and Something-Something V2. We further demonstrate the effectiveness and practical utility of our method under various scenarios, including the availability of top-k prediction probabilities, top-k prediction labels, and top-1 labels.
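
The three access settings named at the end of the abstract differ only in how much of the victim's output the attacker sees. The sketch below illustrates that difference on a single probability vector. It is not taken from the paper: the function name `restrict_output` and the mode strings are hypothetical, chosen for illustration only.

```python
from typing import Dict, List, Sequence, Union

Observation = Union[Dict[int, float], List[int], int]


def restrict_output(probs: Sequence[float], mode: str, k: int = 1) -> Observation:
    """Return what an attacker would observe under one access setting.

    Hypothetical helper for illustration; not the paper's API.
    """
    # Class indices sorted by descending probability (ties: lower index first).
    ranked = sorted(range(len(probs)), key=lambda i: (-probs[i], i))
    top = ranked[:k]

    if mode == "topk_probs":
        # Top-k prediction probabilities: class index -> probability.
        return {i: probs[i] for i in top}
    if mode == "topk_labels":
        # Top-k prediction labels: ordered class indices, no probabilities.
        return top
    if mode == "top1_label":
        # Top-1 (hard) label: only the arg-max class index.
        return ranked[0]
    raise ValueError(f"unknown mode: {mode}")


victim_probs = [0.10, 0.60, 0.05, 0.25]
soft_view = restrict_output(victim_probs, "topk_probs", k=2)
label_view = restrict_output(victim_probs, "topk_labels", k=2)
hard_view = restrict_output(victim_probs, "top1_label")
```

Each step from `topk_probs` to `top1_label` discards information the substitute model could otherwise learn from, which is why the hard-label setting is usually the most demanding one for extraction.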

Paper Structure

This paper contains 24 sections, 2 theorems, 28 equations, 16 figures, 2 tables.

Key Result

Theorem 1

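Only one possible explanation model $g$ follows Definition 1 and satisfies Properties 1, 2 and 3:

$$\phi_i(f, x) = \sum_{z' \subseteq x'} \frac{|z'|!\,(M - |z'| - 1)!}{M!} \left[ f_x(z') - f_x(z' \setminus i) \right]$$

where $M$ is the number of simplified input features, $|z'|$ is the number of non-zero entries in $z'$, and $z' \subseteq x'$ represents all $z'$ vectors where the non-zero entries are a subset of the non-zero entries in $x'$.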
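
As a small worked illustration of the attribution described in Theorem 1, the sketch below computes exact Shapley values by brute-force enumeration. It uses the classic Shapley form, which sums over subsets $S$ that exclude feature $i$ and weights each marginal contribution by $|S|!\,(M-|S|-1)!/M!$. The toy model `toy_model`, the all-zero baseline used to represent "missing" features, and the function name `shapley_values` are assumptions made for this example; they are not the paper's implementation, and enumeration is only feasible for very small $M$.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(
    f: Callable[[List[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of f at x, with absent features set to baseline."""
    M = len(x)

    def f_x(present: set) -> float:
        # Features in `present` take their value from x, the rest from baseline.
        z = [x[j] if j in present else baseline[j] for j in range(M)]
        return f(z)

    phi = []
    for i in range(M):
        others = [j for j in range(M) if j != i]
        total = 0.0
        for size in range(M):  # subset sizes 0 .. M-1
            weight = factorial(size) * factorial(M - size - 1) / factorial(M)
            for S in combinations(others, size):
                s = set(S)
                # Marginal contribution of feature i when added to subset S.
                total += weight * (f_x(s | {i}) - f_x(s))
        phi.append(total)
    return phi


def toy_model(z: List[float]) -> float:
    # Assumed toy function: two linear terms plus one pairwise interaction.
    return 2 * z[0] + 3 * z[1] + z[0] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

For this toy model the attributions come out to approximately `[2.5, 3.0, 0.5]`: the interaction term `z[0] * z[2]` is split equally between features 0 and 2, and the values sum to `toy_model(x) - toy_model(baseline) = 6`.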

Figures (16)

  • Figure 1: Activation Atlas for DFME (Truong et al., 2021)
  • Figure 2: Activation Atlas for SHAP
  • Figure 3: Distribution based on Victim model prediction on generated samples for CIFAR 100
  • Figure 4: Model extraction diagram with additional objectives and SHAP explainers
  • Figure 5: SHAP values and visualization at each stage of the pipeline
  • ...and 11 more figures

Theorems & Definitions (4)

  • Definition 1: Additive feature attribution methods
  • Theorem 1
  • Theorem 2: Hierarchical Approximation of SHAP Values
  • Proof 2.1