Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity
Zhihao Zhu, Chenwang Wu, Rui Fan, Yi Yang, Zhen Wang, Defu Lian, Enhong Chen
TL;DR
This work investigates model stealing attacks on graph classification under practical constraints of limited real data and hard-label query outputs. It introduces three strategies, MSA-AU (authenticity and uncertainty), MSA-AD (authenticity and diversity via Mixup), and MSA-AUD (combining both), to generate informative synthetic samples that enable a clone model to closely replicate a target GNN. Extensive experiments across multiple datasets show that the proposed attacks improve fidelity and query efficiency, and remain effective even under unknown architectures or modest defenses. The results highlight a significant security risk for graph-classification models and suggest directions for robust defenses, including architecture secrecy, query-cost policies, and adversarial training-based detectors.
Abstract
Recent research demonstrates that GNNs are vulnerable to model stealing attacks, which aim to duplicate a target model using only query access. However, existing studies focus mainly on node classification and neglect the threat to graph classification. Furthermore, their practicality is questionable because they rest on unreasonable assumptions, specifically large data requirements and extensive model knowledge. To this end, we adopt a strict setting in which the attacker has limited real data and receives only hard labels, and generates synthetic data to steal the target model. Specifically, following important data generation principles, we introduce three model stealing attacks suited to different practical scenarios: MSA-AU is inspired by active learning and emphasizes uncertainty to enhance the query value of generated samples; MSA-AD introduces diversity based on the Mixup augmentation strategy to alleviate the query inefficiency caused by the overly similar samples that MSA-AU generates; MSA-AUD combines the two strategies to integrate the authenticity, uncertainty, and diversity of the generated samples. Finally, extensive experiments consistently demonstrate the superiority of the proposed methods in terms of concealment, query efficiency, and stealing performance.
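To make the two ingredients named in the abstract more concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes graphs are given as equal-sized dense adjacency matrices, that Mixup is applied as a convex combination of those matrices followed by thresholding, and that uncertainty is measured as the entropy of the attacker's own clone-model probabilities (the target returns only hard labels). All function names and the threshold value are hypothetical.

```python
import math


def mixup_graphs(adj_a, adj_b, lam):
    """Convex combination of two equal-sized dense adjacency matrices."""
    n = len(adj_a)
    if len(adj_b) != n:
        raise ValueError("graphs must have the same number of nodes")
    return [
        [lam * adj_a[i][j] + (1.0 - lam) * adj_b[i][j] for j in range(n)]
        for i in range(n)
    ]


def binarize(adj, threshold=0.5):
    """Map a mixed (weighted) adjacency matrix back to a 0/1 graph.

    The threshold is an assumption made for this sketch.
    """
    return [[1 if w >= threshold else 0 for w in row] for row in adj]


def entropy(probs):
    """Shannon entropy (in nats) of a class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def most_uncertain(prob_rows, k):
    """Indices of the k candidates with the highest predictive entropy."""
    order = sorted(
        range(len(prob_rows)),
        key=lambda i: entropy(prob_rows[i]),
        reverse=True,
    )
    return order[:k]


# Two toy 3-node graphs: a triangle and a path.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]

mixed = mixup_graphs(triangle, path, lam=0.7)
synthetic = binarize(mixed)

# Hypothetical clone-model probabilities for three candidate graphs.
clone_probs = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
to_query = most_uncertain(clone_probs, k=2)
```

In this toy setup, candidates whose clone-model prediction is closest to uniform are chosen for querying first, which is the usual uncertainty-sampling heuristic from active learning; how the paper actually scores uncertainty and mixes graphs is not specified in this section.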
