Attackers Can Do Better: Over- and Understated Factors of Model Stealing Attacks
Daryna Oliynyk, Rudolf Mayer, Andreas Rauber
TL;DR
The paper tackles model stealing by exhaustively analyzing how factors tied to attacker strength influence substitute-model fidelity. Through 180 configurations across three CIFAR-10 target models, it demonstrates that target performance bottlenecks fidelity, that data complexity and transfer-learning choices crucially shape attack effectiveness, and that data-free approaches can be highly efficient. A data-free attack with around 10k queries matches or surpasses prior state-of-the-art methods that require orders of magnitude more queries, especially when leveraging query optimization. The discussion of transferability and defenses reveals limited cross-model transfer and inadequate protections for label-only outputs, highlighting the need for novel defense mechanisms and a shift in how defenders assess and mitigate model-stealing threats.
Abstract
Machine learning models have been shown to be vulnerable to model stealing attacks, which lead to intellectual property infringement. Among other methods, substitute model training is an all-encompassing attack applicable to any machine learning model whose behaviour can be approximated from input-output queries. Whereas prior works mainly focused on improving the performance of substitute models, e.g., by developing a new substitute training method, there have been only limited ablation studies on the impact that the attacker's strength has on the substitute model's performance. As a result, different authors came to diverse, sometimes contradictory, conclusions. In this work, we exhaustively examine the ambivalent influence that different factors, resulting from varying the attacker's capabilities and knowledge, have on a substitute training attack. Our findings suggest that some of the factors that have been considered important in the past are, in fact, not that influential; instead, we discover new correlations between attack conditions and success rate. In particular, we demonstrate that better-performing target models enable higher-fidelity attacks and explain the intuition behind this phenomenon. Further, we propose to shift the focus from the complexity of target models toward the complexity of their learning tasks. Therefore, for the substitute model, rather than aiming for a higher architecture complexity, we suggest focusing on getting data of higher complexity and an appropriate architecture. Finally, we demonstrate that even in the most limited data-free scenario, there is no need to overcompensate for weak knowledge with millions of queries. Our results often exceed or match the performance of previous attacks that assume a stronger attacker, suggesting that these stronger attacks are likely endangering a model owner's intellectual property to a significantly higher degree than shown until now.
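The notions of substitute training and fidelity mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy one-dimensional target, the 1-nearest-neighbour substitute, and all names (`query_labels`, `fit_1nn`, `fidelity`, `toy_target`) are assumptions made purely for illustration. The sketch assumes a label-only target and takes fidelity to mean the fraction of inputs on which substitute and target predict the same label, as is common in the model stealing literature.

```python
from typing import Callable, List, Sequence, Tuple

# A predictor maps one (toy, 1-D) input to a class label.
Predictor = Callable[[float], int]


def query_labels(target: Predictor, queries: Sequence[float]) -> List[Tuple[float, int]]:
    """Send each query to the target and keep only the returned label."""
    return [(x, target(x)) for x in queries]


def fit_1nn(labelled: Sequence[Tuple[float, int]]) -> Predictor:
    """Build a 1-nearest-neighbour substitute from (input, label) pairs."""
    if not labelled:
        raise ValueError("need at least one labelled query")

    def predict(x: float) -> int:
        # Return the label of the closest queried input.
        nearest = min(labelled, key=lambda pair: abs(pair[0] - x))
        return nearest[1]

    return predict


def fidelity(target: Predictor, substitute: Predictor, inputs: Sequence[float]) -> float:
    """Fraction of inputs on which substitute and target agree."""
    if not inputs:
        raise ValueError("need at least one evaluation input")
    agree = sum(1 for x in inputs if target(x) == substitute(x))
    return agree / len(inputs)


def toy_target(x: float) -> int:
    """Hypothetical target: a threshold classifier standing in for a deployed model."""
    return int(x >= 0.5)


# The attacker only observes labels for the inputs it chooses to query.
queries = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
substitute = fit_1nn(query_labels(toy_target, queries))

# Fidelity is measured on inputs that were not used as queries.
held_out = [0.1, 0.3, 0.45, 0.55, 0.7, 0.9]
print(fidelity(toy_target, substitute, held_out))
```

Note that fidelity compares the substitute against the target's predictions rather than against ground-truth labels, so it can be measured without labelled data.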
