Towards more Practical Threat Models in Artificial Intelligence Security

Kathrin Grosse; Lukas Bieringer; Tarek Richard Besold; Alexandre Alahi

TL;DR

This paper revisits the threat models of the six most studied attacks in AI security research and matches them to AI usage in practice via a survey with 271 industrial practitioners, finding that all existing threat models are indeed applicable.

Abstract

Recent works have identified a gap between research and practice in artificial intelligence security: threats studied in academia do not always reflect the practical use and security risks of AI. For example, while models are often studied in isolation, they form part of larger ML pipelines in practice. Recent works also brought forward that adversarial manipulations introduced by academic attacks are impractical. We take a first step towards describing the full extent of this disparity. To this end, we revisit the threat models of the six most studied attacks in AI security research and match them to AI usage in practice via a survey with 271 industrial practitioners. On the one hand, we find that all existing threat models are indeed applicable. On the other hand, there are significant mismatches: research is often too generous with the attacker, assuming access to information not frequently available in real-world settings. Our paper is thus a call for action to study more practical threat models in artificial intelligence security.

Paper Structure (25 sections, 4 figures, 8 tables)

Introduction
Background
Threat Modelling Artificial Intelligence
AI Security
Methodology
Measuring Threat Models in Practice
Questionnaire Design
Pretests and Recruiting
Sample Description
Results
Overall Threat Surface
Attack Specific Threat Models
Training-Time Attacks
Test-Time Attacks
Privacy Attacks
...and 10 more sections

Figures (4)

Figure 1: Backdoor threat model in percent of our participants' replies. We report 3rd party access: White denotes incomplete data or an irrelevant threat model (e.g., only test data accessible). Black represents no access, turquoise the backdoor threat model. Light turquoise denotes insufficient access for backdoors, but sufficient access for poisoning attacks.
Figure 2: Evasion threat models in percent of our participants' replies. We report 3rd party access: White denotes incomplete data or an irrelevant threat model (e.g., only model accessible). Black represents no access, turquoise white-box and light turquoise black-box evasion threat models.
Figure 3: Model stealing threat model in percent of our participants' replies. We describe 3rd party access: White denotes incomplete data or an irrelevant threat model (e.g., only test inputs are accessible). Black represents no access, turquoise denotes the academic threat model, gray that the attack is obsolete as the model is available. Red denotes a rarely studied threat model in current research.
Figure 4: Membership and attribute inference threat models in percent of participants' replies. We describe 3rd party access: White denotes incomplete data or irrelevant threat models, black represents no access, turquoise denotes existing threat models, gray means that the attack is obsolete as the training data is available, too. For membership, red denotes a threat model not studied so far. In the case of attribute inference, turquoise denotes no model access, but the property can directly be inferred from the training data.
