Under manipulations, are some AI models harder to audit?

Augustin Godinot, Gilles Tredan, Erwan Le Merrer, Camilla Penzo, Francois Taïani

TL;DR

This work studies the feasibility of robust audits in realistic settings, where models have large capacity. It refines the limits of the auditing problem and raises open questions on the connection between model capacity and the ability of platforms to manipulate audit attempts.

Abstract

Auditors need robust methods to assess the compliance of web platforms with the law. However, since they hardly ever have access to the algorithm, implementation, or training data used by a platform, the problem is harder than a simple metric estimation. Within the recent framework of manipulation-proof auditing, we study in this paper the feasibility of robust audits in realistic settings, in which models exhibit large capacities. We first prove a constraining result: if a web platform uses models that may fit any data, no audit strategy -- whether active or not -- can outperform random sampling when estimating properties such as demographic parity. To better understand the conditions under which state-of-the-art auditing techniques may remain competitive, we then relate the manipulability of audits to the capacity of the targeted models, using the Rademacher complexity. We empirically validate these results on popular models of increasing capacities, thus confirming experimentally that large-capacity models, which are commonly used in practice, are particularly hard to audit robustly. These results refine the limits of the auditing problem, and open up enticing questions on the connection between model capacity and the ability of platforms to manipulate audit attempts.
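
To make the random-sampling baseline mentioned in the abstract concrete, the following is a minimal sketch of a black-box audit that estimates a demographic parity gap from uniformly sampled queries. It assumes the standard definition of the gap, $|P(h(x)=1 \mid a=1) - P(h(x)=1 \mid a=0)|$, with a binary protected attribute. The names `demographic_parity_gap`, `random_audit`, `population`, and `budget` are hypothetical and do not come from the paper.

```python
import random


def demographic_parity_gap(model, audit_set):
    """Estimate |P(h(x)=1 | a=1) - P(h(x)=1 | a=0)| on the audit set.

    `model` is a black box returning 0 or 1; each audit point is a
    (features, group) pair with group in {0, 1}. Both are assumptions
    made for this illustration.
    """
    positives = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for features, group in audit_set:
        totals[group] += 1
        positives[group] += model(features, group)
    if totals[0] == 0 or totals[1] == 0:
        raise ValueError("audit set must contain both groups")
    return abs(positives[1] / totals[1] - positives[0] / totals[0])


def random_audit(model, population, budget, seed=0):
    """Query the model on `budget` points drawn uniformly without replacement."""
    rng = random.Random(seed)
    audit_set = rng.sample(population, budget)
    return demographic_parity_gap(model, audit_set)


if __name__ == "__main__":
    # Toy population: feature i, group alternates between 0 and 1.
    population = [(i, i % 2) for i in range(100)]
    biased = lambda features, group: 1 if group == 1 else 0
    print(random_audit(biased, population, budget=40))
```

The sketch only covers the estimation step. It does not model the manipulation constraint of the framework, under which the platform may later switch to any model that agrees with the audited one on the queried points.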

Paper Structure (29 sections, 40 equations, 9 figures, 3 tables, 1 algorithm)

Figures (9)

  • Figure 1: Security game of the manipulation-proof auditing framework. Before the audit, the platform declares the hypothesis space $\mathcal{H}$ to the auditor. During the audit, the platform serves the model $h \in \mathcal{H}$ and the auditor queries $h$ on $S$. After the audit, the platform can change its model to $h^\prime$ with the constraint that $\forall x \in S, h^\prime(x) = h(x)$ or equivalently, $h^\prime \in \mathcal{H} (h, S)$.
  • Figure 2: The diameter (vertical axis) as a function of the amount of memory (horizontal axis) of the dictionary model studied in subsection \ref{subsec:dictionaries}. Audit budgets are distinguished by curve color. Optimal audit sets appear as dashed curves, and random baseline audit sets as solid lines.
  • Figure 3: Distribution of the capacity (horizontal axis) for different hyperparameter choices on the three datasets (vertical axis). Each model is trained with different hyperparameter values, and each (model, hyperparameter) pair represents a different hypothesis class $\mathcal{H}$. For each (model, hyperparameter) pair, the empirical Rademacher values $R_m (\mathcal{H} \circ D)$ are averaged over $15$ realizations of $D$ and $\sigma_i$ before computing the model capacity.
  • Figure 4: Distribution of the Manipulability (manipulability under random audits) values (horizontal axis) of different models $\mathcal{F}$ on a selection of datasets (vertical axis). Each bar represents a different model $\mathcal{F}$ (trees, linear models, ...). Each model is trained with different hyperparameter values, and each (model, hyperparameter) pair represents a different hypothesis class $\mathcal{H}$. For each dataset, the size of the audit set is set to $10\%$ of the dataset size: $\left| S \right| = 0.1 \left| \mathcal{X} \right|$. For each (model, hyperparameter) pair, the $\mu$-diameters are averaged over $15$ audit datasets before computing the manipulability.
  • Figure 5: Manipulability under random audits (vertical axis) of different models versus their capacity (horizontal axis) on a selection of datasets. Each point represents a (model, hyperparameter) pair. For each dataset, the size of the audit set is set to $10\%$ of the dataset size: $\left| S \right| = 0.1 \left| \mathcal{X} \right|$. For each (model, hyperparameter) pair, the Manipulability is averaged over $15$ audit datasets, and the capacity is computed over $30$ randomizations of the dataset labels. The error bars represent the standard deviation.
  • ...and 4 more figures
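
The captions of Figures 3 and 5 rely on empirical Rademacher values $R_m (\mathcal{H} \circ D)$ averaged over draws of the signs $\sigma_i$. As a rough illustration, here is a Monte Carlo sketch for a finite hypothesis class, where each hypothesis is given by its vector of $\pm 1$ predictions on the $m$ points of $D$. This is an assumption made for the example: the paper works with trained models, and how it turns these values into a capacity score is not specified in this summary. The function name `empirical_rademacher` and its parameters are hypothetical.

```python
import random


def empirical_rademacher(predictions, n_draws=200, seed=0):
    """Monte Carlo estimate of E_sigma[ max_h (1/m) * sum_i sigma_i * h(x_i) ].

    `predictions` is a list of hypotheses, each a list of m values in
    {-1, +1} (the hypothesis evaluated on the m data points).
    """
    rng = random.Random(seed)
    m = len(predictions[0])
    total = 0.0
    for _ in range(n_draws):
        # Draw one vector of independent Rademacher signs.
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # Best correlation achievable by any hypothesis in the class.
        total += max(
            sum(s * p for s, p in zip(sigma, preds)) / m
            for preds in predictions
        )
    return total / n_draws


if __name__ == "__main__":
    from itertools import product

    # A class that realizes every labeling of 3 points fits any sign vector.
    full_class = [list(labels) for labels in product((-1, 1), repeat=3)]
    # A class with a single hypothesis cannot adapt to the signs.
    single = [[1, 1, 1]]
    print(empirical_rademacher(full_class), empirical_rademacher(single))
```

A class that can realize every labeling attains the maximum value of 1, which matches the intuition that models able to fit any data have the largest capacity, while a single fixed hypothesis yields a value close to 0.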

Theorems & Definitions (5)

  • proof
  • Definition 1: Dictionary hypothesis class
  • proof
  • Definition 2: Benign Overfitting Hypothesis class
  • proof