Bayesian Strategic Classification

Lee Cohen, Saeed Sharifi-Malvajerdi, Kevin Stangl, Ali Vakilian, Juba Ziani

TL;DR

This work extends strategic classification to settings where agents have only partial knowledge of the classifier and the learner releases only partial information. Agents hold a common prior over classifiers, and the learner truthfully reveals a subset of hypotheses, forming a Stackelberg game whose equilibrium depends on both the agents' best-response (BR) computation and the learner's information design. The paper proves that computing the BR is hard in general, yet gives oracle-efficient BR algorithms for low-dimensional linear classifiers and for $V$-submodular cost functions. For the learner's information-release problem, it provides algorithms for both continuous and discrete uniform priors, plus insights into minimizing false positives and false negatives. Together, the results show how carefully chosen partial information release can improve the learner's predictive accuracy under manipulation.

Abstract

In strategic classification, agents modify their features, at a cost, to ideally obtain a positive classification from the learner's classifier. The typical response of the learner is to carefully modify their classifier to be robust to such strategic behavior. When reasoning about agent manipulations, most papers that study strategic classification rely on the following strong assumption: agents fully know the exact parameters of the classifier deployed by the learner. This is often an unrealistic assumption when using complex or proprietary machine learning techniques in real-world prediction tasks. We initiate the study of partial information release by the learner in strategic classification. We move away from the traditional assumption that agents have full knowledge of the classifier. Instead, we consider agents that have a common distributional prior on which classifier the learner is using. The learner in our model can reveal truthful, yet not necessarily complete, information about the deployed classifier to the agents. The learner's goal is to release just enough information about the classifier to maximize accuracy. We show how such partial information release can, counter-intuitively, benefit the learner's accuracy, despite increasing agents' abilities to manipulate. We show that while it is intractable to compute the best response of an agent in the general case, there exist oracle-efficient algorithms that compute the agents' best response when the learner's hypothesis class is the class of linear classifiers, or when the agents' cost function satisfies a natural notion of submodularity that we define. We then turn our attention to the learner's optimization problem and provide both positive and negative results on the algorithmic problem of how much information the learner should release about the classifier to maximize their expected accuracy.
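
The agent's side of this model can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not state: the agent's utility is taken to be the posterior probability of a positive classification minus the manipulation cost, the posterior is the common prior renormalized over the released subset, and the search is a brute-force scan over a finite candidate set rather than one of the paper's oracle-efficient algorithms. The names `posterior`, `best_response`, and the 1-D threshold classifiers are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, Sequence


def posterior(prior: Sequence[float], released: Sequence[int]) -> Dict[int, float]:
    """Condition the common prior on the released subset of classifier indices."""
    mass = sum(prior[i] for i in released)
    if mass <= 0:
        raise ValueError("released subset must have positive prior mass")
    return {i: prior[i] / mass for i in released}


def best_response(
    x: float,
    classifiers: Sequence[Callable[[float], bool]],
    prior: Sequence[float],
    released: Sequence[int],
    cost: Callable[[float, float], float],
    candidates: Sequence[float],
) -> float:
    """Brute-force best response over a finite candidate set.

    Assumed utility: Pr[h(x') = 1 | released subset] - cost(x, x').
    """
    post = posterior(prior, released)

    def utility(x_new: float) -> float:
        p_accept = sum(p for i, p in post.items() if classifiers[i](x_new))
        return p_accept - cost(x, x_new)

    return max(candidates, key=utility)


# Toy instance (hypothetical): three 1-D threshold classifiers, uniform prior.
thresholds = [1.0, 2.0, 3.0]
classifiers = [lambda z, t=t: z >= t for t in thresholds]
prior = [1 / 3, 1 / 3, 1 / 3]
cost = lambda a, b: 0.4 * abs(a - b)
candidates = [0.0] + thresholds  # stay put, or move exactly onto a threshold

# No extra information: the agent keeps the full prior and does not move.
print(best_response(0.0, classifiers, prior, [0, 1, 2], cost, candidates))  # 0.0
# The learner reveals that the threshold is 1.0: the agent moves to 1.0.
print(best_response(0.0, classifiers, prior, [0], cost, candidates))  # 1.0
```

In this toy instance the released subset changes the agent's behavior: under the full prior, no move is worth its cost, while revealing a single low threshold makes manipulation worthwhile. The learner's problem in the paper is to choose the released subset with such responses in mind.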

Paper Structure

This paper contains 21 sections, 18 theorems, 37 equations, 1 figure, 1 table, 3 algorithms.

Key Result

Theorem 3.1

$\Omega(2^n / \sqrt{n})$ calls to the oracle (specified as an algorithm in the paper) are required to compute the best response of an agent with a $2/3$ probability of success, even when $\mathcal{X} = \mathbb{R}^2$ and the cost function is $c_p$ for some $p\ge 1$.

Figures (1)

  • Figure 1: In this example, we consider $p=2$, i.e., $c(x,x') = \|x - x'\|_2$. The agent is located at the origin. Each blue point lies in the intersection of the positive regions of a subset of classifiers of size $\frac{n}{2} - 1$, at a Euclidean distance of $1/2 - \epsilon$ from the origin, where $\epsilon$ is a small positive value. Each red point lies in the intersection of the positive regions of a subset of classifiers of size $\frac{n}{2}$. All red points except the one corresponding to $S^\star$ are at a Euclidean distance of $1/2 + \epsilon$ from the origin; the red point corresponding to $S^\star$ is uniquely placed at a distance of $1/2 - \epsilon$, like the blue points. Points corresponding to different subsets are at distinct locations in the space.
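
The $\Omega(2^n / \sqrt{n})$ rate in Theorem 3.1 has the same order as the number of size-$\frac{n}{2}$ subsets of $n$ classifiers, the subsets that the red points in Figure 1 correspond to. Reading the bound this way is an interpretation, not a statement taken from the proof. The short Python check below uses only the standard fact (Stirling's approximation) that $\binom{n}{n/2} \approx 2^n \sqrt{2/(\pi n)}$, so the ratio of $\binom{n}{n/2}$ to $2^n/\sqrt{n}$ approaches $\sqrt{2/\pi}$. The helper names are hypothetical.

```python
import math


def central_binomial(n: int) -> int:
    """Number of subsets of size n // 2 drawn from n items."""
    return math.comb(n, n // 2)


def ratio_to_bound(n: int) -> float:
    """Ratio of C(n, n/2) to 2^n / sqrt(n); tends to sqrt(2 / pi) for large even n."""
    return central_binomial(n) / (2 ** n / math.sqrt(n))


for n in (10, 20, 40, 80):
    print(n, central_binomial(n), round(ratio_to_bound(n), 4))

print("limit sqrt(2/pi) =", round(math.sqrt(2 / math.pi), 4))
```

Because the ratio stays bounded away from zero, a search that must distinguish $S^\star$ among all size-$\frac{n}{2}$ subsets faces a candidate set of order $2^n/\sqrt{n}$.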

Theorems & Definitions (47)

  • Example 2.1: Examples of Information Release via Subsets
  • Definition 2.2: Strategic Game with Partial Information Release
  • Example 2.3: Partial vs. Full Information Release
  • Example 2.4: Partial vs. Full Information Release
  • Theorem 3.1: Computational Hardness with Oracle Access
  • Proof of Theorem 3.1
  • Remark 3.2
  • Theorem 3.3
  • Proof of Theorem 3.3
  • Definition 3.4: $V$-Submodularity
  • ...and 37 more