Optimal Decision Making Under Strategic Behavior

Stratis Tsirtsis, Behzad Tabibian, Moein Khajehnejad, Adish Singla, Bernhard Schölkopf, Manuel Gomez-Rodriguez

TL;DR

This work addresses optimal decision-making when recipients can strategically alter their features in response to a policy. It casts the problem as a Stackelberg game where the decision-maker commits to a policy $\pi$ that induces a transported feature distribution $P(\mathbf{x}|\pi)$ via individuals' best responses, formalized through an optimal-transport framework. The authors prove the general problem is NP-hard and show that deterministic policies may be suboptimal in strategic settings, but provide two tractable approaches: a polynomial-time dynamic-programming heuristic under outcome-monotonic costs and a general-cost iterative method that finds locally optimal policies in polynomial time. Experiments on synthetic and real credit-card data demonstrate that policies accounting for strategic behavior achieve higher utility than those that do not, highlighting practical implications for lending, hiring, and insurance. The work offers a principled, incentive-aware foundation for designing transparent decision policies that remain effective when individuals adapt strategically.

Abstract

We are witnessing an increasing use of data-driven predictive models to inform decisions. As decisions have implications for individuals and society, there is increasing pressure on decision makers to be transparent about their decision policies. At the same time, individuals may use knowledge gained through transparency to invest effort strategically in order to maximize their chances of receiving a beneficial decision. Our goal is to find decision policies that are optimal in terms of utility in such a strategic setting. To this end, we first characterize how strategic investment of effort by individuals leads to a change in the feature distribution. Using this characterization, we show that, in general, we cannot expect to find optimal decision policies in polynomial time and that there are cases in which deterministic policies are suboptimal. Then, we demonstrate that, if the cost individuals pay to change their features satisfies a natural monotonicity assumption, we can narrow down the search for the optimal policy to a particular family of decision policies with a set of desirable properties, which allow for a highly effective polynomial-time heuristic search algorithm using dynamic programming. Finally, under no assumptions on the cost individuals pay to change their features, we develop an iterative search algorithm that is guaranteed to find locally optimal decision policies, also in polynomial time. Experiments on synthetic and real credit card data illustrate our theoretical findings and show that the decision policies found by our algorithms achieve higher utility than those that do not account for strategic behavior.
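
The strategic setting described above can be illustrated with a minimal sketch. It assumes a finite set of feature values, that each individual moves to the feature value maximizing `policy[j] - cost[i, j]` (the benefit expression suggested by the Figure 2 caption), and that the decision maker gains $P(y = 1 \,|\, \bm{x}) - \gamma$ per positive decision (the quantity named in the Figure 1 caption). The function names, the toy numbers, and the tie-breaking rule (lowest index wins) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def best_response(policy, cost):
    """Index each individual moves to, assuming she maximizes policy[j] - cost[i, j].

    Ties are broken toward the lowest index (an assumption of this sketch).
    """
    benefit = policy[None, :] - cost  # benefit[i, j] of moving from i to j
    return benefit.argmax(axis=1)

def induced_distribution(p_x, policy, cost):
    """Feature distribution after everyone plays her best response."""
    br = best_response(policy, cost)
    p_induced = np.zeros_like(p_x)
    np.add.at(p_induced, br, p_x)  # move the mass of each i to br[i]
    return p_induced

def utility(p_x, p_y, policy, cost, gamma):
    """Expected utility of the policy under the induced distribution."""
    p_induced = induced_distribution(p_x, policy, cost)
    return float(np.sum(p_induced * policy * (p_y - gamma)))

# Toy instance with three feature values (all numbers are hypothetical).
p_x = np.array([0.5, 0.3, 0.2])        # initial feature distribution
p_y = np.array([0.1, 0.5, 0.9])        # P(y = 1 | x)
gamma = 0.2
idx = np.arange(3)
cost = 0.4 * np.abs(idx[:, None] - idx[None, :])  # hypothetical cost of moving
policy = np.array([0.0, 0.0, 1.0])     # positive decision only at the last value

strategic = utility(p_x, p_y, policy, cost, gamma)
non_strategic = float(np.sum(p_x * policy * (p_y - gamma)))  # nobody moves
```

In this toy instance every individual finds it worthwhile to move to the last feature value, so the utility computed under the induced distribution differs substantially from the one computed as if nobody reacted to the policy.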

Paper Structure

This paper contains 17 sections, 6 theorems, 21 equations, 6 figures, 1 table, 2 algorithms.

Key Result

Theorem 1

The problem of finding the optimal decision policy $\pi^{*}$ that maximizes utility in a strategic setting is NP-hard.
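
To give a feel for why exact search is expensive, here is a hedged sketch of an exhaustive search restricted to deterministic policies over $m$ feature values, which enumerates $2^m$ candidates. It is only an illustration: the best-response rule (`policy[j] - cost[i, j]`), the tie-breaking, and all names are assumptions of this sketch, and, as the abstract notes, the true optimum may not be deterministic at all.

```python
from itertools import product
import numpy as np

def strategic_utility(p_x, p_y, policy, cost, gamma):
    """Utility when each individual moves to argmax_j policy[j] - cost[i, j]."""
    br = (policy[None, :] - cost).argmax(axis=1)  # assumed best response
    return float(np.sum(p_x * policy[br] * (p_y[br] - gamma)))

def brute_force_deterministic(p_x, p_y, cost, gamma):
    """Enumerate all 2^m deterministic policies and keep the best one."""
    m = len(p_x)
    best_policy, best_u = None, -np.inf
    for bits in product([0.0, 1.0], repeat=m):  # exponential in m
        policy = np.array(bits)
        u = strategic_utility(p_x, p_y, policy, cost, gamma)
        if u > best_u:
            best_policy, best_u = policy, u
    return best_policy, best_u

# Hypothetical toy instance with three feature values.
p_x = np.array([0.5, 0.3, 0.2])
p_y = np.array([0.1, 0.5, 0.9])
idx = np.arange(3)
cost = 0.4 * np.abs(idx[:, None] - idx[None, :])
best_policy, best_u = brute_force_deterministic(p_x, p_y, cost, gamma=0.2)
```

The loop is fine for a handful of feature values but doubles in size with every additional value, which is the practical motivation for the polynomial-time algorithms the paper proposes.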

Figures (6)

  • Figure 1: Optimal decision policies and induced feature distributions. Panels (a) and (b) visualize $P(\bm{x})$ and $P(y = 1 {\,|\,} \bm{x})$, respectively. In Panels (c-h), we set the cost to change feature values $c(\bm{x}_i, \bm{x}_j) = \alpha[|x_{i0}-x_{j0}|+|x_{i1}-x_{j1}|]$, where $\alpha$ is a given parameter and $\gamma = 0.2$. In all panels, each cell corresponds to a different feature value $\bm{x}_i$ and darker colors correspond to higher values. As the cost of moving to further feature values for individuals decreases, the decision policy only provides positive decisions for a few $\bm{x}$ values with high $(P(y = 1 {\,|\,} \bm{x})-\gamma)$, encouraging individuals to move to those values.
  • Figure 2: Optimal policy and subpolicies after the dynamic programming algorithm performs its first round. Panel (a) shows the optimal subpolicy $\pi^{*}(\bm{x})$, which contains blocking states in $\bm{x}_3$ and $\bm{x}_5$. Panel (b) shows the subpolicy $\pi_{5,3}(\bm{x})$, which is a base subpolicy that can be computed without recursion. Panel (c) shows the subpolicy $\pi_{4,2}(\bm{x})$, which contains a blocking state in $\bm{x}_4$ and uses a lowered version of the subpolicy $\pi_{5,4}(\bm{x})$ to set the feature value $\bm{x}_5$. Since $\pi_{4,2}(\bm{x}_4)-c(\bm{x}_5,\bm{x}_4)<0$, this value is set equal to $\pi_{4,2}(\bm{x}_4)$. Panel (d) shows the subpolicy $\pi_{2,1}(\bm{x})$, which contains a blocking state in $\bm{x}_3$ and uses a lowered version of the subpolicy $\pi_{5,3}(\bm{x})$ to set the feature values $\bm{x}_4$ and $\bm{x}_5$. Since in $\pi_{2,1}(\bm{x})$, the feature value $\bm{x}_5$ became negative and was set as blocking, the algorithm will perform a second round, starting from $\bm{x}_3$, which is the last blocking state before $d=4$.
  • Figure 3: Performance evaluation on synthetic data. Panels show the utility obtained by several decision policies against the number of feature values $m$. Here, note that the dynamic programming (DP) algorithm only works with outcome monotonic additive costs and thus only appears in Panel (a). In Panel (a), we set $\kappa=0.1$ and, in Panel (b), we set $\kappa = 0.75$.
  • Figure 4: Running time analysis on synthetic data with outcome monotonic and additive costs. Panel (a) shows the running time of the brute force search, the threshold policy baseline, our iterative algorithm and our dynamic programming algorithm. Panels (b) and (c) show the number of iterations and rounds required by the iterative and dynamic programming algorithms until termination, respectively, for different $\kappa$ values. In Panel (a), we set $\kappa=0.1$.
  • Figure 5: Transportation of mass induced by the policies found via the iterative algorithm in the credit dataset for several values of $\alpha$, which controls the difficulty of changing features. For each individual in the population, we record her outcome $P(y=1{\,|\,}\bm{x})$ before the best response (Initial $P(y=1{\,|\,}\bm{x})$) and after the best response (Final $P(y=1{\,|\,}\bm{x})$). In each panel, the color illustrates the percentage of individuals with the corresponding initial and final $P(y=1{\,|\,}\bm{x})$ values.
  • ...and 1 more figure
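
The cost used in the Figure 1 caption, $c(\bm{x}_i, \bm{x}_j) = \alpha[|x_{i0}-x_{j0}|+|x_{i1}-x_{j1}|]$, is a scaled Manhattan distance between two-dimensional feature values. A minimal sketch of how such a cost matrix could be built is given below; the function name, the 2x2 grid, and the value of `alpha` are hypothetical choices made for illustration only.

```python
import numpy as np

def manhattan_cost(features, alpha):
    """cost[i, j] = alpha * (|x_i0 - x_j0| + |x_i1 - x_j1|) for 2-D feature values."""
    diff = np.abs(features[:, None, :] - features[None, :, :])  # shape (m, m, 2)
    return alpha * diff.sum(axis=2)

# Hypothetical 2x2 grid of feature values (x_0, x_1).
grid = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
cost = manhattan_cost(grid, alpha=0.5)
```

A smaller `alpha` makes every move cheaper, which corresponds to the regime the caption describes, where the policy can concentrate positive decisions on a few feature values.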

Theorems & Definitions (6)

  • Theorem 1
  • Proposition 2
  • Theorem 3
  • Proposition 4
  • Proposition 5
  • Proposition 6