Attack-Resistant Uniform Fairness for Linear and Smooth Contextual Bandits

Qingwen Zhang, Wenjia Wang

TL;DR

The paper tackles item-level fairness in contextual bandits by introducing a strict $(1-\delta)$-uniform fairness constraint that must hold uniformly across contexts and rounds. It develops attack-aware algorithms for both linear and Hölder-smooth reward models that achieve near-minimax regret while preserving high-probability fairness, and proves a fundamental price of fairness via a fairness-induced confusion zone. The work further reveals a vulnerability: small corruption budgets can induce persistent unfairness, and it proposes corruption-adaptive exploration and error-compensated thresholding to yield robust, fair algorithms with minimax-optimal regret under $C$-budgeted attacks, including a multiplicative $C$–$T$ coupling in the nonparametric regime. Numerical experiments and a wine-brokerage case validate that the robust fair methods maintain fairness and efficiency under benign and adversarial conditions, highlighting practical applicability for sustaining equitable and efficient platforms. Overall, the study advances reliable, fair online decision-making in settings where feedback signals may be manipulated, providing both theoretical guarantees and empirical demonstrations of attack-resilience.

Abstract

Modern systems, such as digital platforms and service systems, increasingly rely on contextual bandits for online decision-making; however, their deployment can inadvertently create unfair exposure among arms, undermining long-term platform sustainability and supplier trust. This paper studies the contextual bandit problem under a uniform $(1-\delta)$-fairness constraint, and addresses its unique vulnerabilities to strategic manipulation. The fairness constraint ensures that preferential treatment is strictly justified by an arm's actual reward across all contexts and time horizons, using uniformity to prevent statistical loopholes. We develop novel algorithms that achieve (nearly) minimax-optimal regret for both linear and smooth reward functions, while maintaining strong $(1-\tilde{O}(1/T))$-fairness guarantees, and further characterize the theoretically inherent yet asymptotically marginal "price of fairness". However, we reveal that such merit-based fairness becomes uniquely susceptible to signal manipulation. We show that an adversary with a minimal $\tilde{O}(1)$ budget can not only degrade overall performance as in traditional attacks, but also selectively induce insidious fairness-specific failures while leaving conspicuous regret measures largely unaffected. To counter this, we design robust variants incorporating corruption-adaptive exploration and error-compensated thresholding. Our approach yields the first minimax-optimal regret bounds under $C$-budgeted attack while preserving $(1-\tilde{O}(1/T))$-fairness. Numerical experiments and a real-world case demonstrate that our algorithms sustain both fairness and efficiency.
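To make the merit-based notion in the abstract concrete, here is a minimal sketch of a single-round check. It assumes a common formalization in which an arm may receive a strictly higher selection probability than another arm only if its expected reward is strictly higher. The function name `is_merit_fair`, the tolerance `tol`, and the example numbers are hypothetical illustrations, not the paper's definition or code.

```python
from typing import Sequence


def is_merit_fair(
    probs: Sequence[float],
    rewards: Sequence[float],
    tol: float = 1e-12,
) -> bool:
    """Check one round of a selection rule against a merit-based fairness rule.

    Assumed rule (an illustration, not the paper's exact definition):
    arm i may be selected with strictly higher probability than arm j
    only if arm i's expected reward is strictly higher than arm j's.
    """
    if len(probs) != len(rewards):
        raise ValueError("probs and rewards must have the same length")
    k = len(probs)
    for i in range(k):
        for j in range(k):
            favored = probs[i] > probs[j] + tol       # arm i is preferred over arm j
            justified = rewards[i] > rewards[j]       # preference backed by higher reward
            if favored and not justified:
                return False
    return True


# Hypothetical two-arm examples.
fair_case = is_merit_fair([0.7, 0.3], [1.0, 0.5])    # better arm is favored
unfair_case = is_merit_fair([0.7, 0.3], [0.5, 1.0])  # worse arm is favored
```

In this sketch, equal selection probabilities never violate the rule, while favoring one of two equally rewarding arms does. A uniform guarantee of the kind described in the abstract would require such a check to pass simultaneously for all contexts and rounds with high probability.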

Paper Structure

This paper contains 81 sections, 36 theorems, 274 equations, 5 figures, 1 table, and 4 algorithms.

Key Result

Theorem 3.1

Suppose Assumptions assum_parabound-assum_excite hold and $T>2d+2\sqrt{2K}$. When $C_a>\frac{20K^2}{\widetilde{p}D_2}\vee \frac{8K^2}{\widetilde{p}^2}\vee \frac{640K}{h^2D_1}$ with $D_1=\frac{\lambda^{*2}\widetilde{p}^2}{32d^2r^4\sigma^2K^2}$, $D_2=\min \left(\frac{1}{2}, \frac{\lambda^{*}}{8 r^2}\right)$ ...

Figures (5)

  • Figure 1: Two-Arm Illustration: Flatness at Decision Boundary Adapts to $\beta$.
  • Figure 2: Performance comparison of linear and smooth contextual bandit algorithms in the stochastic setting. Left: Cumulative regret over time. Right: Cumulative unfair decisions over time.
  • Figure 3: Performance comparison of linear and smooth contextual bandit algorithms in the adversarial setting. Left: Cumulative regret over time. Right: Cumulative unfair decisions over time.
  • Figure 4: Performance comparison of linear and smooth contextual bandit algorithms on the wine brokerage platform. Left: Cumulative regret over time. Right: Cumulative unfair decisions over time. Note: A decision is counted as unfair when a candidate agent's observed profit is at least 0.01 lower than that of other agents; the 0.01 margin compensates for observation randomness. Lines show mean values from 10 independent runs, with shaded areas representing 95% confidence intervals.
  • Figure 5: Performance comparison of linear and smooth contextual bandit algorithms on the wine brokerage platform under attack. Left: Cumulative regret over time. Right: Cumulative unfair decisions over time. Note: A decision is counted as unfair when a candidate agent's observed profit is at least 0.01 lower than that of other agents; the 0.01 margin compensates for observation randomness. Lines show mean values from 10 independent runs, with shaded areas representing 95% confidence intervals.
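The "cumulative unfair decisions" metric in the right-hand panels can be sketched as follows. This is one hedged reading of the note in Figures 4 and 5: a round is counted as unfair when the chosen agent's observed profit is at least 0.01 below the best observed profit among the other agents. The function name `cumulative_unfair_decisions`, the input layout, and the sample data are hypothetical; the paper's exact counting rule may differ.

```python
from typing import List, Sequence


def cumulative_unfair_decisions(
    chosen: Sequence[int],
    observed_profits: Sequence[Sequence[float]],
    margin: float = 0.01,
) -> List[int]:
    """Return the running count of unfair decisions, one entry per round.

    Assumed rule (an interpretation of the figure note): round t is unfair
    if some other agent's observed profit exceeds the chosen agent's
    observed profit by at least `margin`.
    """
    if len(chosen) != len(observed_profits):
        raise ValueError("chosen and observed_profits must have the same length")
    running = 0
    counts: List[int] = []
    for arm, row in zip(chosen, observed_profits):
        others = [p for idx, p in enumerate(row) if idx != arm]
        if others and max(others) - row[arm] >= margin:
            running += 1
        counts.append(running)
    return counts


# Hypothetical three-round, two-agent example.
example_counts = cumulative_unfair_decisions(
    chosen=[0, 0, 0],
    observed_profits=[[1.00, 0.50], [0.50, 1.00], [1.00, 1.005]],
)
```

Only the second round in the example is counted: the chosen agent trails by 0.5 there, while the 0.005 gap in the third round falls inside the margin that absorbs observation noise.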

Theorems & Definitions (40)

  • Definition 1: $(1-\delta)$-fairness
  • Definition 2: $\epsilon$-chaining
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Remark 1
  • Proposition 3.4
  • Proposition 3.5: Easily Verifiable Sufficient Conditions
  • Proposition 3.6
  • Proposition 3.7
  • ...and 30 more