Regret Analysis of Repeated Delegated Choice

MohammadTaghi Hajiaghayi; Mohammad Mahdavi; Keivan Rezaei; Suho Shin

TL;DR

The paper introduces the repeated delegated choice problem, an online-learning extension of delegated decision-making in which a principal learns which eligible sets of solutions to announce to an agent holding exogenous solutions, so as to minimize regret against the optimal eligible set in hindsight. It develops a revelation-principle reduction to a single-proposal mechanism and derives sublinear regret bounds across several regimes: a deterministic setting with a myopic agent ($\mathrm{Reg}(T) = O(\min(K, \log\log T))$), stochastic valuations ($\mathrm{Reg}(T) = O(\sqrt{T \log T})$), and $\gamma$-discounted strategic agents with uniformly bounded or Lipschitz/dense utilities, with bounds such as $O(K T_\gamma \log(T_\gamma / y_{\min}))$, $O(T_\gamma \log(T_\gamma / y_{\min}) + \log T)$, and $O(T_\gamma \log(T_\gamma / \alpha) + \log(1/d) + dT)$. In the stochastic setting with strategic agents, the approach reduces to stochastic bandits with discretization, achieving $O(\sqrt{T \log T})$ regret plus $\gamma$-dependent terms. Together, these results delineate when a principal can efficiently learn to delegate under various agent behaviors and utility structures, with implications for online labor platforms and similar delegated-search contexts.
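The reduction to stochastic bandits can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the paper: eligible sets are a hypothetical threshold rule "principal utility $x \ge \tau$", thresholds are discretized into a uniform grid, the agent is myopic (the $\gamma$-discounted strategic behavior is omitted), and a round with no eligible solution gives the principal utility 0. Each grid point is then treated as an arm of a standard UCB1 bandit. The helper names `reward` and `ucb_delegation` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: (principal utility x, agent utility y), both in [0, 1].
Solution = Tuple[float, float]


def reward(solutions: Sequence[Solution], tau: float) -> float:
    # Myopic agent (assumption): proposes its favourite solution among those with x >= tau.
    # The principal earns that solution's x, or 0 if nothing is eligible (assumption).
    eligible = [s for s in solutions if s[0] >= tau]
    return max(eligible, key=lambda s: s[1])[0] if eligible else 0.0


def ucb_delegation(draw: Callable[[random.Random], List[Solution]],
                   T: int, num_arms: int, seed: int = 0) -> Tuple[List[int], float]:
    """UCB1 over the discretized thresholds tau_i = i / (num_arms - 1).

    Requires num_arms >= 2. `draw` samples the agent's solutions for one round
    from a distribution unknown to the principal.
    """
    rng = random.Random(seed)
    taus = [i / (num_arms - 1) for i in range(num_arms)]
    counts = [0] * num_arms   # how often each threshold (arm) was announced
    means = [0.0] * num_arms  # empirical mean principal utility per arm
    total = 0.0
    for t in range(1, T + 1):
        if t <= num_arms:
            arm = t - 1  # announce every threshold once to initialize
        else:
            # Optimistic index: empirical mean plus UCB1 exploration bonus.
            arm = max(range(num_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = reward(draw(rng), taus[arm])
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        total += r
    return counts, total
```

For example, `ucb_delegation(lambda rng: [(rng.random(), rng.random()) for _ in range(3)], T=2000, num_arms=11)` returns how often each threshold was announced and the principal's total utility. The grid size `num_arms` trades off discretization error against the cost of exploring more arms; the sketch does not tune it to match the paper's bounds.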

Abstract

We present a study of the repeated delegated choice problem, the first to consider an online learning variant of Kleinberg and Kleinberg (EC'18). In this model, a principal interacts repeatedly with an agent who possesses an exogenous set of solutions, in order to search for efficient ones. Each solution may yield different utilities for the principal and the agent, and the agent may selfishly propose a solution that maximizes its own utility. To mitigate this behavior, the principal announces an eligible set that screens out certain solutions. The principal, however, has no prior information about the distribution of solutions, so it dynamically announces various eligible sets to learn the distribution efficiently. The principal's objective is to minimize cumulative regret compared to the optimal eligible set in hindsight. We explore two dimensions of the problem setup: whether the agent behaves myopically or strategizes across rounds, and whether the solutions yield deterministic or stochastic utility. Our analysis characterizes regimes under which the principal can achieve sublinear regret, thereby shedding light on the rise and fall of repeated delegation across these regimes.
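To make the model concrete, the following minimal sketch computes the principal's cumulative regret against the best fixed eligible set in hindsight when the agent is myopic. It assumes, for illustration only, that eligible sets are thresholds on the principal's utility, that the agent proposes its favourite eligible solution, and that the principal receives 0 when nothing is eligible. The functions `myopic_proposal`, `principal_utility`, and `hindsight_regret` are hypothetical names, not the paper's.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: (principal utility x, agent utility y).
Solution = Tuple[float, float]


def myopic_proposal(solutions: Sequence[Solution], tau: float) -> Optional[Solution]:
    """Myopic agent: among solutions with x >= tau (a hypothetical
    threshold-shaped eligible set), propose the one with the highest y."""
    eligible = [s for s in solutions if s[0] >= tau]
    if not eligible:
        return None  # nothing eligible: no proposal
    return max(eligible, key=lambda s: s[1])


def principal_utility(solutions: Sequence[Solution], tau: float) -> float:
    # Assumption: the principal's outside option is worth 0.
    proposal = myopic_proposal(solutions, tau)
    return 0.0 if proposal is None else proposal[0]


def hindsight_regret(rounds: List[List[Solution]], announced: List[float],
                     candidates: Sequence[float]) -> float:
    """Cumulative regret of the announced thresholds against the best
    fixed threshold among `candidates`, evaluated in hindsight."""
    realized = sum(principal_utility(sols, tau) for sols, tau in zip(rounds, announced))
    best_fixed = max(sum(principal_utility(sols, c) for sols in rounds) for c in candidates)
    return best_fixed - realized
```

For instance, if every round offers the solutions `(0.75, 0.25)` and `(0.25, 0.75)`, announcing thresholds `[0.0, 0.5, 0.0]` earns 1.25 in total, while the fixed threshold 0.5 earns 2.25, so `hindsight_regret` with candidates `[0.0, 0.5, 1.0]` returns 1.0. Announcing no screening lets the selfish agent push the less useful solution, which is exactly the behavior the eligible set is meant to curb.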

Paper Structure (28 sections, 19 theorems, 28 equations, 1 table, 6 algorithms)

Introduction
Our contributions
Related Works
Delegation
Repeated delegation
Stackelberg games
Problem Setup
History, mechanism, and agent's policy
Mechanism description
Single-proposal mechanism
Approximately best response and Stackelberg regret
Deterministic Setting
Strategic Agent
Uniformly bounded agent utility
Lipschitz utility with dense solutions
...and 13 more sections

Key Result

Theorem 2.2

Given any mechanism $M$ and any agent's policy $P$, there exists a single-proposal mechanism $M'$ and corresponding deterministic agent's policy $P'$ such that $\Phi_{M,P} \le \Phi_{M',P'}$ and $\Psi_{M,P} \le \Psi_{M',P'}$.

Theorems & Definitions (42)

Definition 2.1: Single-proposal mechanism
Theorem 2.2
Definition 2.3: $\varepsilon$-best response
Definition 2.4: Stackelberg regret
Theorem 3.1
Lemma 3.2 (Haghtalab et al., 2022)
Theorem 3.3
Theorem 3.4
Theorem 3.5
Theorem 3.8
...and 32 more