Regret Analysis of Repeated Delegated Choice
MohammadTaghi Hajiaghayi, Mohammad Mahdavi, Keivan Rezaei, Suho Shin
TL;DR
The paper introduces a repeated delegated choice problem as an online-learning extension of delegated decision-making, where a principal learns which eligible solution sets to announce to an agent with exogenous solutions in order to minimize regret against the hindsight-optimal set. It develops a revelation-principle reduction to a single-proposal mechanism and derives sublinear regret bounds across regimes: deterministic myopic ($Reg(T)=O(\min(K,\log\log T))$), stochastic valuations ($Reg(T)=O(\sqrt{T\log T})$), and $\gamma$-discounted strategic-agent scenarios with uniformly bounded or Lipschitz/dense utilities, yielding bounds such as $O(KT_\gamma\log(T_\gamma/y_{\min}))$, $O(T_\gamma\log(T_\gamma/y_{\min})+\log T)$, and $O(T_\gamma\log(T_\gamma/\alpha)+\log(1/d)+dT)$. In the stochastic setting with strategic agents, the approach reduces to stochastic bandits with discretization, achieving $O(\sqrt{T\log T})$ regret plus $\gamma$-related terms. Together, these results delineate when a principal can efficiently learn to delegate under various agent behaviors and utility structures, with implications for online labor platforms and similar delegated-search contexts.
Abstract
We study a repeated delegated choice problem, the first online-learning variant of the model of Kleinberg and Kleinberg (EC'18). In this model, a principal interacts repeatedly with an agent who possesses an exogenous set of solutions, with the goal of finding efficient ones. Each solution can yield a different utility for the principal and for the agent, and the agent may selfishly propose the solution that maximizes its own utility. To mitigate this behavior, the principal announces an eligible set, which screens out some of the solutions. The principal, however, has no prior information about the distribution of solutions, so it dynamically announces different eligible sets to learn the distribution efficiently. The principal's objective is to minimize cumulative regret relative to the optimal eligible set in hindsight. We explore two dimensions of the problem setup: whether the agent behaves myopically or strategizes across rounds, and whether the solutions yield deterministic or stochastic utility. Our analysis characterizes regimes in which the principal can achieve sublinear regret, thereby shedding light on the rise and fall of the repeated delegation procedure in various regimes.
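To make the interaction concrete, the following is a minimal sketch of the deterministic, myopic case. It is not the paper's algorithm. It rests on several illustrative assumptions: the eligible set is a threshold on the principal's utility, the principal receives zero utility when no solution is eligible, and the hindsight benchmark is the best such threshold. The names `agent_best_response`, `principal_utility`, and `cumulative_regret` are hypothetical.

```python
from typing import List, Optional, Tuple

# A solution is a pair (principal_utility, agent_utility).
Solution = Tuple[float, float]


def agent_best_response(solutions: List[Solution], threshold: float) -> Optional[Solution]:
    """Myopic agent: among eligible solutions, propose the one it likes best.

    Assumption: a solution is eligible when the principal's utility
    is at least `threshold`.
    """
    eligible = [s for s in solutions if s[0] >= threshold]
    if not eligible:
        return None  # nothing to propose
    return max(eligible, key=lambda s: s[1])


def principal_utility(solutions: List[Solution], threshold: float) -> float:
    """Principal's utility in one round (assumed 0 if nothing is proposed)."""
    chosen = agent_best_response(solutions, threshold)
    return 0.0 if chosen is None else chosen[0]


def cumulative_regret(solutions: List[Solution], thresholds: List[float]) -> float:
    """Regret of a sequence of announced thresholds against the best
    fixed threshold in hindsight (assumed benchmark)."""
    # It suffices to try each solution's principal utility as a threshold.
    best = max(principal_utility(solutions, s[0]) for s in solutions)
    return sum(best - principal_utility(solutions, t) for t in thresholds)


# Toy instance: the agent prefers the solution the principal likes least.
solutions = [(0.2, 0.9), (0.5, 0.6), (0.8, 0.1)]
regret = cumulative_regret(solutions, thresholds=[0.0, 0.5, 0.8])
```

In this toy instance, announcing no restriction lets the agent pick the solution worth only 0.2 to the principal, while raising the threshold forces progressively better proposals. An announcement that is too strict would leave no eligible solution, which is the tension a learning principal has to manage.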
