Regret Minimization for Piecewise Linear Rewards: Contracts, Auctions, and Beyond

Francesco Bacchiocchi; Matteo Castiglioni; Alberto Marchesi; Nicola Gatti

TL;DR

This work introduces Bandit with Monotone Jumps (BwMJ), a unified online-learning framework for unknown, stochastic piecewise linear rewards arising in microeconomic settings. It develops the RJI-OS algorithm, an epoch-based method that first identifies large jumps and then shrinks the action space while retaining near-optimal actions. The algorithm achieves a regret of $\widetilde{O}(\sqrt{nT})$, which is tight when the number of pieces $n$ is small relative to $T$ (i.e., $n \le T^{1/3}$). The approach resolves open questions in learning linear contracts for hidden-action principal-agent problems and in dynamic pricing with finitely many valuations by delivering instance-independent regret bounds. The results connect to and extend prior work in online contract design, dynamic pricing, and continuum-armed bandits, providing both worst-case and instance-dependent guarantees that improve upon existing bounds in relevant microeconomic models. In practice, this enables efficient, low-regret learning of revenue-maximizing contracts and prices under uncertainty, with guarantees that scale with the problem’s structural complexity rather than with jump magnitudes.
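To make the phrase "piecewise linear rewards with jumps" concrete, the following minimal sketch computes the expected revenue of a posted price when the buyer's valuation takes finitely many values. It is an illustration only, not the RJI-OS algorithm: the valuations, probabilities, and function names are hypothetical, and the distribution is assumed known here, whereas the learner in the paper's setting does not know it.

```python
from typing import Sequence

# Hypothetical instance: three possible valuations and their probabilities.
VALUATIONS = [0.2, 0.5, 0.9]
PROBS = [0.3, 0.4, 0.3]


def expected_revenue(price: float,
                     valuations: Sequence[float],
                     probs: Sequence[float]) -> float:
    """Expected revenue of posting `price` when the buyer buys iff valuation >= price."""
    sale_prob = sum(q for v, q in zip(valuations, probs) if v >= price)
    return price * sale_prob


def best_price(valuations: Sequence[float], probs: Sequence[float]) -> float:
    """Revenue-maximizing price for a known distribution.

    Between two consecutive valuations the sale probability is constant, so
    revenue is linear and increasing in the price; the maximum is therefore
    attained at one of the valuations.
    """
    return max(valuations, key=lambda p: expected_revenue(p, valuations, probs))


if __name__ == "__main__":
    for p in (0.2, 0.35, 0.5, 0.500001, 0.9):
        print(f"price={p:.6f} revenue={expected_revenue(p, VALUATIONS, PROBS):.4f}")
    print("best price:", best_price(VALUATIONS, PROBS))
```

In this toy instance the revenue curve has one linear piece per valuation and drops abruptly just above each valuation, which is the kind of structure an online learner would have to locate from noisy feedback.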

Abstract

Most microeconomic models of interest involve optimizing a piecewise linear function. These include contract design in hidden-action principal-agent problems, selling an item in posted-price auctions, and bidding in first-price auctions. When the relevant model parameters are unknown and determined by some (unknown) probability distributions, the problem becomes learning how to optimize an unknown and stochastic piecewise linear reward function. Such a problem is usually framed within an online learning framework, where the decision-maker (learner) seeks to minimize the regret of not knowing an optimal decision in hindsight. This paper introduces a general online learning framework that offers a unified approach to tackle regret minimization for piecewise linear rewards, under a suitable monotonicity assumption commonly satisfied by microeconomic models. We design a learning algorithm that attains a regret of $\widetilde{O}(\sqrt{nT})$, where $n$ is the number of "pieces" of the reward function and $T$ is the number of rounds. This result is tight when $n$ is small relative to $T$, specifically when $n \leq T^{1/3}$. Our algorithm solves two open problems in the literature on learning in microeconomic settings. First, it shows that the $\widetilde{O}(T^{2/3})$ regret bound obtained by Zhu et al. [Zhu+23] for learning optimal linear contracts in hidden-action principal-agent problems is not tight when the number of the agent's actions is small relative to $T$. Second, our algorithm demonstrates that, in the problem of learning to set prices in posted-price auctions, it is possible to attain suitable (and desirable) instance-independent regret bounds, addressing an open problem posed by Cesa-Bianchi et al. [CBCP19].
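As a reading aid, the threshold $n \leq T^{1/3}$ is also exactly the point at which the $\sqrt{nT}$ rate stops improving on the $T^{2/3}$ rate. The short calculation below ignores logarithmic factors and is plain arithmetic on the two rates, not the paper's lower-bound argument; the numerical values of $T$ and $n$ are chosen only for illustration.

$$
\sqrt{nT} \le T^{2/3} \iff nT \le T^{4/3} \iff n \le T^{1/3}.
$$

For instance, with $T = 10^6$ rounds the threshold is $T^{1/3} = 100$ and $T^{2/3} = 10^4$, whereas $n = 10$ pieces gives $\sqrt{nT} = \sqrt{10^7} \approx 3162$.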
