Regret Minimization for Piecewise Linear Rewards: Contracts, Auctions, and Beyond

Francesco Bacchiocchi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti

TL;DR

This work introduces Bandit with Monotone Jumps (BwMJ), a unified online-learning framework for unknown, stochastic piecewise linear rewards arising in microeconomic settings. It develops the RJI-OS algorithm, an epoch-based method that first identifies large jumps and then shrinks the action space while retaining near-optimal actions. The algorithm achieves a regret of $\widetilde{O}(\sqrt{nT})$, which is tight when the number of pieces $n$ is small relative to $T$ (i.e., $n \le T^{1/3}$). The approach resolves open questions in learning linear contracts for hidden-action principal-agent problems and in dynamic pricing with finitely many valuations by delivering instance-independent regret bounds. The results connect to and extend prior work in online contract design, dynamic pricing, and continuum-armed bandits, providing both worst-case and instance-dependent guarantees that improve upon existing bounds in relevant microeconomic models. The practical impact lies in enabling efficient, low-regret learning for revenue-maximizing contracts and pricing under uncertainty, with theoretical guarantees that scale with the problem’s structural complexity rather than jump magnitudes.

Abstract

Most microeconomic models of interest involve optimizing a piecewise linear function. These include contract design in hidden-action principal-agent problems, selling an item in posted-price auctions, and bidding in first-price auctions. When the relevant model parameters are unknown and determined by some (unknown) probability distributions, the problem becomes learning how to optimize an unknown and stochastic piecewise linear reward function. Such a problem is usually framed within an online learning framework, where the decision-maker (learner) seeks to minimize the regret of not knowing an optimal decision in hindsight. This paper introduces a general online learning framework that offers a unified approach to tackle regret minimization for piecewise linear rewards, under a suitable monotonicity assumption commonly satisfied by microeconomic models. We design a learning algorithm that attains a regret of $\widetilde{O}(\sqrt{nT})$, where $n$ is the number of "pieces" of the reward function and $T$ is the number of rounds. This result is tight when $n$ is small relative to $T$, specifically when $n \leq T^{1/3}$. Our algorithm solves two open problems in the literature on learning in microeconomic settings. First, it shows that the $\widetilde{O}(T^{2/3})$ regret bound obtained by Zhu et al. [Zhu+23] for learning optimal linear contracts in hidden-action principal-agent problems is not tight when the number of agent's actions is small relative to $T$. Second, our algorithm demonstrates that, in the problem of learning to set prices in posted-price auctions, it is possible to attain suitable (and desirable) instance-independent regret bounds, addressing an open problem posed by Cesa-Bianchi et al. [CBCP19].

Paper Structure

This paper contains 29 sections, 23 theorems, 42 equations, 5 figures, 1 table, 5 algorithms.

Key Result

Theorem 1

There exists an algorithm that attains $\widetilde{O} ( \sqrt{n T} )$ regret in any BwMJ instance, where $n$ is the number of action intervals of the instance and $T$ is the number of rounds of the learning interaction.
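
The threshold $n \le T^{1/3}$ quoted in the abstract can be recovered by comparing the rate of Theorem 1 with the $\widetilde{O}(T^{2/3})$ rate attributed to Zhu et al. [Zhu+23]. The following is only a back-of-the-envelope comparison that ignores constants and logarithmic factors; it is not taken from the paper's proofs:

$$
\sqrt{nT} \le T^{2/3} \iff nT \le T^{4/3} \iff n \le T^{1/3}.
$$

As an illustrative calculation with assumed values, $T = 10^6$ rounds gives the threshold $n \le 100$; at $n = 4$ the two rates are $\sqrt{4 \cdot 10^6} = 2000$ and $(10^6)^{2/3} = 10^4$, respectively.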

Figures (5)

  • Figure 1: Example of BwMJ instance. (Left) The expected values $\mu_{h(\alpha)}$ as a function of $\alpha$, where $h(\alpha)$ is defined as the index $i$ such that $\alpha \in \mathcal{A}_i$. (Center) The linear function $\ell(\alpha) \coloneqq 1-\alpha$. (Right) The learner's expected reward $u(\alpha) \coloneqq \ell(\alpha) \mu_{h(\alpha)}$, which is a piecewise linear function over the action space $[0,1]$.
  • Figure 2: (Left) Example of principal's expected utility by using linear contracts in a principal-agent problem. (Right) Example of seller's expected utility in a posted-price auction.
  • Figure 3: Example of tree of recursive calls generated by the execution of the Find-Jumps algorithm.
  • Figure 4: Principal's expected utility in the two instances used to prove the lower bound.
  • Figure 5: Regret upper and lower bounds as functions of the parameter $n \le T^{1/3}$. For ease of presentation, logarithmic factors are omitted from the regret bounds of the different algorithms.
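
The reward structure described in the Figure 1 caption, $u(\alpha) = \ell(\alpha)\,\mu_{h(\alpha)}$ with $\ell(\alpha) = 1 - \alpha$, can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the breakpoints, the values $\mu_i$, the choice of left-closed intervals, the nondecreasing ordering of the $\mu_i$, and the helper `best_on_grid` are all assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical instance with n = 3 action intervals on [0, 1].
# Assumed intervals: A_1 = [0, 0.25), A_2 = [0.25, 0.5), A_3 = [0.5, 1].
BREAKPOINTS = [0.25, 0.5]   # assumed interval boundaries
MU = [0.2, 0.6, 0.8]        # assumed expected values mu_i (nondecreasing)


def h(alpha: float) -> int:
    """Return the 0-based index i of the interval A_i that contains alpha."""
    return bisect_right(BREAKPOINTS, alpha)


def ell(alpha: float) -> float:
    """Linear factor from the Figure 1 caption: l(alpha) = 1 - alpha."""
    return 1.0 - alpha


def u(alpha: float) -> float:
    """Expected reward u(alpha) = l(alpha) * mu_{h(alpha)}; linear on each interval."""
    return ell(alpha) * MU[h(alpha)]


def best_on_grid(k: int) -> float:
    """Brute-force maximizer of u over the grid {0, 1/k, ..., 1} (illustration only)."""
    grid = [j / k for j in range(k + 1)]
    return max(grid, key=u)


if __name__ == "__main__":
    for a in (0.0, 0.25, 0.5):
        print(f"u({a}) = {u(a):.3f}")
    print("best grid action:", best_on_grid(100))
```

Because $\ell$ is decreasing and $\mu_{h(\alpha)}$ is constant within an interval, the reward in this sketch decreases on each piece and jumps only at the breakpoints, so the maximizer sits at the left endpoint of one of the intervals. A learner in the BwMJ setting does not know the breakpoints or the $\mu_i$; the brute-force search above assumes full knowledge and only serves to show the shape of the objective.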

Theorems & Definitions (35)

  • Theorem
  • Definition 4.1: Clean event
  • Lemma 4.1
  • Lemma 4.1
  • Lemma 4.1
  • Lemma 4.1
  • Lemma 4.1
  • Lemma 4.1
  • Theorem 4.2
  • Corollary 4.3
  • ...and 25 more