A Stronger Benchmark for Online Bilateral Trade: From Fixed Prices to Distributions

Anna Lunghi, Mattia Piccinato, Matteo Castiglioni, Alberto Marchesi

TL;DR

This work studies online bilateral trade under a Global Budget Balance (GBB) constraint with one-bit feedback in a stochastic environment. It proves that, when the joint distribution of valuations has a bounded density, one can achieve sublinear regret $\widetilde{\mathcal{O}}(T^{3/4})$ against the best fixed GBB distribution over price pairs, matching the known lower bound and closing the gap with the weaker fixed-price benchmark. The authors introduce a three-phase algorithm: profit collection, a two-dimensional grid-based pure exploration that reuses samples efficiently, and an optimistic constrained-bandit optimization of the Gain From Trade (GFT). A key technical contribution is showing that bounded density enables grid discretization without prohibitive loss, and that exploration over $K^2$ price pairs can be accomplished with only $2K$ coupled estimations. This advances the understanding of learnability under global budget constraints and shows that there is no separation between learning a one-dimensional WBB price and learning the two-dimensional GBB distribution, with potential implications for mechanism design in repeated trade settings.

Abstract

We study online bilateral trade, where a learner facilitates repeated exchanges between a buyer and a seller to maximize the Gain From Trade (GFT), i.e., the social welfare. In doing so, the learner must guarantee not to subsidize the market. This constraint is usually imposed per round through Weak Budget Balance (WBB). Despite that, Bernasconi et al. [2024] show that a Global Budget Balance (GBB) constraint on the profit -- enforced over the entire time horizon -- can improve the GFT by a multiplicative factor of two. While this might appear to be a marginal relaxation, this implies that all existing WBB-focused algorithms suffer linear regret when measured against the GBB optimum. In this work, we provide the first algorithm to achieve sublinear regret against the GBB benchmark in stochastic environments under one-bit feedback. In particular, we show that when the joint distribution of valuations has a bounded density, our algorithm achieves $\widetilde{\mathcal{O}}(T^{3/4})$ regret. Our result shows that there is no separation between the one-dimensional problem of learning the optimal WBB price and the two-dimensional problem of learning the optimal GBB distribution over pairs of prices.
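
To make the setting concrete, the following is a minimal sketch of a single trade round and of playing a fixed distribution over price pairs. It is not the paper's algorithm. It assumes the usual two-price model: the seller with valuation `s` is offered `p`, the buyer with valuation `b` is charged `q`, and the trade happens only if `s <= p` and `q <= b`. The function names (`trade_round`, `simulate`, `uniform_valuations`) and the uniform valuation distribution are hypothetical choices made for illustration only.

```python
import random


def trade_round(s, b, p, q):
    """Play one round with seller price p and buyer price q.

    Returns (feedback, gft, profit). Under one-bit feedback the learner
    would observe only `feedback`; gft and profit are shown for clarity.
    """
    traded = (s <= p) and (q <= b)
    gft = (b - s) if traded else 0.0      # gain from trade (social welfare)
    profit = (q - p) if traded else 0.0   # learner's revenue; negative = subsidy
    return int(traded), gft, profit


def uniform_valuations(rng):
    """Assumed example distribution: independent uniform valuations on [0, 1]^2."""
    return rng.random(), rng.random()


def simulate(price_dist, T, sample_valuations=uniform_valuations, seed=0):
    """Play a fixed distribution over (p, q) pairs for T rounds.

    price_dist is a list of ((p, q), weight) entries.
    Returns cumulative (gft, profit).
    """
    rng = random.Random(seed)
    pairs, weights = zip(*price_dist)
    total_gft = 0.0
    total_profit = 0.0
    for _ in range(T):
        s, b = sample_valuations(rng)
        p, q = rng.choices(pairs, weights=weights)[0]
        _, gft, profit = trade_round(s, b, p, q)
        total_gft += gft
        total_profit += profit
    return total_gft, total_profit
```

In this sketch, WBB corresponds to requiring `p <= q` in every round, so `profit` is never negative. GBB only requires the cumulative profit returned by `simulate` to be non-negative over the whole horizon, which lets a distribution mix profitable pairs with subsidizing ones.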

Paper Structure (27 sections, 17 theorems, 85 equations, 1 figure, 3 algorithms)

Key Result

Theorem 3.1

For any $T\in \mathbb{N}$ and any learning algorithm, there is an instance such that $\mathbb{E}[R_T] \ge \Omega(T).$

Figures (1)

  • Figure 1: Top: Support of the probability distribution $\mathcal{D}_\epsilon$. Bottom: Expected profit (left) and GFT (right) under $\mathcal{D}_\epsilon$.

Theorems & Definitions (27)

  • Theorem 3.1
  • Theorem 4.1
  • Lemma 5.0
  • Lemma 5.0
  • proof : Proof Sketch
  • Lemma 7.0
  • proof : Proof Sketch
  • Lemma 7.1
  • Lemma 8.0
  • Lemma 8.0
  • ...and 17 more