A Stronger Benchmark for Online Bilateral Trade: From Fixed Prices to Distributions
Anna Lunghi, Mattia Piccinato, Matteo Castiglioni, Alberto Marchesi
TL;DR
This work studies online bilateral trade under a Global Budget Balance (GBB) constraint with one-bit feedback in a stochastic environment. It proves that, when the joint distribution of valuations has a bounded density, a learner can achieve $\tilde{O}(T^{3/4})$ regret against the best fixed distribution over price pairs that satisfies GBB, matching the known lower bound and closing the gap with the weaker fixed-price benchmark. The authors introduce a three-phase algorithm: (1) profit collection, (2) pure exploration over a two-dimensional price grid that reuses samples efficiently, and (3) optimistic constrained-bandit optimization of the Gain From Trade (GFT). A key technical contribution is showing that bounded density allows grid discretization without prohibitive loss, and that exploring $K^2$ price pairs requires only $2K$ effectively coupled estimations. The result advances the understanding of learnability under global budget constraints and shows that there is no separation between learning a one-dimensional Weak Budget Balance (WBB) price and learning a two-dimensional GBB distribution, with potential implications for mechanism design in repeated trade settings.
Abstract
We study online bilateral trade, where a learner facilitates repeated exchanges between a buyer and a seller to maximize the Gain From Trade (GFT), i.e., the social welfare. In doing so, the learner must guarantee not to subsidize the market. This constraint is usually imposed per round through Weak Budget Balance (WBB). However, Bernasconi et al. [2024] show that a Global Budget Balance (GBB) constraint on the profit, enforced over the entire time horizon, can improve the GFT by a multiplicative factor of two. Although this might appear to be a marginal relaxation, it implies that all existing WBB-focused algorithms suffer linear regret when measured against the GBB optimum. In this work, we provide the first algorithm to achieve sublinear regret against the GBB benchmark in stochastic environments under one-bit feedback. In particular, we show that when the joint distribution of valuations has a bounded density, our algorithm achieves $\widetilde{\mathcal{O}}(T^{3/4})$ regret. Our result shows that there is no separation between the one-dimensional problem of learning the optimal WBB price and the two-dimensional problem of learning the optimal GBB distribution over pairs of prices.
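
To make the quantities in the abstract concrete, the following is a minimal Python sketch of a single trade round and of the two budget constraints. It is an illustration under stated assumptions, not the paper's algorithm: the notation (seller valuation `s`, buyer valuation `b`, price `p` offered to the seller, price `q` charged to the buyer), the helper names `trade_round` and `simulate`, and the independent uniform valuations are all chosen here for illustration. WBB requires `p <= q` in every round, whereas GBB only requires the cumulative profit over the horizon to be non-negative, so a distribution over price pairs may occasionally post `p > q`.

```python
import random

def trade_round(s, b, p, q):
    """One round. s, b: seller/buyer valuations; p: price paid to the seller;
    q: price charged to the buyer (notation assumed for this sketch)."""
    traded = (s <= p) and (q <= b)        # the one bit of feedback the learner observes
    gft = (b - s) if traded else 0.0      # gain from trade (social welfare)
    profit = (q - p) if traded else 0.0   # learner's profit; negative means a subsidy
    return traded, gft, profit

def simulate(price_sampler, T, seed=0):
    """Run T rounds with i.i.d. uniform valuations (an assumption of this sketch).
    price_sampler(rng) returns the price pair (p, q) to post in a round."""
    rng = random.Random(seed)
    total_gft = total_profit = 0.0
    for _ in range(T):
        s, b = rng.random(), rng.random()
        p, q = price_sampler(rng)
        _, gft, profit = trade_round(s, b, p, q)
        total_gft += gft
        total_profit += profit
    return total_gft, total_profit

# WBB fixed price: p == q every round, so the profit is exactly zero.
def fixed_price(rng):
    return (0.5, 0.5)

# A distribution over price pairs: mostly a profitable pair (p < q),
# sometimes a subsidizing pair (p > q) that violates WBB in that round.
def mixed_pairs(rng):
    return (0.2, 0.6) if rng.random() < 0.8 else (0.7, 0.5)

if __name__ == "__main__":
    for name, sampler in [("fixed price", fixed_price), ("mixed pairs", mixed_pairs)]:
        gft, profit = simulate(sampler, T=100_000)
        print(f"{name}: total GFT = {gft:.1f}, total profit = {profit:.1f}")
```

The mixed strategy above is hand-picked only to show a budget that balances over the horizon rather than per round; it is not claimed to be optimal or to outperform the fixed price.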
