Feature-Based Online Bilateral Trade
Solenne Gaucher, Martino Bernasconi, Matteo Castiglioni, Andrea Celli, Vianney Perchet
TL;DR
This work introduces the feature-based online bilateral trade model, in which a learner posts prices to a seller and a buyer based on the features of the item for sale. It provides a multi-regime analysis. In the noiseless setting with two-bit feedback and strong budget balance, it achieves $O(\log T)$ regret. In the noisy two-bit setting, it attains $\tilde O(T^{3/4})$ regret via Explore-Or-Commit and $\tilde O(T^{2/3})$ with Scouting Bandits and information pooling. Under one-bit feedback and global budget balance, a reduction from the two-bit, strong budget balance setting preserves sublinear regret. The results reveal a fundamental trade-off between the richness of the feedback and the strictness of the budget constraints, and they establish near-optimal rates against the best context-dependent prices in this online pricing problem. The methodology combines ellipsoidal uncertainty sets for deterministic valuations, decompositions of the expected gain from trade (EGFT), and scalable learning subroutines that handle partial feedback. The work thus advances the understanding of contextual pricing in online bilateral trade with limited feedback and budget balance constraints, with implications for practical market design under privacy and information constraints.
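The ellipsoidal-uncertainty idea for deterministic valuations can be illustrated in a toy one-dimensional case, where an ellipsoid reduces to an interval and a central cut reduces to bisection. This is not the paper's algorithm. It is a sketch that assumes a scalar context $x_t > 0$, valuations $s_t = \theta_s x_t$ and $b_t = \theta_b x_t$ with $\theta_s, \theta_b \in [0,1]$, and a hypothetical function `bisection_pricing`. It only shows how a single price posted to both parties (strong budget balance), combined with two-bit feedback, cuts the uncertainty set of each unknown parameter. It explores only and makes no attempt to control regret.

```python
def bisection_pricing(theta_s, theta_b, contexts, tol=1e-6):
    """Shrink 1-D uncertainty intervals for theta_s and theta_b with two-bit feedback.

    Hypothetical toy model (an assumption, not the paper's setting in full):
    seller value s = theta_s * x, buyer value b = theta_b * x, scalar context x > 0,
    parameters in [0, 1]. One price p is posted to both parties (strong budget balance).
    """
    lo_s, hi_s = 0.0, 1.0  # interval known to contain theta_s
    lo_b, hi_b = 0.0, 1.0  # interval known to contain theta_b
    for t, x in enumerate(contexts):
        # Alternate central cuts: even rounds target the seller, odd rounds the buyer.
        if t % 2 == 0 and hi_s - lo_s > tol:
            z = (lo_s + hi_s) / 2
        elif hi_b - lo_b > tol:
            z = (lo_b + hi_b) / 2
        else:
            z = (lo_s + hi_s) / 2
        p = z * x  # the same price is posted to seller and buyer
        seller_accepts = theta_s * x <= p  # first feedback bit
        buyer_accepts = p <= theta_b * x   # second feedback bit
        # Since x > 0, each bit is a cut at z = p / x on the matching interval.
        if seller_accepts:
            hi_s = min(hi_s, z)
        else:
            lo_s = max(lo_s, z)
        if buyer_accepts:
            lo_b = max(lo_b, z)
        else:
            hi_b = min(hi_b, z)
    return (lo_s, hi_s), (lo_b, hi_b)
```

Each round halves at least one interval, so about $2\log_2(1/\text{tol})$ rounds suffice to pin down both parameters in this noiseless toy case. Note that a single posted price can cut both intervals at once, which is what makes two-bit feedback informative under strong budget balance.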
Abstract
Bilateral trade models the problem of facilitating trades between a seller and a buyer who hold private valuations for the item being sold. In the online version of the problem, the learner faces a new seller and buyer at each time step, and has to post a price for each of the two parties without any knowledge of their valuations. We consider a scenario where, at each time step, before posting prices, the learner observes a context vector containing information about the features of the item for sale. The valuations of both the seller and the buyer follow an unknown linear function of the context. In this setting, the learner can leverage previous transactions to estimate the private valuations. We characterize the regret regimes of different settings, taking as a benchmark the best context-dependent prices in hindsight. First, in the setting in which the learner has two-bit feedback and strong budget balance constraints, we propose an algorithm with $O(\log T)$ regret. Then, we study the same setup with noisy valuations, providing a tight $\widetilde O(T^{\frac23})$ regret upper bound. Finally, we show that loosening the budget balance constraints allows the learner to operate under more restrictive feedback. Specifically, we show how to address the one-bit, global budget balance setting through a reduction from the two-bit, strong budget balance setup. This establishes a fundamental trade-off between the quality of the feedback and the strictness of the budget constraints.
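The interaction protocol above can be sketched as a small simulation. This is a minimal sketch assuming noiseless linear valuations $s_t = \langle x_t, \theta_s\rangle$ and $b_t = \langle x_t, \theta_b\rangle$; the function names `simulate_round` and `regret`, and the convention that a trade happens when $s_t \le p_t$ and $q_t \le b_t$ (seller price $p_t$, buyer price $q_t$), are assumptions for illustration. Two-bit feedback reveals both acceptance decisions, while one-bit feedback reveals only whether the trade occurred. In this noiseless model the price $\langle x_t, (\theta_s+\theta_b)/2\rangle$ is itself linear in the context and collects $\max(b_t - s_t, 0)$ every round, which no pair of prices can exceed, so the sketch measures regret against that quantity.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def simulate_round(x, theta_s, theta_b, p, q):
    """One round of (hypothetical) noiseless feature-based bilateral trade.

    p is the price offered to the seller, q the price charged to the buyer.
    Strong budget balance corresponds to p == q.
    Returns (gain_from_trade, two_bit_feedback, one_bit_feedback).
    """
    s = dot(x, theta_s)  # seller valuation, linear in the context
    b = dot(x, theta_b)  # buyer valuation, linear in the context
    seller_bit = s <= p  # seller agrees to sell at p
    buyer_bit = q <= b   # buyer agrees to buy at q
    trade = seller_bit and buyer_bit
    gft = (b - s) if trade else 0.0
    return gft, (seller_bit, buyer_bit), trade


def regret(contexts, theta_s, theta_b, prices):
    """Cumulative gap to max(b - s, 0), the best achievable gain each round."""
    total = 0.0
    for x, (p, q) in zip(contexts, prices):
        gft, _, _ = simulate_round(x, theta_s, theta_b, p, q)
        best = max(dot(x, theta_b) - dot(x, theta_s), 0.0)
        total += best - gft
    return total
```

For example, with $\theta_s = (0.2, 0.1)$, $\theta_b = (0.5, 0.3)$ and contexts $(1,0)$, $(0,1)$, $(1,1)$, posting the fixed price $0.6$ to both parties misses the first two trades and incurs regret of about $0.5$, while the midpoint prices incur none.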
