A Tight Regret Analysis of Non-Parametric Repeated Contextual Brokerage

François Bachoc, Tommaso Cesari, Roberto Colomboni

TL;DR

This work advances online learning for contextual brokerage in a non-parametric setting by relaxing parametric assumptions and analyzing both full- and limited-feedback regimes. It introduces BiAve and ExBis to achieve tight regret bounds: $R_T = O(T^{d/(d+2)})$ in the full-feedback case and $R_T = O(T^{(d+2)/(d+4)})$ in the limited-feedback case, with matching lower bounds establishing optimality. A central result is a tight $1/2$-approximation to the first-best benchmark, showing that an oracle with knowledge of traders' distributions attains at least half of the omniscient benchmark's gain from trade, and this factor is unimprovable. The methodology hinges on adaptive dyadic partitioning of the context space, careful bias-variance control of empirical averages, and specialized exploration strategies to contend with noisy or incomplete feedback. Collectively, these results provide theoretical guarantees for non-parametric, context-aware brokerage in OTC-like markets and highlight avenues for extending the framework to broader market dynamics and feedback models.
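To make the two rates concrete, the following minimal sketch evaluates the regret exponents quoted above, $d/(d+2)$ for full feedback and $(d+2)/(d+4)$ for limited feedback, for a few values of $d$. It assumes $d \ge 1$ is an integer; the function name `regret_exponents` is hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def regret_exponents(d: int) -> tuple[Fraction, Fraction]:
    """Return the exponents (full-feedback, limited-feedback) of T in the
    regret rates T^(d/(d+2)) and T^((d+2)/(d+4)) quoted in the summary.

    Assumption for this sketch: d is a positive integer.
    """
    if d < 1:
        raise ValueError("this sketch assumes d >= 1")
    full = Fraction(d, d + 2)
    limited = Fraction(d + 2, d + 4)
    return full, limited


if __name__ == "__main__":
    for d in (1, 2, 4, 8):
        full, limited = regret_exponents(d)
        # The limited-feedback exponent is always the larger of the two,
        # since d * (d + 4) < (d + 2) ** 2.
        print(f"d={d}: full-feedback T^{full}, limited-feedback T^{limited}")
```

Both exponents are below 1 for every $d$ and approach 1 as $d$ grows, so the per-round regret still vanishes, but more slowly for larger $d$.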

Abstract

We study a contextual version of the repeated brokerage problem. In each interaction, two traders with private valuations for an item decide whether to buy or sell at a price proposed by the learner (a broker), which is informed by some contextual information. The broker's goal is to maximize the traders' net utility, also known as the gain from trade, by minimizing regret compared to an oracle with perfect knowledge of the traders' valuation distributions. We assume that traders' valuations are zero-mean perturbations of the item's unknown current market value, which can change arbitrarily from one interaction to the next, and that similar contexts correspond to similar market prices. We analyze two feedback settings: full feedback, where the traders' valuations are revealed to the broker after each interaction, and limited feedback, where only transaction attempts are revealed. For both feedback types, we propose algorithms achieving tight regret bounds. We further strengthen our performance guarantees with a tight 1/2-approximation result: the oracle that knows the traders' valuation distributions achieves at least 1/2 of the gain from trade of the omniscient oracle that knows the realized valuations in advance.
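A single interaction can be sketched as follows. This is only an illustration under assumptions the abstract does not spell out: a trade is taken to occur exactly when the posted price lies between the two valuations, the gain from trade is then the gap between the valuations, and limited feedback is modeled as two bits indicating whether each valuation is at least the price. All names (`Interaction`, `gain_from_trade`, `full_feedback`, `limited_feedback`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interaction:
    """One round: the broker's posted price and the two private valuations."""
    price: float
    v: float
    w: float


def gain_from_trade(r: Interaction) -> float:
    """Sum of the traders' net utilities (assumed convention).

    If the price lies between the valuations, the trader with the higher
    valuation buys from the other one and the total utility is hi - lo;
    otherwise no trade happens and the gain is zero.
    """
    lo, hi = min(r.v, r.w), max(r.v, r.w)
    return hi - lo if lo <= r.price <= hi else 0.0


def full_feedback(r: Interaction) -> tuple[float, float]:
    """Full feedback: both valuations are revealed after the round."""
    return (r.v, r.w)


def limited_feedback(r: Interaction) -> tuple[bool, bool]:
    """Limited feedback (assumed form): for each trader, only whether the
    valuation is at least the posted price, i.e. whether they would buy."""
    return (r.price <= r.v, r.price <= r.w)


if __name__ == "__main__":
    r = Interaction(price=0.5, v=0.25, w=0.75)
    print(gain_from_trade(r), full_feedback(r), limited_feedback(r))
```

Under this model, the limited-feedback broker never observes the size of the gain from trade, only the two buy/sell decisions, which is why it needs dedicated exploration.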

Paper Structure

This paper contains 20 sections, 7 theorems, 71 equations, 3 algorithms.

Key Result

Theorem 1

In the full-feedback setting, if we run the BiAve algorithm for $T$ time steps, its regret satisfies $R_T = O\big(T^{d/(d+2)}\big)$.

Theorems & Definitions (14)

  • Theorem 1
  • proof : Proof sketch
  • Theorem 2
  • proof : Proof sketch
  • Theorem 3
  • proof : Proof sketch
  • Theorem 4
  • proof : Proof sketch
  • Theorem 5
  • proof
  • ...and 4 more