Trading Volume Maximization with Online Learning

Tommaso Cesari, Roberto Colomboni

TL;DR

This work reframes brokerage in OTC markets as an online-learning problem in which the broker maximizes trading volume rather than gain from trade. Building on the Median Lemma, it develops two algorithms, Empirical Median (FEM) for full feedback and Median Binary Search (MBS) for 2-bit feedback, that achieve near-optimal regret under regularity assumptions: $O(\ln T)$ in the full-feedback case with a continuous cdf, and $O(\ln(MT)\ln T)$ in the 2-bit case with an $M$-Lipschitz cdf. The authors provide lower bounds showing that the first rate is optimal up to constant factors and the second is near-optimal. They also show that dropping continuity degrades the full-feedback rate to $\Theta(\sqrt{T})$, while dropping the Lipschitz assumption makes learning impossible under 2-bit feedback. The results clarify how regularity shapes learning speed in two-sided online markets and open directions for non-stationary or contextual extensions with potential market-impact implications.

Abstract

We explore brokerage between traders in an online learning framework. At any round $t$, two traders meet to exchange an asset, provided the exchange is mutually beneficial. The broker proposes a trading price, and each trader tries to sell their asset or buy the asset from the other party, depending on whether the price is higher or lower than their private valuations. A trade happens if one trader is willing to sell and the other is willing to buy at the proposed price. Previous work provided guidance to a broker aiming at enhancing traders' total earnings by maximizing the gain from trade, defined as the sum of the traders' net utilities after each interaction. In contrast, we investigate how the broker should behave to maximize the trading volume, i.e., the total number of trades. We model the traders' valuations as an i.i.d. process with an unknown distribution. If the traders' valuations are revealed after each interaction (full-feedback), and the traders' valuations cumulative distribution function (cdf) is continuous, we provide an algorithm achieving logarithmic regret and show its optimality up to constant factors. If only their willingness to sell or buy at the proposed price is revealed after each interaction ($2$-bit feedback), we provide an algorithm achieving poly-logarithmic regret when the traders' valuations cdf is Lipschitz and show that this rate is near-optimal. We complement our results by analyzing the implications of dropping the regularity assumptions on the unknown traders' valuations cdf. If we drop the continuous cdf assumption, the regret rate degrades to $\Theta(\sqrt{T})$ in the full-feedback case, where $T$ is the time horizon. If we drop the Lipschitz cdf assumption, learning becomes impossible in the $2$-bit feedback case.
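The full-feedback setting described above can be simulated in a few lines. The following is a minimal sketch, not the authors' algorithm: it assumes valuations in $[0,1]$ drawn from a Beta(2, 5) distribution (an arbitrary illustrative choice with a continuous cdf), and the names `trade_happens`, `run_empirical_median_broker`, and `beta_valuation` are hypothetical. The broker posts the median of all valuations revealed so far, which is one natural reading of an empirical-median strategy.

```python
import bisect
import random


def trade_happens(price, v1, v2):
    """True when one trader would sell and the other would buy at `price`."""
    return min(v1, v2) <= price <= max(v1, v2)


def run_empirical_median_broker(sample_valuation, horizon, seed=0, first_price=0.5):
    """Post the median of all valuations revealed so far (full feedback).

    Returns the number of trades and the list of posted prices.
    """
    rng = random.Random(seed)
    seen = []      # valuations revealed so far, kept sorted
    prices = []
    trades = 0
    for _ in range(horizon):
        if seen:
            mid = len(seen) // 2
            # `seen` always has even length here, so average the two middle values.
            price = 0.5 * (seen[mid - 1] + seen[mid])
        else:
            price = first_price  # arbitrary guess before any data arrives
        v1 = sample_valuation(rng)
        v2 = sample_valuation(rng)
        trades += trade_happens(price, v1, v2)
        prices.append(price)
        # Full feedback: both valuations are revealed after the round.
        bisect.insort(seen, v1)
        bisect.insort(seen, v2)
    return trades, prices


def beta_valuation(rng):
    """Illustrative valuation distribution on [0, 1] with a continuous cdf."""
    return rng.betavariate(2, 5)


if __name__ == "__main__":
    horizon = 2000
    trades, prices = run_empirical_median_broker(beta_valuation, horizon)
    # With a continuous cdf, no fixed price trades more than half the time in
    # expectation, so horizon / 2 is the benchmark for the trade count.
    print("trades:", trades, "benchmark:", horizon / 2)
    print("last posted price:", prices[-1])
```

Comparing the trade count with `horizon / 2` gives a noisy empirical proxy for the regret; a careful experiment would average over many seeds.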

Paper Structure

This paper contains 15 sections, 8 theorems, 32 equations, 1 figure, 1 table, 3 algorithms.

Key Result

Lemma 1

If the cdf $F$ of $\nu$ is continuous, then, for any $t \in \mathbb{N}$ and any $p \in [0,1]$, $\mathbb{E}\bigl[ \mathrm{G}_t(p) \bigr] = 2F(p)\bigl(1-F(p)\bigr)$. In particular, the function $p \mapsto \mathbb{E}\bigl[ \mathrm{G}_t(p) \bigr]$ is maximized at any point $m \in [0,1]$ such that $F(m) = \frac{1}{2}$.
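
The reason the median is optimal can be checked with a short calculation. The sketch below assumes, as the abstract describes, that $\mathrm{G}_t(p)$ is the indicator that a trade occurs at price $p$ between two traders whose valuations are i.i.d. with cdf $F$; the symbols $V$ and $V'$ for these two valuations are introduced here for illustration only.

$$
\begin{aligned}
\mathbb{E}\bigl[\mathrm{G}_t(p)\bigr]
&= \mathbb{P}[V \le p \le V'] + \mathbb{P}[V' \le p \le V] - \mathbb{P}[V = p = V'] \\
&= 2F(p)\bigl(1-F(p)\bigr) && \text{(continuity of $F$ gives $\mathbb{P}[V = p] = 0$)} \\
&= \tfrac{1}{2} - 2\bigl(F(p) - \tfrac{1}{2}\bigr)^2 .
\end{aligned}
$$

The last line shows that the expected reward never exceeds $\frac{1}{2}$, that it equals $\frac{1}{2}$ exactly when $F(p) = \frac{1}{2}$, and that the loss from posting a suboptimal price is quadratic in the distance of $F(p)$ from $\frac{1}{2}$.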

Figures (1)

  • Figure 1: Qualitative plots of the densities $f_\varepsilon$, $f_{\varepsilon'}$ (left) and corresponding expected rewards (right) used in the proof of the full-feedback lower bound (label t:lower-bound-full-M) for two values $\varepsilon, \varepsilon' > 0$. Note that the difference in reward from posting a price that is optimal for one instance $\varepsilon'$ when the actual instance is $\varepsilon$ is $\Theta\bigl(\lvert\varepsilon-\varepsilon'\rvert^2\bigr)$.

Theorems & Definitions (15)

  • Lemma 1: The median lemma
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof : Proof sketch
  • Theorem 3
  • proof : Proof sketch
  • Theorem 4
  • proof : Proof sketch
  • ...and 5 more