Trading Volume Maximization with Online Learning

Tommaso Cesari; Roberto Colomboni

TL;DR

This work reframes brokerage in OTC markets as an online-learning problem in which the broker maximizes trading volume rather than gain from trade. Building on the Median Lemma, it develops two main algorithms, Follow the Empirical Median (FEM) for full feedback and Median Binary Search (MBS) for 2-bit feedback, which achieve near-optimal regret under regularity assumptions: $O(\ln T)$ in the full-feedback case with a continuous cdf, and $O(\ln(MT)\ln T)$ in the 2-bit case with an $M$-Lipschitz cdf. The authors provide lower bounds showing that the first rate is optimal up to constant factors and the second is near-optimal. They also show that dropping the continuity assumption degrades the full-feedback rate to $\Theta(\sqrt{T})$, and that dropping the Lipschitz assumption makes learning impossible under 2-bit feedback. The results clarify how regularity shapes learning speed in two-sided online markets and open directions for non-stationary or contextual extensions with potential market-impact implications.
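The full-feedback idea can be illustrated with a minimal sketch. It assumes that the learner posts the empirical median of all valuations revealed so far, that valuations lie in $[0,1]$, and that the price in the first round is arbitrary. The function names, the `first_price` default, and the uniform test distribution are illustrative assumptions, not the authors' pseudocode.

```python
import random
import statistics


def trade_happens(price, v1, v2):
    """A trade occurs when the posted price lies between the two valuations."""
    return min(v1, v2) <= price <= max(v1, v2)


def follow_empirical_median(valuation_pairs, first_price=0.5):
    """Post the empirical median of every valuation observed so far.

    `first_price` is an arbitrary assumption for the round with no data.
    Returns the posted prices and the total number of trades.
    """
    seen = []
    prices = []
    trades = 0
    for v1, v2 in valuation_pairs:
        price = statistics.median(seen) if seen else first_price
        prices.append(price)
        trades += trade_happens(price, v1, v2)
        # Full feedback: both valuations are revealed after the round.
        seen.extend((v1, v2))
    return prices, trades


# Hypothetical instance: i.i.d. uniform valuations, whose median is 0.5.
rng = random.Random(0)
pairs = [(rng.random(), rng.random()) for _ in range(2000)]
prices, trades = follow_empirical_median(pairs)
```

On this instance the posted prices should drift toward the true median, so roughly half of the rounds end in a trade.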

Abstract

We explore brokerage between traders in an online learning framework. At any round $t$, two traders meet to exchange an asset, provided the exchange is mutually beneficial. The broker proposes a trading price, and each trader tries to sell their asset or buy the asset from the other party, depending on whether the price is higher or lower than their private valuation. A trade happens if one trader is willing to sell and the other is willing to buy at the proposed price. Previous work provided guidance to a broker aiming at enhancing traders' total earnings by maximizing the gain from trade, defined as the sum of the traders' net utilities after each interaction. In contrast, we investigate how the broker should behave to maximize the trading volume, i.e., the total number of trades. We model the traders' valuations as an i.i.d. process with an unknown distribution. If the traders' valuations are revealed after each interaction (full-feedback), and the cumulative distribution function (cdf) of the traders' valuations is continuous, we provide an algorithm achieving logarithmic regret and show its optimality up to constant factors. If only their willingness to sell or buy at the proposed price is revealed after each interaction ($2$-bit feedback), we provide an algorithm achieving poly-logarithmic regret when the cdf of the traders' valuations is Lipschitz and show that this rate is near-optimal. We complement our results by analyzing the implications of dropping the regularity assumptions on the unknown cdf of the traders' valuations. If we drop the continuous cdf assumption, the regret rate degrades to $\Theta(\sqrt{T})$ in the full-feedback case, where $T$ is the time horizon. If we drop the Lipschitz cdf assumption, learning becomes impossible in the $2$-bit feedback case.
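A single round of the protocol described above can be sketched as follows. The sketch assumes that a trader with valuation at most the price is the one willing to sell, and it breaks ties at equality in favor of selling; this is an illustrative convention, and ties have probability zero when the cdf is continuous. The function names are hypothetical.

```python
def two_bit_feedback(price, v1, v2):
    """Return one bit per trader: True if that trader is willing to sell.

    Assumption: a trader sells when their valuation is at most the price
    and buys otherwise (tie-breaking at equality is an arbitrary choice).
    """
    return (v1 <= price, v2 <= price)


def trade_from_feedback(bits):
    """A trade happens when exactly one trader sells and the other buys."""
    return bits[0] != bits[1]


# One seller (0.2 <= 0.5) and one buyer (0.8 > 0.5): the trade goes through.
bits = two_bit_feedback(0.5, 0.2, 0.8)
traded = trade_from_feedback(bits)
```

Under 2-bit feedback the broker sees only `bits`, never the valuations themselves, which is why the learner has to probe the cdf at the prices it posts.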

Paper Structure

This paper contains 15 sections, 8 theorems, 32 equations, 1 figure, 1 table, and 3 algorithms.

Introduction
Setting
Overview of Our Contributions
Techniques and Challenges
Related Work
Limitations
The Median Lemma
Full-Feedback
2-Bit Feedback
Non-Lipschitz or Discontinuous Pdfs
Conclusions and Open Problems
Proof of Theorem \ref{t:lower-bound-full-M}
Proof of Theorem \ref{t:mbs}
Proof of Theorem \ref{t:lower-bound-2-bit-M}
Proof of Theorem \ref{t:ftpsi}

Key Result

Lemma 1

If the cdf $F$ of $\nu$ is continuous, then, for any $t \in \mathbb{N}$ and any $p \in [0,1]$, $$\mathbb{E}\bigl[ \mathrm{G}_t(p) \bigr] = 2 F(p) \bigl( 1 - F(p) \bigr).$$ In particular, the function $p \mapsto \mathbb{E}\bigl[ \mathrm{G}_t(p) \bigr]$ is maximized at any point $m \in [0,1]$ such that $F(m) = \frac{1}{2}$.
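A short derivation shows why the median is optimal. It assumes that $\mathrm{G}_t(p)$ is the indicator that a trade happens at price $p$ in round $t$, and it writes $V, V'$ (symbols introduced here for illustration) for the two i.i.d. valuations with cdf $F$. Continuity of $F$ gives $\mathbb{P}[V = p] = 0$, so $\mathbb{P}[V' \ge p] = 1 - F(p)$ and the overlap term vanishes.

```latex
\begin{align*}
\mathbb{E}\bigl[\mathrm{G}_t(p)\bigr]
  &= \mathbb{P}\bigl[\min(V, V') \le p \le \max(V, V')\bigr] \\
  &= \mathbb{P}[V \le p \le V'] + \mathbb{P}[V' \le p \le V]
     - \mathbb{P}[V = p = V'] \\
  &= 2\,F(p)\bigl(1 - F(p)\bigr)
   = \tfrac{1}{2} - 2\bigl(F(p) - \tfrac{1}{2}\bigr)^2 .
\end{align*}
```

The last expression is at most $\frac{1}{2}$, with equality exactly when $F(p) = \frac{1}{2}$, so any median $m$ of $\nu$ maximizes the expected number of trades per round.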

Figures (1)

Figure 1: Qualitative plots of the densities $f_\varepsilon$, $f_{\varepsilon'}$ (left) and corresponding expected rewards (right) used in the proof of Theorem \ref{t:lower-bound-full-M} for two values $\varepsilon, \varepsilon' > 0$. Note that the loss in reward from posting a price that is optimal for one instance $\varepsilon'$ when the actual instance is $\varepsilon$ is $\Theta\bigl(\left\lvert\varepsilon-\varepsilon'\right\rvert^2\bigr)$.

Theorems & Definitions (15)

Lemma 1: The median lemma
proof
Theorem 1
proof
Theorem 2
proof : Proof sketch.
Theorem 3
proof : Proof sketch
Theorem 4
proof : Proof sketch
...and 5 more
