Online Price Competition under Generalized Linear Demands
Daniele Bracale, Moulinath Banerjee, Cong Shi, Yuekai Sun
TL;DR
This work addresses sequential price competition among $N$ sellers under generalized linear single-index demand models, allowing nonlinear demand patterns beyond the traditional linear setup. It introduces a decentralized policy, PML-GLUCB, that combines penalized maximum likelihood estimation (MLE) of each seller's parameter with an upper-confidence pricing rule, removing the need for coordinated exploration across sellers and accommodating both binary and real-valued demands. The authors establish a per-seller regret bound of $O(N^{2}\sqrt{T}\log T)$, which essentially matches the optimal rate known in the linear setting. Regret is measured relative to a dynamic Nash-equilibrium benchmark, so the guarantee holds even when play does not converge to a Nash equilibrium. A key technical contribution is a variant of the elliptical potential lemma tailored to the multi-agent, private-observation setting.
Abstract
We study sequential price competition among $N$ sellers, each influenced by the pricing decisions of its rivals. Specifically, the demand function for each seller $i$ follows the single-index model $\lambda_i(\mathbf{p}) = \mu_i(\langle \boldsymbol{\theta}_{i,0}, \mathbf{p} \rangle)$, with known increasing link $\mu_i$ and unknown parameter $\boldsymbol{\theta}_{i,0}$, where $\mathbf{p}$ denotes the vector of prices offered by all sellers simultaneously at a given instant. Each seller observes only its own realized demand -- unobservable to competitors -- and the prices set by rivals. Our framework generalizes existing approaches that focus solely on linear demand models. We propose a novel decentralized policy, PML-GLUCB, that combines penalized MLE with an upper-confidence pricing rule, removing the need for coordinated exploration phases across sellers -- which are integral to previous approaches for linear models -- and accommodating both binary and real-valued demand observations. Relative to a dynamic benchmark policy, each seller achieves $O(N^{2}\sqrt{T}\log(T))$ regret, which essentially matches the optimal rate known in the linear setting. A significant technical contribution of our work is the development of a variant of the elliptical potential lemma -- typically applied in single-agent systems -- adapted to our competitive multi-agent environment.
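To make the demand model concrete, the following is a minimal sketch of one seller's estimation step, not the authors' PML-GLUCB implementation. It assumes a logistic link for binary demand, two sellers, a ridge penalty, plain gradient ascent, and uniformly random prices. All names and numeric values (`theta_true`, `penalty`, the price range) are hypothetical choices for illustration, and the upper-confidence pricing rule is omitted entirely.

```python
import math
import random


def logistic(z):
    """Assumed increasing link mu(z) = 1 / (1 + exp(-z)) for binary demand."""
    return 1.0 / (1.0 + math.exp(-z))


def expected_demand(theta, prices):
    """Single-index demand lambda_i(p) = mu_i(<theta_i, p>)."""
    return logistic(sum(t * p for t, p in zip(theta, prices)))


def penalized_mle(price_history, demand_history, penalty=1.0, steps=500, lr=0.5):
    """Ridge-penalized logistic MLE via gradient ascent (illustrative only).

    Maximizes the average log-likelihood minus (penalty / (2 n)) * ||theta||^2.
    """
    n = len(price_history)
    dim = len(price_history[0])
    theta = [0.0] * dim
    for _ in range(steps):
        # Gradient of the penalty term.
        grad = [-penalty * t / n for t in theta]
        # Gradient of the logistic log-likelihood: sum of residual * price.
        for p, y in zip(price_history, demand_history):
            resid = y - expected_demand(theta, p)
            for j in range(dim):
                grad[j] += resid * p[j] / n
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


rng = random.Random(0)
# Hypothetical parameter for seller 1: own price lowers demand, rival price raises it.
theta_true = [-1.5, 0.8]
# The seller sees the full price vector but only its own binary demand.
prices = [[rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0)] for _ in range(1000)]
demands = [1 if rng.random() < expected_demand(theta_true, p) else 0 for p in prices]

theta_hat = penalized_mle(prices, demands)
print("estimated parameter:", theta_hat)
```

The sketch uses only each seller's own demand observations together with the publicly observed price vector, mirroring the private-observation structure described above. In the actual policy, prices would be chosen by the upper-confidence rule rather than drawn at random.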
