Online Price Competition under Generalized Linear Demands

Daniele Bracale, Moulinath Banerjee, Cong Shi, Yuekai Sun

TL;DR

This work studies sequential price competition among $N$ sellers under generalized linear (single-index) demand models, which allow nonlinear demand patterns beyond the traditional linear setup. It introduces a decentralized policy, PML-GLUCB, that combines penalized MLE parameter updates with an upper-confidence pricing rule, removing the need for coordinated exploration across sellers and accommodating both binary and real-valued demands. The authors establish a per-seller regret bound of $O(N^{2}\sqrt{T}\log T)$, which essentially matches the optimal rate known in the linear setting, and control each seller's performance relative to a dynamic benchmark even when play need not converge to a Nash equilibrium (NE). A key technical contribution is a variant of the elliptical potential lemma tailored to the multi-agent setting, in which each seller's realized demand is private.

Abstract

We study sequential price competition among $N$ sellers, each influenced by the pricing decisions of their rivals. Specifically, the demand function for each seller $i$ follows the single index model $\lambda_i(\mathbf{p}) = \mu_i(\langle \boldsymbol{\theta}_{i,0}, \mathbf{p} \rangle)$, with known increasing link $\mu_i$ and unknown parameter $\boldsymbol{\theta}_{i,0}$, where $\mathbf{p}$ denotes the vector of prices offered by all the sellers simultaneously at a given instant. Each seller observes only their own realized demand -- unobservable to competitors -- and the prices set by rivals. Our framework generalizes existing approaches that focus solely on linear demand models. We propose a novel decentralized policy, PML-GLUCB, that combines penalized MLE with an upper-confidence pricing rule, removing the need for coordinated exploration phases across sellers -- which is integral to previous linear models -- and accommodating both binary and real-valued demand observations. Relative to a dynamic benchmark policy, each seller achieves $O(N^{2}\sqrt{T}\log(T))$ regret, which essentially matches the optimal rate known in the linear setting. A significant technical contribution of our work is the development of a variant of the elliptical potential lemma -- typically applied in single-agent systems -- adapted to our competitive multi-agent environment.
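As an illustration of the single index demand model in the abstract, the following minimal Python sketch evaluates $\lambda_i(\mathbf{p}) = \mu_i(\langle \boldsymbol{\theta}_{i,0}, \mathbf{p} \rangle)$ for one seller and draws a binary demand observation. The logistic link, the parameter values, the prices, and all function names are assumptions chosen for illustration; they are not taken from the paper, and the sketch does not implement PML-GLUCB.

```python
import math
import random


def logistic(z):
    """An increasing link mu(z) = 1 / (1 + exp(-z)); an illustrative choice."""
    return 1.0 / (1.0 + math.exp(-z))


def expected_demand(theta_i, prices, link=logistic):
    """Mean demand lambda_i(p) = mu_i(<theta_i, p>) for a single seller."""
    index = sum(t * p for t, p in zip(theta_i, prices))
    return link(index)


def sample_binary_demand(theta_i, prices, rng):
    """Draw a 0/1 demand with success probability lambda_i(p)."""
    return 1 if rng.random() < expected_demand(theta_i, prices) else 0


# Hypothetical parameter for seller 0 among N = 3 sellers: a negative weight
# on the seller's own price and positive weights on rivals' prices.
theta_0 = [-1.0, 0.4, 0.3]
prices = [1.0, 1.5, 2.0]

rng = random.Random(0)
lam = expected_demand(theta_0, prices)      # mean demand at these prices
revenue = prices[0] * lam                   # expected revenue of seller 0
demand = sample_binary_demand(theta_0, prices, rng)
```

With the assumed negative own-price coefficient, raising the first entry of `prices` lowers `lam`, which is the trade-off a pricing rule has to balance against the higher per-unit price.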

Paper Structure

This paper contains 28 sections, 8 theorems, 119 equations, 1 algorithm.

Key Result

Lemma 2.6

Under the paper's assumption on the link functions $\mu_i$, for every $i \in [N]$, $\eta_i^{(t)}$ is $L_{\mu_i}$-subgaussian conditionally on $\mathcal{H}_i^{(t-1)}$.
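For reference, one standard way to state conditional subgaussianity is given below. It assumes that $\eta_i^{(t)}$ denotes the demand noise (observed demand minus its conditional mean) and uses a generic parameter $\sigma$; the paper's exact normalization of $L_{\mu_i}$ may differ.

$$
\mathbb{E}\left[\exp\left(s\,\eta_i^{(t)}\right) \,\middle|\, \mathcal{H}_i^{(t-1)}\right] \le \exp\left(\frac{s^{2}\sigma^{2}}{2}\right) \quad \text{for all } s \in \mathbb{R}.
$$

For example, if demand is binary, the noise has conditional mean zero and takes values in an interval of length $1$, so Hoeffding's lemma gives the inequality above with $\sigma^{2} = 1/4$.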

Theorems & Definitions (14)

  • Example 2.2: Binary Response Model
  • Example 2.3: Linear Regression Model
  • Lemma 2.6
  • Remark 2.7: Impossibility to learn competitors’ models
  • Lemma 2.9: Existence of NE
  • Lemma 2.11: Contraction of the best-response operator
  • Theorem 4.1: Regret of PML-GLUCB
  • Proof sketch
  • Proposition 4.2
  • Remark 4.3: Optimality of the PML-GLUCB algorithm
  • ...and 4 more