
Bandit Profit-maximization for Targeted Marketing

Joon Suk Huh, Ellen Vitercik, Kirthevasan Kandasamy

TL;DR

The paper addresses profit maximization across $n$ markets that share a common price but have market-specific marketing costs, under adversarial, unknown demand curves observed via bandit feedback. It introduces a decomposed learning approach: for monotonic demands, a discretized EXP3-style method maintains a price distribution and per-price cost distributions; for cost-concave demands, a kernelized exponential-weights scheme updates continuous cost distributions. The main results are regret bounds that scale linearly with $n$: $\tilde{O}(nT^{3/4})$ for monotonic demands and $\tilde{O}(nT^{2/3})$ for cost-concave demands. The corresponding lower bounds are $\Omega((nT)^{3/4})$ and $\Omega(nT^{2/3})$. The cost-concave bound is therefore tight up to logarithmic factors, while the monotonic bound is tight in $T$ up to logarithmic factors but leaves a gap of $n^{1/4}$ in the number of markets. The work also shows that these algorithms extend to several practical variants (subscription dynamics, promotional credits, and multi-armed profit-maximizing A/B tests), offering scalable, near-optimal performance in multi-market pricing with advertising effects.
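The decomposition described above can be illustrated with a minimal sketch. This is not the authors' Algorithm 1: the price and cost grids, the `demand_fns[i](t, c, p)` interface (assumed to return observed demand in [0, 1]), the rescaling of profits into losses, and the rule that only the sampled price's cost distributions are updated are all illustrative assumptions. The sketch only shows the structure: one EXP3 distribution over a price grid shared by all markets, plus, for each market and each grid price, an EXP3 distribution over a cost grid, all updated with importance-weighted bandit feedback.

```python
import math
import random


def mix(log_w, gamma):
    """EXP3 sampling distribution: softmax of log-weights mixed with uniform exploration."""
    mx = max(log_w)
    w = [math.exp(x - mx) for x in log_w]  # subtract the max for numerical stability
    s = sum(w)
    k = len(w)
    return [(1 - gamma) * x / s + gamma / k for x in w]


def run_decomposed_exp3(demand_fns, T, K, eta, gamma, seed=0):
    """Hypothetical sketch of a decomposed EXP3-style learner (not the paper's code).

    demand_fns[i](t, c, p) -> observed demand in [0, 1] for market i at round t
    (an assumed interface). Prices lie on a K-point grid in (0, 1] and costs on a
    K-point grid in [0, 1), so every per-market profit p * d - c lies in [-1, 1].
    """
    rng = random.Random(seed)
    n = len(demand_fns)
    prices = [(k + 1) / K for k in range(K)]
    costs = [k / K for k in range(K)]
    log_w_price = [0.0] * K                      # one distribution over prices
    log_w_cost = [[[0.0] * K for _ in range(K)]  # per market and per grid price:
                  for _ in range(n)]             # a distribution over costs
    total_profit = 0.0
    for t in range(T):
        q = mix(log_w_price, gamma)
        j = rng.choices(range(K), weights=q)[0]  # common price index for all markets
        p = prices[j]
        round_profit = 0.0
        for i in range(n):
            r = mix(log_w_cost[i][j], gamma)
            m = rng.choices(range(K), weights=r)[0]  # market-specific cost index
            c = costs[m]
            profit = p * demand_fns[i](t, c, p) - c
            round_profit += profit
            loss = (1 - profit) / 2                   # rescale profit to a loss in [0, 1]
            log_w_cost[i][j][m] -= eta * loss / r[m]  # importance-weighted EXP3 update
        price_loss = (n - round_profit) / (2 * n)     # average loss over markets, in [0, 1]
        log_w_price[j] -= eta * price_loss / q[j]
        total_profit += round_profit
    return total_profit


# Usage with a toy deterministic demand (advertising raises demand, price lowers it).
demand = lambda t, c, p: (1 - p) * (0.5 + 0.5 * c)
print(run_decomposed_exp3([demand, demand], T=1000, K=6, eta=0.01, gamma=0.01))
```

Setting `gamma = eta` mirrors the choice $\gamma := \eta$ in Theorem 3.1. The learner stores $K + nK^2$ weights instead of one weight per joint action ($K^{n+1}$), which is the practical payoff of decomposing the shared price from the per-market costs.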

Abstract

We study a sequential profit-maximization problem, optimizing for both price and ancillary variables like marketing expenditures. Specifically, we aim to maximize profit over an arbitrary sequence of multiple demand curves, each dependent on a distinct ancillary variable, but sharing the same price. A prototypical example is targeted marketing, where a firm (seller) wishes to sell a product over multiple markets. The firm may invest different marketing expenditures for different markets to optimize customer acquisition, but must maintain the same price across all markets. Moreover, markets may have heterogeneous demand curves, each responding to prices and marketing expenditures differently. The firm's objective is to maximize its gross profit, the total revenue minus marketing costs. Our results are near-optimal algorithms for this class of problems in an adversarial bandit setting, where demand curves are arbitrary non-adaptive sequences, and the firm observes only noisy evaluations of chosen points on the demand curves. For $n$ demand curves (markets), we prove a regret upper bound of $\tilde{O}(nT^{3/4})$ and a lower bound of $\Omega((nT)^{3/4})$ for monotonic demand curves, and a regret bound of $\tilde{\Theta}(nT^{2/3})$ for demand curves that are monotonic in price and concave in the ancillary variables.
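As a quick check of what "near-optimal" means for monotonic demands, dividing the upper bound by the lower bound (ignoring the logarithmic factor hidden in $\tilde{O}$) isolates the remaining gap:

```latex
\frac{n\,T^{3/4}}{(nT)^{3/4}}
  = \frac{n\,T^{3/4}}{n^{3/4}\,T^{3/4}}
  = n^{1/4},
\qquad \text{e.g. } n = 16 \;\Rightarrow\; n^{1/4} = 2 .
```

The dependence on the horizon $T$ is thus tight up to logarithmic factors, and the only gap is polynomial in the number of markets. For demands that are concave in the ancillary variable, the upper and lower bounds coincide at $nT^{2/3}$ up to logarithmic factors, which is what $\tilde{\Theta}(nT^{2/3})$ records.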

Paper Structure

This paper contains 63 sections, 33 theorems, 140 equations, 2 figures, 1 table, 2 algorithms.

Key Result

Theorem 3.1

For any $\eta>0$ and $K\in\mathbb{N}$, with $\gamma:=\eta$, the regret $R_T$ (Eq. regret) of Algorithm 1 satisfies a bound that is not reproduced in this summary. By choosing $K\in\Theta(T^{1/4})$ and $\eta\in\Theta(T^{-3/4})$, Algorithm 1 guarantees $R_T\in\mathcal{O}(nT^{3/4}\log T)$.
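The tuning in Theorem 3.1 can be made concrete with a small helper. The constants hidden in $\Theta(\cdot)$ are not given in this summary, so `c_K` and `c_eta` below are hypothetical placeholders defaulting to 1, and `rate` is only the order $nT^{3/4}\log T$ with its constant dropped, not the actual bound.

```python
import math


def theorem_3_1_parameters(T, n, c_K=1.0, c_eta=1.0):
    """Illustrative tuning from Theorem 3.1.

    c_K and c_eta are hypothetical stand-ins for the unspecified constants
    hidden in Theta(.); rate is the order n * T^(3/4) * log T, not a bound.
    """
    K = max(1, round(c_K * T ** 0.25))  # grid size K in Theta(T^(1/4))
    eta = c_eta * T ** -0.75            # step size eta in Theta(T^(-3/4))
    gamma = eta                         # Theorem 3.1 sets gamma := eta
    rate = n * T ** 0.75 * math.log(T)  # regret order, constants dropped
    return K, eta, gamma, rate


for T in (10**4, 10**6, 10**8):
    K, eta, gamma, rate = theorem_3_1_parameters(T, n=5)
    print(f"T={T:.0e}  K={K}  eta={eta:.1e}  average regret order={rate / T:.3f}")
```

For $T = 10^8$ and $n = 5$ this gives $K = 100$ and $\eta \approx 10^{-6}$. The per-round average `rate / T` decays like $T^{-1/4}\log T$, so the learner's average profit approaches that of the comparator as $T$ grows.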

Figures (2)

  • Figure 1: A landscape of profit-maximization problems. In (a), we wish to maximize revenue under some demand curve $d(p)$, which boils down to choosing a price $p$ that maximizes $p\cdot d(p)$. In (b), advertising can shift the demand curve, and the goal is to maximize the profit $p\cdot d(c,p) - c$, i.e., revenue minus advertising cost $c$. The setting of this work is illustrated in (c), where we have $n$ different markets, and we wish to choose advertising costs $c_1,\dots,c_n$ and a common price $p$ to maximize the total profit $\sum_{i}\big(p\cdot d_i(c_i,p) - c_i\big)$. The demand curves are unknown a priori, and we are interested in learning the optimal price and costs through repeated interactions.
  • Figure 2: Illustrations of our baseline and alternative environments. (a) The black line with dots depicts $\mathbb{E}[b]$ for the baseline environment defined in Eq. (base-environment), and the red line represents the function $2c$. (b) The black line with dots depicts $\mathbb{P}(v\geq p)=\mathbb{E}\big[\mathbf{1}[v\geq p]\big]$ for the same baseline environment, and the red line follows $(2p)^{-1}$.
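The objective in Figure 1(c) separates across markets once the price is fixed, which is what makes a decomposed approach possible. The sketch below assumes full knowledge of a short demand sequence (each `demand_seq[t][i]` is a hypothetical callable `d(c, p)`) and finite price and cost grids, and computes the best fixed price and cost vector in hindsight two ways: by brute force over all joint actions, and by optimizing each $c_i$ separately for every candidate price. Using this best-in-hindsight value as the regret comparator is an assumption; the paper's exact regret definition is not reproduced in this summary.

```python
from itertools import product


def round_profit(p, costs, demands):
    """Figure 1(c) objective for one round: sum_i (p * d_i(c_i, p) - c_i)."""
    return sum(p * d(c, p) - c for d, c in zip(demands, costs))


def best_fixed_brute_force(demand_seq, price_grid, cost_grid):
    """Enumerate every (p, c_1, ..., c_n): K^(n+1) joint actions."""
    n = len(demand_seq[0])
    return max(
        sum(round_profit(p, cs, demands) for demands in demand_seq)
        for p in price_grid
        for cs in product(cost_grid, repeat=n)
    )


def best_fixed_decomposed(demand_seq, price_grid, cost_grid):
    """For each price, pick each market's cost independently: n * K^2 searches."""
    n = len(demand_seq[0])
    best = float("-inf")
    for p in price_grid:
        value = 0.0
        for i in range(n):
            # Market i's cumulative profit depends only on the pair (p, c_i).
            value += max(
                sum(p * demands[i](c, p) - c for demands in demand_seq)
                for c in cost_grid
            )
        best = max(best, value)
    return best
```

Both functions return the same value (up to floating-point rounding), but with $K$ grid points the brute-force search visits $K^{n+1}$ joint actions, whereas the separable search needs only $nK^2$ (market, price, cost) combinations, each summed over the $T$ rounds.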

Theorems & Definitions (51)

  • Theorem 3.1
  • Lemma 3.1
  • Lemma 3.2
  • Lemma 3.3: Comparator loss bound
  • Proof of Lemma 3.3 (comparator loss bound)
  • Lemma 3.4
  • Lemma 3.5
  • Lemma 3.6
  • Theorem 3.2
  • Lemma 3.7
  • ...and 41 more