Bandit Profit-maximization for Targeted Marketing
Joon Suk Huh, Ellen Vitercik, Kirthevasan Kandasamy
TL;DR
The paper addresses profit maximization across $n$ markets that share a common price but have market-specific marketing costs, under adversarial, unknown demand curves observed only through bandit feedback. It introduces a decomposed learning approach: for monotonic demands, a discretized EXP3-style method maintains a price distribution and per-price cost distributions; for cost-concave demands, a kernelized exponential-weights scheme updates continuous cost distributions. The main results are regret upper bounds that scale linearly with $n$: $\tilde{O}(nT^{3/4})$ for monotonic demands and $\tilde{O}(nT^{2/3})$ for cost-concave demands. The corresponding lower bounds are $\Omega((nT)^{3/4})$ and $\Omega(nT^{2/3})$, so the cost-concave bound is tight up to logarithmic factors, while the monotonic bound is tight in $T$ but leaves an $n^{1/4}$ gap in its dependence on $n$. The work also shows that these algorithms extend to several practical variants (subscription dynamics, promotional credits, and multi-armed profit-maximizing A/B tests), offering scalable, near-optimal performance in multi-market pricing with advertising effects.
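The decomposed sampling idea for monotonic demands can be illustrated with a minimal sketch. This is a simplified illustration and not the authors' algorithm: the function names, the `demand(t, i, price, cost)` callback, the grids, the learning rate `eta`, and the exploration rate `gamma` are all assumptions made here, and the tuning needed for the stated regret bounds is not reproduced. Prices, costs, and demands are assumed to lie in $[0, 1]$.

```python
import math
import random


def exp3_probs(log_w, gamma):
    """Softmax of log-weights, mixed with uniform exploration."""
    top = max(log_w)  # subtract the max for numerical stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [(1 - gamma) * x / total + gamma / len(w) for x in w]


def sample(probs, rng):
    """Draw an index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def run_decomposed_exp3(demand, n, T, prices, costs,
                        gamma=0.1, eta=0.05, seed=0):
    """Hypothetical sketch: one price distribution shared by all markets,
    plus one cost distribution per (market, price) pair."""
    rng = random.Random(seed)
    K, M = len(prices), len(costs)
    price_lw = [0.0] * K                                    # log-weights over prices
    cost_lw = [[[0.0] * M for _ in range(K)] for _ in range(n)]
    total_profit = 0.0
    for t in range(T):
        p_probs = exp3_probs(price_lw, gamma)
        k = sample(p_probs, rng)                            # common price for this round
        round_profit = 0.0
        for i in range(n):
            c_probs = exp3_probs(cost_lw[i][k], gamma)
            m = sample(c_probs, rng)                        # cost for market i at price k
            # Bandit feedback: only the profit at the chosen point is observed.
            profit = prices[k] * demand(t, i, prices[k], costs[m]) - costs[m]
            round_profit += profit
            # Importance-weighted update of the chosen cost arm.
            cost_lw[i][k][m] += eta * profit / c_probs[m]
        # Importance-weighted update of the chosen price arm (average profit).
        price_lw[k] += eta * (round_profit / n) / p_probs[k]
        total_profit += round_profit
    return total_profit, exp3_probs(price_lw, gamma)


if __name__ == "__main__":
    # Assumed synthetic demand: decreasing in price, increasing in cost.
    def toy_demand(t, i, p, c):
        return max(0.0, min(1.0, 1.0 - p + 0.3 * math.sqrt(c)))

    total, probs = run_decomposed_exp3(
        toy_demand, n=2, T=500,
        prices=[0.25, 0.5, 0.75], costs=[0.0, 0.1, 0.2])
    print(total, probs)
```

Keeping a separate cost distribution for each market and each grid price means the number of arms grows additively in $n$ rather than exponentially, which is the intuition behind regret that scales linearly with $n$.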
Abstract
We study a sequential profit-maximization problem, optimizing for both price and ancillary variables like marketing expenditures. Specifically, we aim to maximize profit over an arbitrary sequence of multiple demand curves, each dependent on a distinct ancillary variable, but sharing the same price. A prototypical example is targeted marketing, where a firm (seller) wishes to sell a product over multiple markets. The firm may invest different marketing expenditures for different markets to optimize customer acquisition, but must maintain the same price across all markets. Moreover, markets may have heterogeneous demand curves, each responding to prices and marketing expenditures differently. The firm's objective is to maximize its gross profit, the total revenue minus marketing costs. Our main results are near-optimal algorithms for this class of problems in an adversarial bandit setting, where demand curves are arbitrary non-adaptive sequences, and the firm observes only noisy evaluations of chosen points on the demand curves. For $n$ demand curves (markets), we prove a regret upper bound of $\tilde{O}(nT^{3/4})$ and a lower bound of $\Omega((nT)^{3/4})$ for monotonic demand curves, and a regret bound of $\tilde{\Theta}(nT^{2/3})$ for demand curves that are monotonic in price and concave in the ancillary variables.
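In symbols, the objective described above can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the text: $p_t$ is the common price in round $t$, $c_{t,i}$ is the marketing expenditure in market $i$, and $d_{t,i}$ is that market's demand curve in round $t$. Regret is measured against the best fixed price and expenditures in hindsight, which is the standard benchmark in adversarial bandit settings and is likewise assumed here.

```latex
\[
\mathrm{Profit}_t = \sum_{i=1}^{n} \bigl( p_t \, d_{t,i}(p_t, c_{t,i}) - c_{t,i} \bigr),
\qquad
\mathrm{Regret}_T = \max_{p,\, c_1, \dots, c_n} \sum_{t=1}^{T} \sum_{i=1}^{n}
  \bigl( p \, d_{t,i}(p, c_i) - c_i \bigr)
  - \mathbb{E}\Bigl[ \sum_{t=1}^{T} \mathrm{Profit}_t \Bigr].
\]
```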
