Transfer Learning for Contextual Joint Assortment-Pricing under Cross-Market Heterogeneity

Elynn Chen, Xi Chen, Yi Zhang

Abstract

We study transfer learning for contextual joint assortment-pricing under a multinomial logit choice model with bandit feedback. A seller operates across multiple related markets and observes only posted prices and realized purchases. While data from source markets can accelerate learning in a target market, cross-market differences in customer preferences may introduce systematic bias if pooled indiscriminately. We model heterogeneity through a structured utility shift, where markets share a common contextual utility structure but differ along a sparse set of latent preference coordinates. Building on this, we develop Transfer Joint Assortment-Pricing (TJAP), a bias-aware framework that combines aggregate-then-debias estimation with a UCB-style policy. TJAP constructs two-radius confidence bounds that separately capture statistical uncertainty and transfer-induced bias, uniformly over continuous prices. We establish matching minimax regret bounds of order $\tilde{O}\!\left(d\sqrt{\frac{T}{1+H}} + s_0\sqrt{T}\right)$, revealing a transparent variance-bias tradeoff: transfer accelerates learning along shared preference directions, while heterogeneous components impose an irreducible adaptation cost. Numerical experiments corroborate the theory, showing that TJAP outperforms both target-only learning and naive pooling while remaining robust to cross-market differences.
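
To make the variance-bias tradeoff in the stated rate concrete, the following minimal sketch evaluates the two leading terms, $d\sqrt{T/(1+H)}$ and $s_0\sqrt{T}$, for a few values of $H$. It is illustrative only: constants and logarithmic factors hidden by $\tilde{O}$ are dropped, the function name `regret_rate` is hypothetical, and the numerical values of $d$, $s_0$, and $T$ are arbitrary choices rather than settings taken from the paper.

```python
import math


def regret_rate(d, s0, T, H):
    """Return the two leading terms of the rate d*sqrt(T/(1+H)) + s0*sqrt(T).

    Constants and log factors are omitted (an assumption made for illustration),
    so the outputs indicate scaling only, not actual regret values.
    """
    shared_term = d * math.sqrt(T / (1 + H))  # shrinks as source markets are added
    adaptation_term = s0 * math.sqrt(T)       # does not depend on H
    return shared_term, adaptation_term


if __name__ == "__main__":
    # Hypothetical problem sizes, chosen only to show the trend in H.
    d, s0, T = 20, 3, 10_000
    for H in (0, 1, 3, 5):
        shared, adapt = regret_rate(d, s0, T, H)
        print(f"H={H}: shared={shared:.1f}, adaptation={adapt:.1f}, total={shared + adapt:.1f}")
```

Under these assumed values, the first term falls as $H$ increases while the second stays fixed, which is the sense in which the heterogeneous components impose a cost that transfer cannot remove.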

Paper Structure (122 sections, 21 theorems, 280 equations, 4 figures, 1 algorithm)


Key Result

Theorem 6

Under Assumptions \ref{assump:task-simi}–\ref{assump:Homo-Cov}, there exist constants $C_v, C_b > 0$, depending only on $L_0, C_{\min}, C_{\max}, \kappa, r$, such that with probability at least $1 - \eta$, for every episode $m$,

Figures (4)

  • Figure 1: Schematic illustration of Algorithm \ref{algo:trans-assort-price-o2o} over two episodes. At the start of episode $m$, the estimate $\widehat{\boldsymbol{\nu}}_{m-1}$ is computed from episode $m\!-\!1$ data via the aggregate-then-debias procedure (Section \ref{ssec:agg_then_debias}). During episode $m$, Fisher information is accumulated via per-period increments $I^{(h)}_t(\widehat{\boldsymbol{\nu}}_{m-1})$, updating the rolling matrices $V^{(h)}_t$ and yielding the pooled matrix $W_{m}$ (Section \ref{ssec:info_geometry}). The geometry from the previous episode, $W_{m-1}$, is used to construct the UCB bonus throughout episode $m$ (Section \ref{ssec:opt_decision_rule}). The policy follows optimistic selection by maximizing $\widetilde{R}_t(\cdot)$ (Section \ref{ssec:opt_decision_rule}). Forced exploration is considered only in the final $q_{m-1}$ periods, when the target curvature (via $V^{(0)}_t$) is insufficient.
  • Figure 2: Cumulative regret on synthetic instances under varying feature dimension $d$, sparsity level $s_0$, and catalog size $N$. Top row: TJAP with $H\in\{0,1,3,5\}$ compared against CAP, M3P, and ONS--MPP. Bottom row: TJAP with $H\in\{0,1,3,5\}$ compared against the pooled estimator Pool$(H)$ for $H\in\{1,3,5\}$. Each curve is averaged over $10$ independent runs; all methods share the same price range $[0,\overline P]$ and observe identical contexts.
  • Figure 3: Cumulative regret on synthetic instances under varying feature dimension $d$, sparsity level $s_0$, and catalog size $N$. Top row: TJAP with $H\in\{0,1,3,5\}$ compared against CAP, M3P, and ONS--MPP. Bottom row: TJAP with $H\in\{0,1,3,5\}$ compared against the pooled estimator Pool$(H)$ for $H\in\{1,3,5\}$. Each curve is averaged over $10$ independent runs; all methods share the same price range $[0,\overline P]$ and observe identical contexts.

Theorems & Definitions (24)

  • Theorem 6: Statistical Error Bound
  • Theorem 7: Regret Upper Bound
  • Theorem 8: Minimax lower bound
  • Lemma \ref{lem:SN-MNL-main}: Self-normalized concentration
  • Lemma \ref{lem:debias-main}: Target-only debiasing
  • Lemma \ref{lem:price-optimism}: Revenue optimism
  • Lemma 9: Variance radius $\alpha_{m-1}$
  • Lemma 10: Pooled Fisher growth
  • Definition 11: Target-only restricted eigenvalue
  • Lemma 12: Uniform target restricted eigenvalue
  • ...and 14 more