Balancing Immediate Revenue and Future Off-Policy Evaluation in Coupon Allocation

Naoki Nishimura, Ken Kobayashi, Kazuhide Nakata

TL;DR

This work proposes a novel approach that combines a model-based revenue maximization policy with a randomized exploration policy for data collection. The mixture ratio between the two policies can be adjusted flexibly to balance short-term revenue against future policy improvement.

Abstract

Coupon allocation drives customer purchases and boosts revenue. However, it presents a fundamental trade-off between exploiting the current optimal policy to maximize immediate revenue and exploring alternative policies to collect data for future policy improvement via off-policy evaluation (OPE). To balance this trade-off, we propose a novel approach that combines a model-based revenue maximization policy and a randomized exploration policy for data collection. Our framework enables flexible adjustment of the mixture ratio between these two policies to optimize the balance between short-term revenue and future policy improvement. We formulate the problem of determining the optimal mixture ratio as multi-objective optimization, enabling quantitative evaluation of this trade-off. We empirically verified the effectiveness of the proposed mixed policy using synthetic data. Our main contributions are: (1) Demonstrating a mixed policy combining deterministic and probabilistic policies, flexibly adjusting the data collection vs. revenue trade-off. (2) Formulating the optimal mixture ratio problem as multi-objective optimization, enabling quantitative evaluation of this trade-off.
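
As a rough illustration of the mixed policy described in the abstract, the sketch below blends a deterministic (revenue-maximizing) coupon choice with a randomized exploration policy through a single mixture ratio. It is a minimal sketch under stated assumptions, not the authors' implementation: the exploration policy is assumed to be uniform over coupons, `alpha` is assumed to be the weight on the deterministic policy, and the names `mixed_policy` and `sample_actions` are hypothetical.

```python
import numpy as np


def mixed_policy(greedy_actions, n_actions, alpha):
    """Action probabilities of a mixture of two policies.

    greedy_actions: coupon index chosen per customer by the
        revenue-maximizing policy (assumed deterministic).
    alpha: assumed weight on the deterministic policy; the remaining
        1 - alpha goes to an assumed uniform exploration policy.
    """
    n_customers = len(greedy_actions)
    deterministic = np.zeros((n_customers, n_actions))
    deterministic[np.arange(n_customers), greedy_actions] = 1.0
    uniform = np.full((n_customers, n_actions), 1.0 / n_actions)
    return alpha * deterministic + (1.0 - alpha) * uniform


def sample_actions(probs, rng):
    """Draw one coupon per customer by inverse-CDF sampling on each row."""
    cdf = probs.cumsum(axis=1)
    u = rng.random((probs.shape[0], 1))
    # Count how many CDF entries lie below u; clip guards against rounding.
    return np.minimum((u > cdf).sum(axis=1), probs.shape[1] - 1)


rng = np.random.default_rng(0)
greedy = np.array([0, 2])          # hypothetical greedy coupon per customer
probs = mixed_policy(greedy, n_actions=3, alpha=0.7)
actions = sample_actions(probs, rng)
# Logged propensities, which an OPE estimator would later reweight by.
propensities = probs[np.arange(len(actions)), actions]
```

With `alpha = 1` the logged data contains only the greedy coupon, so other policies cannot be evaluated from it. Any `alpha < 1` keeps every propensity strictly positive, at the cost of some immediate revenue. That is the trade-off the mixture ratio controls.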

Paper Structure (6 sections, 5 equations, 3 figures)

Figures (3)

  • Figure 1: Example of applying mixed data collection policies in coupon allocation using two policies for simplicity.
  • Figure 2: Evaluation policy positively correlated with data collection policies.
  • Figure 3: Evaluation policy negatively correlated with data collection policies.