Graph-Attentive MAPPO for Dynamic Retail Pricing
Krishna Kumar Neelakanta Pillai Santha Kumari Amma
TL;DR
This study frames dynamic pricing for a portfolio of SKUs as a multi-agent reinforcement learning problem with cross-product interactions. It introduces MAPPO+GAT, which augments the MAPPO baseline by embedding a Graph Attention Network inside the policy and value networks to capture relational structure from a co-purchase graph. In a data-driven simulator derived from the Online Retail II dataset, evaluated under a variance-aware protocol, MAPPO+GAT yields profit gains over MAPPO while maintaining or improving fairness and reducing price volatility. The findings suggest that graph-integrated MARL offers scalable, stable, and practical benefits for multi-product price control in retail settings.
Abstract
Dynamic pricing in retail requires policies that adapt to shifting demand while coordinating decisions across related products. We present a systematic empirical study of multi-agent reinforcement learning for retail price optimization, comparing a strong MAPPO baseline with a graph-attention-augmented variant (MAPPO+GAT) that leverages learned interactions among products. Using a simulated pricing environment derived from real transaction data, we evaluate profit, stability across random seeds, fairness across products, and training efficiency under a standardized evaluation protocol. The results indicate that MAPPO provides a robust and reproducible foundation for portfolio-level price control, and that MAPPO+GAT further improves performance by sharing information over the product graph without inducing excessive price volatility. Overall, the findings suggest that graph-integrated MARL is more scalable and stable than independent learners for dynamic retail pricing, with practical advantages for multi-product decision-making.
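To make "sharing information over the product graph" concrete, the following is a minimal sketch of a single-head graph attention layer in NumPy, following the standard GAT formulation rather than the specific architecture used in this study. The function name `gat_layer`, the toy four-product co-purchase graph, and all dimensions and random weights are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def gat_layer(h, adj, W, a, slope=0.2):
    """Single-head graph attention (standard GAT form; illustrative only).

    h:   (N, F) per-product input features
    adj: (N, N) binary co-purchase adjacency (hypothetical toy graph)
    W:   (F, D) shared linear projection
    a:   (2*D,) attention vector
    """
    n = h.shape[0]
    z = h @ W                                  # projected features, (N, D)
    d = z.shape[1]
    # Score e[i, j] = LeakyReLU(a^T [z_i || z_j]), split into two dot products.
    e = (z @ a[:d])[:, None] + (z @ a[d:])[None, :]
    e = np.where(e > 0, e, slope * e)          # LeakyReLU
    # Attend only to graph neighbours plus a self-loop.
    mask = (adj > 0) | np.eye(n, dtype=bool)
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)       # numerically stable softmax
    w = np.exp(e)
    alpha = w / w.sum(axis=1, keepdims=True)   # each row sums to 1
    return alpha @ z, alpha

# Toy example: 4 products on a chain-shaped co-purchase graph 0-1-2-3.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
h = rng.normal(size=(4, 3))                    # assumed 3 features per product
W = rng.normal(size=(3, 2))                    # assumed embedding size 2
a = rng.normal(size=(4,))
out, alpha = gat_layer(h, adj, W, a)
```

In a MAPPO-style setup, an embedding such as `out[i]` would be one plausible input to product i's actor and to the centralized critic. Because the attention weights in `alpha` are zero for pairs of products that are not linked in the graph, each agent aggregates information only from its co-purchase neighbours.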
