Graph-Attentive MAPPO for Dynamic Retail Pricing

Krishna Kumar Neelakanta Pillai Santha Kumari Amma

TL;DR

This study tackles dynamic pricing for a portfolio of SKUs by framing it as a multi-agent reinforcement learning problem with cross-product interactions. It introduces MAPPO+GAT, a graph-attentive augmentation of the strong MAPPO baseline, embedding a Graph Attention Network inside the policy/value networks to capture relational structure from a co-purchase graph. Through a data-driven simulator derived from the Online Retail II dataset and a rigorous, variance-aware evaluation protocol, MAPPO+GAT yields meaningful profit gains over MAPPO while maintaining or improving fairness and reducing price volatility. The findings suggest that graph-integrated MARL provides scalable, stable, and practitioner-friendly benefits for multi-product price control in retail settings.

Abstract

Dynamic pricing in retail requires policies that adapt to shifting demand while coordinating decisions across related products. We present a systematic empirical study of multi-agent reinforcement learning for retail price optimization, comparing a strong MAPPO baseline with a graph-attention-augmented variant (MAPPO+GAT) that leverages learned interactions among products. Using a simulated pricing environment derived from real transaction data, we evaluate profit, stability across random seeds, fairness across products, and training efficiency under a standardized evaluation protocol. The results indicate that MAPPO provides a robust and reproducible foundation for portfolio-level price control, and that MAPPO+GAT further improves performance by sharing information over the product graph without inducing excessive price volatility. Taken together, they suggest that graph-integrated MARL is a more scalable and stable approach than independent learners for dynamic retail pricing, with practical advantages for multi-product decision-making.
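To make "sharing information over the product graph" concrete, the following is a minimal sketch of one single-head graph attention layer in the standard GAT form, written with NumPy. It is not the authors' implementation: the function name `gat_layer`, the feature dimensions, the added self-loops, and the omitted output nonlinearity are all assumptions made for illustration. Each product (node) aggregates transformed features from its co-purchase neighbours, weighted by learned attention scores.

```python
import numpy as np

def gat_layer(h, adj, W, a, slope=0.2):
    """One single-head graph attention layer (illustrative sketch).

    h:   (n, d_in) per-product features (hypothetical inputs)
    adj: (n, n) 0/1 co-purchase adjacency matrix
    W:   (d_in, d_out) shared linear map
    a:   (2 * d_out,) attention vector
    Returns the aggregated features (n, d_out) and attention weights (n, n).
    """
    n = h.shape[0]
    z = h @ W                                   # transformed node features
    d_out = z.shape[1]
    # a . [z_i || z_j] splits into a source term and a destination term.
    src = z @ a[:d_out]
    dst = z @ a[d_out:]
    e = src[:, None] + dst[None, :]             # raw score for every pair (i, j)
    e = np.where(e > 0, e, slope * e)           # LeakyReLU
    # Attend only to neighbours; self-loops (an assumption here) keep every
    # row non-empty, so isolated products still get a valid softmax.
    mask = (np.asarray(adj) + np.eye(n)) > 0
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)        # numerical stability
    w = np.exp(e)                               # exp(-inf) = 0 for non-edges
    alpha = w / w.sum(axis=1, keepdims=True)    # softmax over neighbours
    return alpha @ z, alpha

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    h = rng.normal(size=(3, 4))                 # 3 products, 4 features each
    W = rng.normal(size=(4, 2))
    a = rng.normal(size=4)
    adj = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])  # product 2 is isolated
    out, alpha = gat_layer(h, adj, W, a)
    print(out.shape, alpha.sum(axis=1))
```

In a MAPPO-style setup, an output like `out` would feed the per-product policy head and the centralized value head; how the paper wires this in is not specified in this section.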

Paper Structure

This paper contains 40 sections, 8 equations, 7 figures.

Figures (7)

  • Figure 1: Mean test profit ($\pm$95% CI) across seeds for MAPPO and MAPPO+GAT.
  • Figure 2: Histogram of paired profit differences (GAT $-$ MAPPO) over seeds (CRN-paired episodes).
  • Figure 3: Per-seed paired difference (GAT $-$ MAPPO) in test profit; solid line = mean, dashed = zero.
  • Figure 4: Seed-wise win/loss/tie counts under CRN pairing (wins when GAT > MAPPO).
  • Figure 5: Per-seed Jain stability (GAT $-$ MAPPO); positive values indicate improved fairness.
  • ...and 2 more figures
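
The quantities named in the figure captions (paired differences, win/loss/tie counts, Jain's index) can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the paper's evaluation code: the helper names `jain_index` and `paired_summary` are hypothetical, the confidence interval uses a normal approximation (z = 1.96) because the paper's interval method is not given here, and applying Jain's index to per-product values is assumed.

```python
import math
from statistics import mean, stdev

def jain_index(values):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Equals 1 when all values are equal and 1/n when a single entry
    holds everything (for non-negative inputs).
    """
    n = len(values)
    sq = sum(v * v for v in values)
    if n == 0 or sq == 0:
        raise ValueError("need at least one non-zero value")
    total = sum(values)
    return total * total / (n * sq)

def paired_summary(gat, mappo, z=1.96, tol=1e-9):
    """Summarize per-seed paired differences (GAT minus MAPPO).

    Both lists must be aligned by seed, so that each pair shares the same
    random numbers (CRN pairing). The interval is a normal approximation,
    which is an assumption of this sketch.
    """
    if len(gat) != len(mappo) or len(gat) < 2:
        raise ValueError("need two aligned lists with at least two seeds")
    diffs = [g - m for g, m in zip(gat, mappo)]
    d_mean = mean(diffs)
    half = z * stdev(diffs) / math.sqrt(len(diffs))
    wins = sum(d > tol for d in diffs)
    losses = sum(d < -tol for d in diffs)
    return {
        "mean_diff": d_mean,
        "ci": (d_mean - half, d_mean + half),
        "wins": wins,
        "losses": losses,
        "ties": len(diffs) - wins - losses,
    }

if __name__ == "__main__":
    # Made-up per-seed profits, purely to exercise the functions.
    print(paired_summary([105, 98, 110, 100], [100, 100, 104, 100]))
    print(jain_index([1, 1, 1, 1]), jain_index([1, 0, 0, 0]))
```

Pairing by seed removes the variance that both methods share, so the interval on the difference is typically tighter than comparing two independent means.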