Structured Reinforcement Learning for Combinatorial Decision-Making
Heiko Hoppe, Léo Baty, Louis Bouvier, Axel Parmentier, Maximilian Schiffer
TL;DR
This work tackles reinforcement learning for combinatorial decision problems by embedding a combinatorial optimization layer (CO-layer) into the actor and training it end-to-end with Fenchel-Young losses. It offers a geometric interpretation of the method as a sampling-based primal-dual algorithm in the dual of the moment polytope and pairs the actor with a stable TD-based critic, including double Q-learning. Across six environments (static and dynamic), Structured Reinforcement Learning (SRL) matches or exceeds Structured Imitation Learning and unstructured PPO on static tasks and improves on them by up to 92% on dynamic tasks, while exhibiting lower variance and faster convergence, at the cost of additional computation driven by the CO-layer. The approach is well suited to industrial-scale planning with large combinatorial action spaces, where structure and end-to-end learning can yield substantial practical gains.
Abstract
Reinforcement learning (RL) is increasingly applied to real-world problems involving complex and structured decisions, such as routing, scheduling, and assortment planning. These settings challenge standard RL algorithms, which struggle to scale, generalize, and exploit structure in the presence of combinatorial action spaces. We propose Structured Reinforcement Learning (SRL), a novel actor-critic paradigm that embeds combinatorial optimization layers (CO-layers) into the actor neural network. We enable end-to-end learning of the actor via Fenchel-Young losses and provide a geometric interpretation of SRL as a primal-dual algorithm in the dual of the moment polytope. Across six environments with exogenous and endogenous uncertainty, SRL matches or surpasses the performance of unstructured RL and imitation learning on static tasks and improves over these baselines by up to 92% on dynamic problems, with improved stability and convergence speed.
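To make the Fenchel-Young idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy CO-layer that selects the k highest-scoring items, Gaussian perturbation of the scores, and a fixed target action; how SRL constructs its target is not described in this summary. All function names and parameter values are hypothetical. The sketch uses the standard fact that the gradient of a perturbed Fenchel-Young loss with respect to the scores is the expected perturbed solution minus the target, so the combinatorial solver itself never needs to be differentiated.

```python
import random


def co_layer(theta, k):
    """Toy CO-layer (an assumption): choose the k items with the largest scores.

    This solves a linear maximization over indicator vectors of k-subsets.
    """
    top = sorted(range(len(theta)), key=lambda i: theta[i], reverse=True)[:k]
    chosen = set(top)
    return [1.0 if i in chosen else 0.0 for i in range(len(theta))]


def perturbed_expectation(theta, k, eps=1.0, n_samples=200, rng=None):
    """Monte Carlo estimate of E[co_layer(theta + eps * Z)] with Z ~ N(0, I)."""
    if rng is None:
        rng = random.Random(0)
    mean = [0.0] * len(theta)
    for _ in range(n_samples):
        noisy = [t + eps * rng.gauss(0.0, 1.0) for t in theta]
        solution = co_layer(noisy, k)
        mean = [m + s / n_samples for m, s in zip(mean, solution)]
    return mean


def fy_gradient(theta, target, k, eps=1.0, n_samples=200, rng=None):
    """Gradient of the perturbed Fenchel-Young loss w.r.t. the scores theta.

    Equals the expected perturbed solution minus the target action.
    """
    expected = perturbed_expectation(theta, k, eps, n_samples, rng)
    return [e - t for e, t in zip(expected, target)]


# Plain gradient descent on the scores toward a fixed, hypothetical target.
rng = random.Random(0)
theta = [0.0, 0.0, 0.0, 0.0]
target = [1.0, 0.0, 1.0, 0.0]
for _ in range(50):
    grad = fy_gradient(theta, target, k=2, rng=rng)
    theta = [t - 0.5 * g for t, g in zip(theta, grad)]

print(co_layer(theta, 2))
```

Because each entry of the expected solution lies between 0 and 1, the scores of the target items can only rise and the scores of the other items can only fall in this sketch. In an actor network, the same gradient would be backpropagated from the scores into the network weights.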
