From Sequential to Recursive: Enhancing Decision-Focused Learning with Bidirectional Feedback
Xinyu Wang, Jinxiao Du, Yiyang Peng, Wei Ma
TL;DR
This paper addresses the limitations of sequential decision-focused learning (S-DFL) by introducing Recursive DFL (R-DFL), which captures bidirectional feedback between prediction and optimization in closed-loop settings. It develops two gradient propagation approaches (explicit unrolling and implicit differentiation) and proves their gradient equivalence under suitable conditions, highlighting a trade-off between implementation simplicity and computational efficiency. In experiments on synthetic and real-world tasks (newsvendor and bipartite matching), R-DFL substantially improves final decision quality over S-DFL and PTO, with implicit differentiation offering notably faster training. The work lays a foundation for unified prediction-prescription workflows in dynamic, interactive systems and suggests future extensions to stochastic and integer optimization scenarios.
Abstract
Decision-focused learning (DFL) has emerged as a powerful end-to-end alternative to conventional predict-then-optimize (PTO) pipelines by directly optimizing predictive models through downstream decision losses. Existing DFL frameworks, however, are strictly sequential, a structure we refer to as sequential DFL (S-DFL), and therefore fail to capture the bidirectional feedback between prediction and optimization in complex interaction scenarios. In view of this, we propose, for the first time, recursive decision-focused learning (R-DFL), a novel framework that introduces bidirectional feedback between downstream optimization and upstream prediction. To enable efficient gradient propagation in R-DFL, we further extend two distinct differentiation methods: explicit unrolling via automatic differentiation, and implicit differentiation based on fixed-point methods. We rigorously prove that both methods achieve comparable gradient accuracy, with the implicit method offering superior computational efficiency. Extensive experiments on both synthetic and real-world datasets, including the newsvendor problem and the bipartite matching problem, demonstrate that R-DFL not only substantially enhances final decision quality over sequential baselines but also adapts robustly across diverse closed-loop decision-making scenarios.
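To make the two differentiation routes concrete, the following is a minimal sketch on a toy scalar loop, not the paper's models or implementation. The predictor `predict`, the quadratic optimizer `optimize`, and the feedback strength `A` are all hypothetical choices made so that the closed loop is a contraction with a unique fixed point. The sketch computes the sensitivity of the converged decision to the predictor parameter `theta` in two ways: by propagating the derivative through every loop iteration (explicit unrolling) and by applying the implicit function theorem at the fixed point (implicit differentiation).

```python
import math

A = 0.5  # hypothetical feedback strength; |A| < 1 makes the loop a contraction


def predict(theta, x):
    """Toy predictor (assumed): the forecast depends on theta and the last decision x."""
    return theta + A * math.tanh(x)


def optimize(c):
    """Toy optimizer (assumed): argmin_x 0.5*x**2 - c*x, whose solution is x = c."""
    return c


def unrolled_grad(theta, steps=60):
    """Explicit unrolling: carry dx/dtheta through each predict-optimize iteration."""
    x, dx = 0.0, 0.0
    for _ in range(steps):
        # Chain rule for x_next = theta + A*tanh(x), evaluated at the current x.
        dx = 1.0 + A * (1.0 - math.tanh(x) ** 2) * dx
        x = optimize(predict(theta, x))
    return x, dx


def implicit_grad(theta, tol=1e-12, max_iter=1000):
    """Implicit differentiation: solve for the fixed point, then differentiate it once."""
    x = 0.0
    for _ in range(max_iter):
        x_new = optimize(predict(theta, x))
        converged = abs(x_new - x) < tol
        x = x_new
        if converged:
            break
    # At the fixed point x* = f(x*, theta): dx*/dtheta = (df/dtheta) / (1 - df/dx).
    df_dx = A * (1.0 - math.tanh(x) ** 2)
    df_dtheta = 1.0
    return x, df_dtheta / (1.0 - df_dx)


x_unrolled, g_unrolled = unrolled_grad(0.3)
x_implicit, g_implicit = implicit_grad(0.3)
print(x_unrolled, g_unrolled)
print(x_implicit, g_implicit)
```

In this toy setting the two gradients agree once the loop has converged. The unrolled version must carry derivative information through every iteration, whereas the implicit version only needs the converged decision, which illustrates the simplicity-versus-efficiency trade-off described above.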
