A Comprehensive Survey on Inverse Constrained Reinforcement Learning: Definitions, Progress and Challenges

Guiliang Liu; Sheng Xu; Shicheng Liu; Ashish Gaurav; Sriram Ganapathi Subramanian; Pascal Poupart

TL;DR

This survey formalizes Inverse Constrained Reinforcement Learning (ICRL) as the problem of recovering implicit constraints from expert demonstrations within a Constrained Markov Decision Process framework, then reviews a spectrum of methods across deterministic and stochastic environments, limited demonstrations, and multi-agent settings. It analyzes maximum entropy and maximum causal entropy formulations, discusses hard versus soft constraints, and covers Bayesian, variational, data-augmentation, and offline strategies to address epistemic uncertainty in constraint inference. The authors also present approaches for simultaneous reward and constraint learning and for constraint inference from multiple experts and multi-agent systems, and they benchmark ICRL methods on grid-world, MuJoCo, and HighD-like realistic environments, highlighting applications in autonomous driving, robotics, healthcare, and sports analytics. Open questions span theoretical identifiability, dynamic and transferable constraints, and real-world deployment, aiming to bridge theory and industrial practice with robust, generalizable constraint inference. The survey thus provides a comprehensive taxonomy, formal definitions, and practical guidelines to advance safe, interpretable, constraint-aware RL systems across diverse domains.

Abstract

Inverse Constrained Reinforcement Learning (ICRL) is the task of inferring the implicit constraints that expert agents adhere to, based on their demonstration data. As an emerging research topic, ICRL has received considerable attention in recent years. This article presents a categorical survey of the latest advances in ICRL. It serves as a comprehensive reference for machine learning researchers and practitioners, as well as newcomers seeking to understand the definitions, advancements, and important challenges in ICRL. We begin by formally defining the problem and outlining the algorithmic framework that facilitates constraint inference across various scenarios. These include deterministic or stochastic environments, environments with limited demonstrations, and multi-agent settings. For each context, we illustrate the critical challenges and introduce a series of fundamental methods to tackle these issues. This survey encompasses discrete, virtual, and realistic environments for evaluating ICRL agents. We also delve into the most pertinent applications of ICRL, such as autonomous driving, robot control, and sports analytics. To stimulate continuing research, we conclude the survey with a discussion of key unresolved questions in ICRL whose resolution can help bridge theoretical understanding and practical industrial applications. The papers referenced in this survey can be found at https://github.com/Jasonxu1225/Awesome-Constraint-Inference-in-RL.
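
As a rough illustration of the problem the abstract describes, the forward (constrained RL) and inverse (ICRL) problems can be sketched in the standard Constrained Markov Decision Process form. The symbols below ($\pi$, $r$, $c$, $\gamma$, $\epsilon$, $T$, $\mathcal{D}_E$) are assumed notation for this sketch and may differ from the notation used in the survey itself.

```latex
% Forward problem (constrained RL): the reward r and the cost c are both known.
% Find a policy that maximizes expected return while keeping expected cost
% below a threshold epsilon.
\max_{\pi} \; \mathbb{E}_{\pi}\Big[\sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\sum_{t=0}^{T} \gamma^{t}\, c(s_t, a_t)\Big] \le \epsilon

% Inverse problem (ICRL): the reward r and expert demonstrations D_E are given,
% and the cost c is unknown. The goal is to find a c under which the behavior
% in D_E is consistent with a solution of the forward problem above.
```

Under this reading, a hard constraint would correspond to $\epsilon = 0$ with a non-negative cost, so that infeasible state-action pairs are never visited, while a soft constraint allows $\epsilon > 0$.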

Paper Structure (40 sections, 62 equations, 9 figures, 4 tables)

Introduction
The Significance of this Survey
Organization of Contents
Background and Notation
Reinforcement Learning
Constrained Reinforcement Learning
Inverse Constrained Reinforcement Learning
Regularizing the Learned Constraints
Related Topics
Inverse Reinforcement Learning
Generalization Differences
Constraint Inference in Inverse Optimal Control
Constraint Inference in Deterministic Environments
Maximum Entropy ICRL in the Discrete Domain
Maximum Entropy ICRL in the Continuous Domain
...and 25 more sections

Figures (9)

Figure 1: An example of a context-sensitive distance constraint between vehicles during a highway merge. In favorable weather, when vehicle speed is relatively low and traffic congestion is high, the distance between cars can be reduced. In adverse weather, when vehicles are moving fast and traffic is sparse, the distance between cars must be increased to ensure safety.
Figure 2: A running example of ICRL, which alternates between policy updates and constraint inference in each round. The expert policy and the imitation policy are represented by the black and blue curves, respectively. The newly inferred constrained region in each round is highlighted in orange, while the constrained region inferred in previous rounds is depicted in gray.
Figure 3: The flowchart of ICRL.
Figure 4: Examples of the ICRL solutions. In these illustrations, the initial location and the final destination are represented by red and blue circles, respectively. The expert demonstrations, signified by dark curves, are directly observable. The three distinct constraints recovered by ICRL algorithms, highlighted as gray regions, provide valid explanations for expert behaviors.
Figure 5: An example showing that generalizing the reward $\tilde{r}$ and the constraint $\mathds{1}^{\mathcal{M}_c}(s^c)=0$ learned in the training environment (left) to a new environment (right) induces different optimal policies (the red path for $\tilde{r}$ and the blue path for $\mathds{1}^{\mathcal{M}_c}$).
...and 4 more figures

Theorems & Definitions (2)

Definition 5.1
Definition 6.1
