Towards a Research Community in Interpretable Reinforcement Learning: the InterpPol Workshop
Hector Kohler, Quentin Delfosse, Paul Festor, Philippe Preux
TL;DR
This paper argues for interpretable reinforcement learning (IRL) by highlighting the limitations of post-hoc explainability and the need for intrinsically interpretable policies built on semantically meaningful state representations. It surveys approaches that yield interpretable policies, contrasting imitation-based methods that use trees or programs with direct RL strategies, and stresses the importance of formal interpretability definitions, metrics, and interoperable benchmarks. The authors propose InterpPol, the first dedicated workshop to crystallize an IRL community, and detail its core topics, submission logistics, and schedule to foster collaboration and standardization. They also outline plans for an ongoing open community (e.g., a Google Group and online seminars) to advance research, benchmarks, and best practices in interpretable RL beyond the workshop.
Abstract
Embracing the pursuit of intrinsically explainable reinforcement learning raises crucial questions: What distinguishes explainability from interpretability? Should explainable and interpretable agents be developed outside of domains where transparency is imperative? What advantages do interpretable policies offer over neural networks? How can we rigorously define and measure interpretability in policies without user studies? Which reinforcement learning paradigms are best suited to developing interpretable agents? Can Markov Decision Processes integrate interpretable state representations? In addition to motivating an Interpretable RL community centered around these questions, we propose the first venue dedicated to Interpretable RL: the InterpPol Workshop.
