Social Interpretable Reinforcement Learning

Leonardo Lucio Custode; Giovanni Iacca

TL;DR

This work tackles the high training cost of interpretable reinforcement learning by introducing Social Interpretable RL (SIRL), a two-phase, socially inspired learning framework. In the collaborative phase, a population of decision-tree (DT) policies acts on a shared instance of the environment: each policy proposes an action, a vote selects the single action that is deployed, and the leaves of the DTs are then updated via Q-learning. In the subsequent individual phase, each DT learns on its own instance of the environment to refine its performance. An outer loop uses Grammatical Evolution to optimize the DT structures. Empirical results across six Gymnasium benchmarks show that SIRL reduces the number of environment interactions by $43\%$ to $76\%$, accelerates convergence, and often matches or surpasses the baseline interpretable method (ELDT), while preserving interpretability. These findings indicate that socially guided learning can substantially raise sample efficiency and policy quality in interpretable RL, narrowing the gap with non-interpretable approaches while maintaining transparency.

Abstract

Reinforcement Learning (RL) bears the promise of being a game-changer in many applications. However, since most of the literature in the field is currently focused on opaque models, the use of RL in high-stakes scenarios, where interpretability is crucial, is still limited. Recently, some approaches to interpretable RL, e.g., based on Decision Trees, have been proposed, but one of the main limitations of these techniques is their training cost. To overcome this limitation, we propose a new method, called Social Interpretable RL (SIRL), that can substantially reduce the number of episodes needed for training. Our method mimics a social learning process, where each agent in a group learns to solve a given task based both on its own individual experience and on the experience acquired together with its peers. Our approach is divided into the following two phases. (1) In the collaborative phase, all the agents in the population interact with a shared instance of the environment, where each agent observes the state and independently proposes an action. Then, voting is performed to choose the action that will actually be deployed in the environment. (2) In the individual phase, each agent refines its individual performance by interacting with its own instance of the environment. This mechanism makes the agents experience a larger number of episodes with little impact on the computational cost of the process. Our results (on 6 widely-known RL benchmarks) show that SIRL not only reduces the computational cost by a minimum of 43% and a maximum of 76%, but also increases the convergence speed and, often, improves the quality of the solutions.
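
The collaborative phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: `LeafAgent`, `majority_vote`, `collaborative_step`, and the `env_step` callback are hypothetical names, the decision tree is reduced to a routing function that maps a state to a leaf index, and the sketch assumes that every agent updates the leaf it reached using the deployed (voted) action with a standard tabular Q-learning rule. The tie-breaking rule (lowest action index) is also an assumption.

```python
from collections import Counter


class LeafAgent:
    """Hypothetical stand-in for a decision-tree policy with Q-values in its leaves."""

    def __init__(self, route, n_leaves, n_actions, alpha=0.1, gamma=0.9):
        self.route = route  # maps a state to a leaf index (stands in for the tree's splits)
        self.q = [[0.0] * n_actions for _ in range(n_leaves)]
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)

    def propose(self, state):
        """Greedy action of the leaf reached by `state`; ties go to the lowest index."""
        q = self.q[self.route(state)]
        return max(range(len(q)), key=q.__getitem__)

    def update(self, state, action, reward, next_state, done):
        """One tabular Q-learning step on the leaf reached by `state`."""
        leaf = self.route(state)
        bootstrap = 0.0 if done else self.gamma * max(self.q[self.route(next_state)])
        target = reward + bootstrap
        self.q[leaf][action] += self.alpha * (target - self.q[leaf][action])


def majority_vote(proposals):
    """Return the most proposed action; ties are broken by the lowest action index."""
    counts = Counter(proposals)
    return max(sorted(counts), key=counts.get)


def collaborative_step(agents, state, env_step):
    """All agents propose, one action is deployed, and every agent learns from it."""
    proposals = [agent.propose(state) for agent in agents]
    action = majority_vote(proposals)
    # `env_step` is a hypothetical callback: (state, action) -> (next_state, reward, done)
    next_state, reward, done = env_step(state, action)
    for agent in agents:
        agent.update(state, action, reward, next_state, done)
    return action, next_state, reward, done
```

In this sketch a single transition of the shared environment produces one learning update per agent, which is one way to read the claim that agents experience more episodes at little extra simulation cost.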

Paper Structure (38 sections, 21 equations, 25 figures, 4 tables, 4 algorithms)

Introduction
Related work
Social Learning
Interpretable Reinforcement Learning
Method
Results
Conclusions
Acknowledgments
Hyperparameters
Computational environment
Description of the environments
InvertedPendulum-v2
LunarLander-v2
Swimmer-v2
Reacher-v4
...and 23 more sections

Figures (25)

Figure 1: Graphical representation of the proposed SIRL approach.
Figure 2: Scores (the higher, the better) obtained by the best agent, at each iteration of the outer loop (i.e., a generation), for the population-based training of interpretable agents. The solid line represents the mean value, while the shaded area represents the $95\%$ CI.
Figure 3: Scores (the higher, the better) obtained by the best agents found by each tested method in $10$ runs, tested on $100$ unseen episodes.
Figure 4: Number of episodes used by each of the algorithms under comparison.
Figure 5: Results obtained by applying our social learning approach to hyperparameter optimization for Deep RL. (a) Scores obtained by the best neural network found from the grid search when trained with Deep Q Learning and Social Deep Q Learning. The solid lines represent the mean over $10$ independent runs, while the shaded area represents the $95\%$ CI. (b) Number of episodes simulated with the environment by two different versions of the grid search. "DQN" refers to a grid search using traditional Deep Q learning, while "Social DQN" refers to a grid search using our social learning approach.
...and 20 more figures
