Navigating Trade-offs: Policy Summarization for Multi-Objective Reinforcement Learning

Zuzanna Osika, Jazmin Zatarain-Salazar, Frans A. Oliehoek, Pradeep K. Murukannaiah

TL;DR

By considering both policy behavior and objective values, this clustering method can reveal the relationship between policy behaviors and regions in the objective space and enable decision makers to identify overarching trends and insights in the solution set rather than examining each policy individually.

Abstract

Multi-objective reinforcement learning (MORL) is used to solve problems involving multiple objectives. An MORL agent must make decisions based on the diverse signals provided by distinct reward functions. Training an MORL agent yields a set of solutions (policies), each presenting distinct trade-offs among the objectives (expected returns). MORL enhances explainability by enabling fine-grained comparisons of policies in the solution set based on their trade-offs as opposed to having a single policy. However, the solution set is typically large and multi-dimensional, where each policy (e.g., a neural network) is represented by its objective values. We propose an approach for clustering the solution set generated by MORL. By considering both policy behavior and objective values, our clustering method can reveal the relationship between policy behaviors and regions in the objective space. This approach can enable decision makers (DMs) to identify overarching trends and insights in the solution set rather than examining each policy individually. We tested our method in four multi-objective environments and found it outperformed traditional k-medoids clustering. Additionally, we include a case study that demonstrates its real-world application.
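To make the idea of judging one clustering in two spaces concrete, here is a minimal sketch. It is not the authors' method. It runs a plain k-medoids baseline on objective values only, then scores the resulting partition with a silhouette coefficient in both the objective space and a behavior space. The toy data, the two-dimensional behavior descriptors, and all function names are assumptions made for illustration.

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign(points, medoids):
    # Label each point with the index of its nearest medoid.
    return [min(range(len(medoids)),
                key=lambda c: euclidean(p, points[medoids[c]]))
            for p in points]

def k_medoids(points, k, iters=50, seed=0):
    """Plain alternating k-medoids (a baseline, not the paper's algorithm)."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(points)), k)
    for _ in range(iters):
        labels = assign(points, medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, l in enumerate(labels) if l == c]
            if not members:
                new_medoids.append(medoids[c])  # keep an empty cluster's medoid
                continue
            # The medoid is the member with the smallest total distance to the rest.
            new_medoids.append(min(
                members,
                key=lambda i: sum(euclidean(points[i], points[j]) for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)

def silhouette(points, labels):
    """Mean silhouette coefficient of a labelling, measured in the given space."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        same = [j for j, l in enumerate(labels) if l == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(euclidean(p, points[j]) for j in same) / len(same)
        b = min(
            sum(euclidean(p, points[j]) for j, l in enumerate(labels) if l == c)
            / labels.count(c)
            for c in clusters if c != labels[i])
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: six policies, each with two objective values
# (expected returns) and a made-up two-dimensional behavior descriptor.
objectives = [(1.0, 9.0), (1.2, 8.8), (0.9, 9.1),
              (9.0, 1.0), (8.8, 1.2), (9.1, 0.9)]
behaviors = [(0.1, 0.9), (0.2, 0.8), (0.9, 0.1),
             (0.8, 0.2), (0.85, 0.15), (0.9, 0.2)]

# Cluster on objective values only, then score that partition in both spaces.
medoids, labels = k_medoids(objectives, k=2)
obj_quality = silhouette(objectives, labels)
beh_quality = silhouette(behaviors, labels)
print("labels:", labels)
print("objective-space silhouette:", round(obj_quality, 3))
print("behavior-space silhouette:", round(beh_quality, 3))
```

In this toy setup the third policy has objective values close to the first group but a behavior descriptor close to the second. A partition built from objectives alone therefore scores lower in the behavior space. A clustering that considers both spaces is meant to expose this kind of mismatch.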

Paper Structure

This paper contains 18 sections, 4 equations, 6 figures, 1 table, 1 algorithm.

Figures (6)

  • Figure 1: An outline of our approach, which clusters a set of policies while considering clustering quality in both the objective and behavior spaces.
  • Figure 2: Screenshot from the highway environment
  • Figure 3: Sankey diagram showing the similarity between clusters in the objective and behavior spaces. Each cluster is a node and the links are policies.
  • Figure 4: Clusterings obtained by the PAN clustering (red dots) and iterative k-medoids clustering (blue rectangles).
  • Figure 5: Clusters visualised in the objective space for the chosen point in Figure \ref{fig:clustering}
  • ...and 1 more figures