Opinion-Guided Reinforcement Learning

Kyanna Dagenais, Istvan David

TL;DR

This work addresses the challenge of incorporating human opinions, with intrinsic epistemic uncertainty, into reinforcement learning. It introduces an end-to-end method that uses subjective logic to model advisors’ beliefs and uncertainty, translates domain-specific advice into opinions, fuses them with the agent’s policy via a Belief Constraint Fusion operator, and maps the result back to the probability domain for action selection. The approach is demonstrated on a Frozen Lake-style grid world with a GridWorld DSL for expressing advice, and is evaluated across oracle, single human, and cooperative human advisors under varying uncertainty and advice quotas. Results show that opinion-guided RL yields higher rewards, faster learning, and more thorough exploration, with human advice approaching or even surpassing oracle performance under several settings. The findings highlight the practical potential of uncertainty-aware guidance in RL and outline open challenges, including interactive advising, DSL design, and extension to value-based and deep RL regimes.

Abstract

Human guidance is often desired in reinforcement learning to improve the performance of the learning agent. However, human insights are often mere opinions and educated guesses rather than well-formulated arguments. While opinions are subject to uncertainty, e.g., due to partial informedness or ignorance about a problem, they also emerge earlier than hard evidence can be produced. Thus, guiding reinforcement learning agents by way of opinions offers the potential for more performant learning processes, but comes with the challenge of modeling and managing opinions in a formal way. In this article, we present a method to guide reinforcement learning agents through opinions. To this end, we provide an end-to-end method to model and manage advisors' opinions. To assess the utility of the approach, we evaluate it with synthetic (oracle) and human advisors, at different levels of uncertainty, and under multiple advice strategies. Our results indicate that opinions, even if uncertain, improve the performance of reinforcement learning agents, resulting in higher rewards, more efficient exploration, and a better reinforced policy. Although we demonstrate our approach through a two-dimensional topological running example, our approach is applicable to complex problems with higher dimensions as well.
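The pipeline summarized above (opinion modeling, fusion with the policy, projection back to probabilities) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that advice has already been translated into a multinomial opinion over the actions of one state, that the agent's action probabilities are treated as a dogmatic opinion (zero uncertainty), and that fusion follows the standard subjective-logic belief constraint fusion formulas restricted to singleton beliefs. The names `Opinion`, `policy_to_opinion`, and `belief_constraint_fusion` are hypothetical.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Opinion:
    """Multinomial opinion: belief.sum() + uncertainty == 1."""
    belief: np.ndarray      # belief mass per action
    uncertainty: float      # epistemic uncertainty mass
    base_rate: np.ndarray   # prior probability per action

    def projected(self) -> np.ndarray:
        # Projected probability: P(x) = b(x) + a(x) * u
        return self.belief + self.base_rate * self.uncertainty


def policy_to_opinion(probs) -> Opinion:
    # Assumption: the policy row becomes a dogmatic opinion (u = 0).
    probs = np.asarray(probs, dtype=float)
    k = len(probs)
    return Opinion(probs.copy(), 0.0, np.full(k, 1.0 / k))


def belief_constraint_fusion(a: Opinion, b: Opinion) -> Opinion:
    # Singleton beliefs only: two belief masses agree when they back
    # the same action and conflict when they back different actions.
    agree = a.belief * b.belief
    conflict = a.belief.sum() * b.belief.sum() - agree.sum()
    if np.isclose(conflict, 1.0):
        raise ValueError("opinions are totally conflicting")
    harmony = a.belief * b.uncertainty + b.belief * a.uncertainty + agree
    belief = harmony / (1.0 - conflict)
    uncertainty = a.uncertainty * b.uncertainty / (1.0 - conflict)
    weight = 2.0 - a.uncertainty - b.uncertainty
    if np.isclose(weight, 0.0):
        base_rate = (a.base_rate + b.base_rate) / 2.0
    else:
        base_rate = (a.base_rate * (1.0 - a.uncertainty)
                     + b.base_rate * (1.0 - b.uncertainty)) / weight
    return Opinion(belief, uncertainty, base_rate)


# Uniform policy over four actions; the advisor favors action 2
# with belief 0.6 and leaves 0.4 as uncertainty (illustrative values).
policy = policy_to_opinion([0.25, 0.25, 0.25, 0.25])
advice = Opinion(np.array([0.0, 0.0, 0.6, 0.0]), 0.4, np.full(4, 0.25))
shaped = belief_constraint_fusion(policy, advice).projected()

# A fully uncertain advisor should leave the policy untouched.
vacuous = Opinion(np.zeros(4), 1.0, np.full(4, 0.25))
unchanged = belief_constraint_fusion(policy, vacuous).projected()
```

Under these assumptions, the shaped distribution still sums to one and shifts probability toward the advised action, while advice with full uncertainty has no effect on the policy.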

Paper Structure

This paper contains 124 sections, 18 equations, 23 figures, 10 tables, 2 algorithms.

Figures (23)

  • Figure 1: The Frozen Lake running example
  • Figure 2: Reinforcement learning: conceptual overview (Sutton and Barto, 2018)
  • Figure 3: Overview of the approach
  • Figure 4: A visual intuition of the advisor's limited knowledge, subject to epistemic uncertainty (left), and the corresponding uncertainty levels of the cells in the grid world. Uncertainty grows with distance (in this specific example, with topological distance). Using the distance between the Advisor and the location the advice pertains to, the uncertainty of advice can be calibrated.
  • Figure 5: The result of policy shaping in the running example with the major changes highlighted. Red: decreased probability; green: increased probability.
  • ...and 18 more figures
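
The distance-based calibration described in the Figure 4 caption can be sketched as follows. This is only an illustration, not the paper's calibration function: it assumes Manhattan distance as the topological distance on a square grid and a linear mapping from distance to uncertainty, with zero uncertainty at the advisor's own cell and full uncertainty at the largest possible distance. The function names are hypothetical.

```python
def manhattan(a, b):
    """Manhattan distance between two (row, col) cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def uncertainty_from_distance(advisor, cell, grid_size):
    # Assumption: uncertainty grows linearly with distance and is
    # normalized by the largest distance possible on the grid.
    max_dist = 2 * (grid_size - 1)
    if max_dist == 0:
        return 0.0
    return manhattan(advisor, cell) / max_dist


def uncertainty_grid(advisor, grid_size):
    """Uncertainty level of every cell as seen from the advisor."""
    return [
        [uncertainty_from_distance(advisor, (r, c), grid_size)
         for c in range(grid_size)]
        for r in range(grid_size)
    ]


# Advisor standing in the top-left corner of a 4x4 grid.
levels = uncertainty_grid((0, 0), 4)
```

Each value in `levels` could then serve as the uncertainty mass of an opinion about the corresponding cell, so that advice about distant cells weighs less than advice about nearby ones.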

Theorems & Definitions (2)

  • Definition 1: Neighboring states
  • Definition 2: Neighborhood (of a state)