Conceptual Belief-Informed Reinforcement Learning
Xingrui Gu, Chuyi Jiang, Laixi Shi
TL;DR
The paper tackles the sample inefficiency of RL by drawing on cognitive science. It proposes Conceptual Belief-Informed RL (HI-RL), which combines conceptual abstraction with adaptive concept-based priors. HI-RL is algorithm-agnostic and introduces two modules, Concept Formation and Conceptual Adaptive Belief, which recast past experience as priors (b_t) and fuse them with current signals through a Smoothed Belief backup (B_t). The paper gives concrete instantiations (HI-Q, HI-PPO, HI-SAC) and reports consistent gains in sample efficiency and performance across discrete and continuous domains, supported by a formalized problem setup, algorithmic details, and extensive experiments. By making learning belief-guided and concept-aware, HI-RL moves toward more human-like, efficient reinforcement learning and suggests a practical path to what the authors call an Era of Experience.
Abstract
Reinforcement learning (RL) has achieved significant success but is hindered by inefficiency and instability: it relies on large amounts of trial-and-error data and fails to use past experience efficiently to guide decisions. Humans, by contrast, learn remarkably efficiently from experience, which cognitive science attributes to abstracting concepts and updating the associated probabilistic beliefs by integrating both uncertainty and prior knowledge. Inspired by this, we introduce Conceptual Belief-Informed Reinforcement Learning to emulate human intelligence (HI-RL), an efficient experience utilization paradigm that can be directly integrated into existing RL frameworks. HI-RL forms concepts by extracting high-level categories of critical environmental information and then constructs adaptive concept-associated probabilistic beliefs that serve as experience priors to guide value or policy updates. We evaluate HI-RL by integrating it into several existing value- and policy-based algorithms (DQN, PPO, SAC, and TD3) and demonstrate consistent improvements in sample efficiency and performance across both discrete and continuous control benchmarks.
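To make the two-step idea in the abstract concrete (form concepts, then use concept-level beliefs as priors during updates), here is a minimal tabular sketch. It is not the authors' implementation, and the section above does not give their exact formulas. Everything specific is an assumption made for illustration: the `concept_of` bucketing, the count-based `ConceptBelief`, the mixing weight `lam`, and the particular blend used in `smoothed_backup`.

```python
import numpy as np


def concept_of(state: int, bucket: int = 4) -> int:
    """Hypothetical concept map: neighbouring state ids share one concept."""
    return state // bucket


class ConceptBelief:
    """Per-concept action belief kept as smoothed counts (an assumed form of b_t)."""

    def __init__(self, n_concepts: int, n_actions: int, alpha: float = 1.0):
        # alpha is a pseudo-count so that unseen concepts start with a uniform prior.
        self.counts = np.full((n_concepts, n_actions), alpha, dtype=float)

    def update(self, concept: int, action: int) -> None:
        self.counts[concept, action] += 1.0

    def prior(self, concept: int) -> np.ndarray:
        row = self.counts[concept]
        return row / row.sum()


def smoothed_backup(q_next: np.ndarray, prior: np.ndarray, lam: float) -> float:
    """Assumed stand-in for B_t: blend the greedy choice with the concept prior,
    then take the expectation of Q(s', .) under that blend."""
    greedy = np.zeros(len(q_next), dtype=float)
    greedy[np.argmax(q_next)] = 1.0
    blended = (1.0 - lam) * greedy + lam * prior
    return float(blended @ q_next)


def q_update(Q, belief, s, a, r, s_next, done, lr=0.1, gamma=0.99, lam=0.3):
    """One Q-learning step whose bootstrap term uses the belief-smoothed backup."""
    belief.update(concept_of(s), a)
    if done:
        bootstrap = 0.0
    else:
        bootstrap = smoothed_backup(Q[s_next], belief.prior(concept_of(s_next)), lam)
    Q[s, a] += lr * (r + gamma * bootstrap - Q[s, a])
    return Q
```

With `lam = 0` the update reduces to ordinary Q-learning, and with `lam = 1` the bootstrap is the expected value under the concept prior alone, so in this sketch `lam` controls how strongly past experience shapes the target.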
