Advances in Preference-based Reinforcement Learning: A Review
Youssef Abdelkareem, Shady Shehata, Fakhri Karray
TL;DR
This survey addresses the problem of reward engineering in reinforcement learning by using preference-based feedback from humans. It presents a unified PbRL framework that covers both directly learning a policy and learning a surrogate utility function, with linear and non-linear (notably deep) representations, and discusses theoretical guarantees including regret and finite-sample bounds. The work surveys benchmarking efforts and highlights practical applications in NLP, especially text summarization, where PbRL demonstrates improved alignment with human judgments. It also candidly discusses limitations, most notably feedback and sample inefficiency, and outlines future directions in safety, theory, and broader real-world applications. Overall, PbRL emerges as a promising direction for scalable, user-involved RL where explicit reward design is difficult, provided that efficiency and robustness continue to improve.
Abstract
Reinforcement Learning (RL) algorithms depend on accurately engineered reward functions to guide learning agents toward the required tasks. Preference-based reinforcement learning (PbRL) addresses this dependency by using human preferences as expert feedback instead of numeric rewards. Because of this advantage over traditional RL, PbRL has attracted growing attention in recent years, with many significant advances. In this survey, we present a unified PbRL framework that includes the newly emerging approaches that improve the scalability and efficiency of PbRL. In addition, we give a detailed overview of the theoretical guarantees and benchmarking work in the field and present its recent applications in complex real-world tasks. Lastly, we discuss the limitations of current approaches and proposed directions for future research.
