Reinforcement Learning for Generative AI: A Survey

Yuanjiang Cao; Quan Z. Sheng; Julian McAuley; Lina Yao

TL;DR

This survey analyzes how reinforcement learning can augment generative AI by addressing the limitations of maximum likelihood training. It develops a unified taxonomy of RL methods (model-free, model-based, sampling, NAS) and catalogs a wide range of applications from NLP and code to vision, speech, and AI for science, with emphasis on RLHF and reward-driven signals. Key contributions include organizing theoretical foundations, practical reward designs (discriminator-based, rule-based, distributional, data-driven), and a comprehensive survey of large-language-model alignment and multi-turn RL. The work highlights challenges such as sparse rewards and long-horizon credit assignment, and discusses future directions for scalable, human-aligned, and architecture-aware RL in generative AI.

Abstract

Deep generative AI is a long-standing, central topic in the machine learning community, with impact on application areas such as text generation and computer vision. The dominant paradigm for training a generative model is maximum likelihood estimation, which pushes the learner to approximate the target data distribution by decreasing the divergence between the model distribution and the target distribution. This formulation successfully establishes the objective of generative tasks, but it cannot satisfy every requirement that a user might expect from a generative model. Reinforcement learning offers a competitive way to inject new training signals by creating objectives that exploit them, and it has demonstrated the power and flexibility to incorporate human inductive bias from multiple angles, such as adversarial learning, hand-designed rules, and learned reward models, to build a performant model. As a result, reinforcement learning has become a trending research field and has stretched the limits of generative AI in both model design and application. It is therefore reasonable to summarize the advances of recent years in a comprehensive review. Although surveys of individual application areas have appeared recently, this survey aims to provide a high-level review that spans a range of application areas. We provide a rigorous taxonomy of the area and cover a broad range of models and applications. Notably, we also survey the fast-developing area of large language models. We conclude by outlining potential directions that might address the limits of current models and expand the frontiers of generative AI.
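The contrast the abstract draws between maximum likelihood training and reward-driven training can be illustrated with a toy example. The following is a minimal sketch that is not taken from the survey: it assumes a three-token categorical "generator" parameterized by logits, a hypothetical rule-based reward (`rule_reward`), and the REINFORCE score-function estimator as one representative policy-gradient method. All function names and hyperparameters are illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Convert logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_prob(probs, index):
    """Gradient of log softmax(logits)[index] w.r.t. the logits."""
    return [(1.0 if i == index else 0.0) - p for i, p in enumerate(probs)]


def mle_step(logits, data_token, lr=0.1):
    """One maximum likelihood step: raise the log-probability of an observed token."""
    grad = grad_log_prob(softmax(logits), data_token)
    return [w + lr * g for w, g in zip(logits, grad)]


def rule_reward(token):
    """Hypothetical hand-designed rule; not differentiable w.r.t. the logits."""
    return 1.0 if token == 2 else 0.0


def reinforce_step(logits, reward_fn, rng, lr=0.1):
    """One REINFORCE step: sample a token, then weight its score by the reward."""
    probs = softmax(logits)
    token = rng.choices(range(len(logits)), weights=probs)[0]
    grad = grad_log_prob(probs, token)
    reward = reward_fn(token)
    return [w + lr * reward * g for w, g in zip(logits, grad)]


def train(mle_steps=200, rl_steps=2000, seed=0):
    """Fit the data with MLE first, then fine-tune with the rule-based reward."""
    rng = random.Random(seed)
    logits = [0.0, 0.0, 0.0]
    for _ in range(mle_steps):
        logits = mle_step(logits, data_token=0)  # the "dataset" only contains token 0
    mle_probs = softmax(logits)
    for _ in range(rl_steps):
        logits = reinforce_step(logits, rule_reward, rng)
    return mle_probs, softmax(logits)


if __name__ == "__main__":
    mle_probs, rl_probs = train()
    print("after MLE:", [round(p, 3) for p in mle_probs])
    print("after RL: ", [round(p, 3) for p in rl_probs])
```

In this sketch the maximum likelihood phase concentrates probability on the token seen in the data, while the reward phase shifts probability toward the token the rule prefers, even though the rule itself provides no gradient.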

Paper Structure (52 sections, 21 equations, 4 figures, 9 tables)

Introduction
Preliminary and Background
Generative Models
Reinforcement Learning Methods
Markov Decision Process
Model-free Methods
Model-based Methods
Comparison Between Reinforcement Methods
Framing Generation tasks as Reinforcement Learning Problem
Benefits of RL-based Generative Models
Solving the non-differentiable learning problems
The generated variable is non-differentiable
The training objective is non-differentiable
Introducing new training signal
Reward by Discriminator
...and 37 more sections

Figures (4)

Figure 1: The Overview Structure of This Survey
Figure 2: (a) Framing generation tasks in the reinforcement learning framework; (b) The reward computation is non-differentiable.
Figure 3: RL can introduce new signals by flexible reward functions
Figure 4: RL can work as a sampler for models that are hard to sample from, such as energy-based models. Because sampling from the marginal distribution is costly, an RL-based agent provides an alternative way to generate sample sequences. The dashed line indicates that the potentially high cost blocks generation.
