Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges

Giorgio Franceschelli; Mirco Musolesi

TL;DR

The paper examines how Reinforcement Learning (RL) can enhance generative AI by allowing optimization of non-differentiable rewards, alignment with human values, and flexible objective formulations. It introduces a three-way taxonomy (RL as an alternative means of generation, objective-driven generation, and shaping not easily quantifiable traits via reward modeling) and surveys state-of-the-art methods, including SeqGAN, RLHF, reward modeling, MIXER-like strategies, diffusion-policy optimization, and molecular design via RL. Key contributions include a structured synthesis of existing work, a discussion of domain-specific rewards, and a candid assessment of challenges such as sparse rewards, reward hacking, and the cost of human feedback, with proposed directions such as inverse RL (IRL), offline RL, and multi-agent RL. The survey aims to guide researchers and practitioners toward practical integration of RL in generative systems, highlighting the potential impact on text, code, music, image, and chemistry domains while noting significant open research questions and methodological gaps.

Abstract

Generative Artificial Intelligence (AI) is one of the most exciting developments in Computer Science of the last decade. At the same time, Reinforcement Learning (RL) has emerged as a very successful paradigm for a variety of machine learning tasks. In this survey, we discuss the state of the art, opportunities and open research questions in applying RL to generative AI. In particular, we will discuss three types of applications, namely, RL as an alternative way for generation without specified objectives; as a way for generating outputs while concurrently maximizing an objective function; and, finally, as a way of embedding desired characteristics, which cannot be easily captured by means of an objective function, into the generative process. We conclude the survey with an in-depth discussion of the opportunities and challenges in this fascinating emerging area.

Paper Structure (15 sections, 2 figures, 2 tables)

This paper contains 15 sections, 2 figures, 2 tables.

Introduction
Preliminaries
  Generative Deep Learning
  Deep Reinforcement Learning
RL for Generative AI
  RL for Mere Generation
    Overview
    Discussion
  RL for Objective Maximization
    Overview
    Discussion
  RL for Improving Not Easily Quantifiable Characteristics
    Overview
    Discussion
Conclusion

Figures (2)

Figure 1: The canonical reinforcement learning framework: at each timestep $t$, the Agent performs an action $a_t$ based on the current state $s_t$, which is a representation of the Environment. Upon the execution of the action, the Agent finds itself in a new state $s_{t+1}$, and receives a reward $r_{t+1}$.
Figure 2: The reinforcement learning framework for generative modeling: at each timestep $t$, the Generative Model (i.e., the Agent) generates an action $a_t$ based on the current description of the generated output (i.e., current state) $s_t$, which updates the current description of the generated output to $s_{t+1}$, and receives a reward $r_{t+1}$ related to it.
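
The loop described in Figure 2 can be illustrated with a minimal sketch. It is not taken from the paper: the vocabulary, the uniform random `policy`, the length limit, and the "count the `a` tokens" reward are all hypothetical placeholders, chosen only to show how the state $s_t$ (the partial output), the action $a_t$ (the next token), and the reward $r_{t+1}$ relate to one another. The reward is given only once the output is complete, which is one common (and sparse) choice rather than the only one.

```python
import random

# Hypothetical toy setup (not from the paper): a three-token vocabulary
# and a maximum output length.
VOCAB = ["a", "b", "<eos>"]
MAX_LEN = 5


def policy(state, rng):
    """Stand-in for the Generative Model (the Agent): picks the next token
    uniformly at random. A real model would condition on `state`."""
    return rng.choice(VOCAB)


def reward(state):
    """Hypothetical reward: the number of 'a' tokens, granted only when the
    output is complete (end token emitted or length limit reached).
    Returns (reward, done)."""
    done = state[-1] == "<eos>" or len(state) >= MAX_LEN
    return (float(state.count("a")) if done else 0.0), done


def generate(seed=0):
    """Roll out one episode and record (s_t, a_t, r_{t+1}, s_{t+1}) tuples."""
    rng = random.Random(seed)
    state = ()  # s_0: nothing generated yet
    trajectory = []
    done = False
    while not done:
        action = policy(state, rng)     # a_t
        next_state = state + (action,)  # s_{t+1}: output so far plus new token
        r, done = reward(next_state)    # r_{t+1}
        trajectory.append((state, action, r, next_state))
        state = next_state
    return state, trajectory


if __name__ == "__main__":
    output, steps = generate(seed=0)
    print("generated:", output)
    for s, a, r, s_next in steps:
        print(f"state={s} action={a!r} reward={r}")
```

The recorded trajectory is the kind of data an RL algorithm would consume to update the generative model; the update step itself is deliberately left out of this sketch.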
