RLHF: A comprehensive Survey for Cultural, Multimodal and Low Latency Alignment Methods

Raghav Sharma, Manan Mehta, Sai Tiger Raina

TL;DR

This survey analyzes Reinforcement Learning from Human Feedback (RLHF) as the primary mechanism for aligning large language models, then expands the scope to multi-modal alignment, cultural fairness, and low-latency optimization. It reviews foundational algorithms (PPO, DPO, GRPO) and then covers a suite of recent methods, including Align-Pro, DiffPO, RRPO, CultureSPA, Debate-Norm, RLHF-CML, ALOE, STE, GR-DPO, and Panacea/Hierarchical-Experts, each addressing a concrete gap through theoretical guarantees, novel workflows, or empirical results. The survey reports gains in efficiency, robustness, and cultural inclusivity, and identifies persistent challenges in grounding, fairness, latency, and evaluator stability. By advocating unified benchmarks and transparent reward pipelines, it outlines practical roadmaps for deploying safer, fairer, and more scalable RLHF-enabled systems across modalities and languages.

Abstract

Reinforcement Learning from Human Feedback (RLHF) is the standard for aligning Large Language Models (LLMs), yet recent progress has moved beyond canonical text-based methods. This survey synthesizes the new frontier of alignment research by addressing critical gaps in multi-modal alignment, cultural fairness, and low-latency optimization. To systematically explore these domains, we first review foundational algorithms, including PPO, DPO, and GRPO, before presenting a detailed analysis of the latest innovations. By providing a comparative synthesis of these techniques and outlining open challenges, this work serves as an essential roadmap for researchers building more robust, efficient, and equitable AI systems.

Paper Structure

This paper contains 65 sections, 12 equations, 1 figure, and 9 tables.

Figures (1)

  • Figure 1: Radar plot comparing DPO (baseline), DiffPO, RRPO, and CultureSPA across five axes.