Understanding and Alleviating Memory Consumption in RLHF for LLMs

Jin Zhou, Hanmei Yang, Steven Tang, Mingcan Xiang, Hui Guan, Tongping Liu

TL;DR

This study is the first to examine memory usage in the RLHF context, exploring various memory management strategies and unveiling the reasons behind excessive memory consumption.

Abstract

Fine-tuning with Reinforcement Learning with Human Feedback (RLHF) is essential for aligning large language models (LLMs). However, RLHF often encounters significant memory challenges. This study is the first to examine memory usage in the RLHF context, exploring various memory management strategies and unveiling the reasons behind excessive memory consumption. Additionally, we introduce a simple yet effective approach that substantially reduces the memory required for RLHF fine-tuning.

Paper Structure

This paper contains 13 sections, 1 figure, 2 tables.

Figures (1)

  • Figure 1: Memory usage (GB) of DeepSpeed-Chat running OPT with multiple memory management strategies enabled. The red cross marks the peak of reserved memory, while the yellow cross and dotted yellow line mark the theoretical peak of reserved memory after subtracting the size of memory fragmentation.
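The arithmetic behind the two markers in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' measurement code. It assumes that fragmentation at a given moment is reserved memory minus allocated memory, and that the "theoretical peak" is the peak reserved memory with that fragmentation subtracted. `MemorySample` and `peak_summary` are hypothetical names, and the sample values are invented for illustration.

```python
from dataclasses import dataclass

GIB = 1024 ** 3  # bytes per GiB


@dataclass
class MemorySample:
    """One reading of allocator state (hypothetical container, sizes in bytes)."""
    step: int
    allocated: int  # bytes actually held by live tensors
    reserved: int   # bytes held by the caching allocator

    @property
    def fragmentation(self) -> int:
        # Assumption: reserved-but-unallocated memory is treated as fragmentation.
        return self.reserved - self.allocated


def peak_summary(samples):
    """Locate the peak of reserved memory and subtract fragmentation from it."""
    peak = max(samples, key=lambda s: s.reserved)
    return {
        "step": peak.step,
        "peak_reserved_gb": peak.reserved / GIB,
        "fragmentation_gb": peak.fragmentation / GIB,
        "peak_without_fragmentation_gb": (peak.reserved - peak.fragmentation) / GIB,
    }


# Invented readings, for illustration only.
samples = [
    MemorySample(step=0, allocated=4 * GIB, reserved=6 * GIB),
    MemorySample(step=1, allocated=10 * GIB, reserved=16 * GIB),
    MemorySample(step=2, allocated=8 * GIB, reserved=9 * GIB),
]
summary = peak_summary(samples)
print(summary)
```

In a PyTorch training loop, readings of this kind could be taken with `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()` at each phase of interest. Whether that matches the instrumentation behind Figure 1 is not stated in this excerpt.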