ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers

Chen Zheng; Ke Sun; Da Tang; Yukun Ma; Yuyu Zhang; Chenguang Xi; Xun Zhou

TL;DR

ICE-GRT addresses a gap in which large language models lack domain-specific depth and smaller models lack analysis capability. It combines ICE-Instruct, a strong SFT backbone, with ICE-Reward and a PPO-based RLHF loop (Actor, Reference, Reward, Critic) to achieve robust in-domain reasoning while preserving general capabilities. The paper reports state-of-the-art performance for a 13B model on 12 public benchmarks and analyzes the training factors that drive success (data quality, reward scaling, KL-control, and advantage normalization), including a domain case study in ad moderation. Its open-source release on HuggingFace aims to democratize access and spur further advances in domain-aware LLMs.

Abstract

Large Language Models (LLMs) such as ChatGPT and LLaMA encounter limitations in domain-specific tasks: they often lack depth and accuracy in specialized areas, and their general capabilities decline when fine-tuned, particularly the analysis ability of small-sized models. To address these gaps, we introduce ICE-GRT, which uses Reinforcement Learning from Human Feedback (RLHF) grounded in Proximal Policy Optimization (PPO) and demonstrates remarkable ability in in-domain scenarios without compromising general task performance. Our exploration of ICE-GRT highlights its understanding and reasoning ability: it not only generates robust answers but also provides detailed analyses of the reasons behind each answer. This capability marks a significant progression beyond the scope of Supervised Fine-Tuning models. The success of ICE-GRT depends on several crucial factors, including Appropriate Data, Reward Size Scaling, KL-Control, and Advantage Normalization. The ICE-GRT model exhibits state-of-the-art performance in domain-specific tasks and across 12 general language tasks against LLMs of equivalent and even larger size, highlighting the effectiveness of our approach. We provide a comprehensive analysis of ICE-GRT, underscoring the significant advancements it brings to the field of LLMs.
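
To make two of the factors named above more concrete, the following is a minimal sketch of how KL-Control and Advantage Normalization commonly appear in PPO-based RLHF. It is not taken from the ICE-GRT implementation: the function names, the coefficient values (`kl_coef`, `reward_scale`, `gamma`, `lam`), and the use of generalized advantage estimation are all assumptions chosen for illustration, following the standard formulation in which the Reference model penalizes drift of the Actor and the Critic supplies value estimates.

```python
import math
from typing import List


def kl_shaped_rewards(actor_logprobs: List[float],
                      ref_logprobs: List[float],
                      reward_score: float,
                      kl_coef: float = 0.1,       # assumed value, not from the paper
                      reward_scale: float = 1.0   # assumed value, not from the paper
                      ) -> List[float]:
    """Per-token rewards for one response.

    Each token is penalized by kl_coef * (log pi_actor - log pi_ref),
    and the scaled reward-model score is added on the final token.
    """
    rewards = [-kl_coef * (a - r) for a, r in zip(actor_logprobs, ref_logprobs)]
    rewards[-1] += reward_scale * reward_score
    return rewards


def gae_advantages(rewards: List[float],
                   values: List[float],
                   gamma: float = 1.0,
                   lam: float = 0.95) -> List[float]:
    """Generalized advantage estimation over one response (hypothetical helper)."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # The value after the last token is taken to be zero.
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def normalize_advantages(advantages: List[float], eps: float = 1e-8) -> List[float]:
    """Shift to zero mean and scale to unit variance."""
    mean = sum(advantages) / len(advantages)
    var = sum((a - mean) ** 2 for a in advantages) / len(advantages)
    return [(a - mean) / (math.sqrt(var) + eps) for a in advantages]


# Toy three-token response with made-up log-probabilities and Critic values.
rewards = kl_shaped_rewards([-1.0, -0.5, -2.0], [-1.2, -0.5, -1.5], reward_score=2.0)
advantages = gae_advantages(rewards, values=[0.5, 0.5, 0.5], gamma=1.0, lam=1.0)
normalized = normalize_advantages(advantages)
```

In this sketch the KL term keeps the Actor close to the Reference model on every token, while normalization keeps the advantage scale stable across batches regardless of the raw reward magnitude. In practice the normalization is usually computed over a whole batch rather than a single response.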
