ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers

Chen Zheng, Ke Sun, Da Tang, Yukun Ma, Yuyu Zhang, Chenguang Xi, Xun Zhou

TL;DR

ICE-GRT addresses two gaps: LLMs often lack domain-specific depth, and small models lose analysis capability when fine-tuned. It combines ICE-Instruct as a strong SFT backbone with ICE-Reward and a PPO-based RLHF loop (Actor, Reference, Reward, Critic) to achieve robust in-domain reasoning while preserving general capabilities. The paper demonstrates state-of-the-art performance for a 13B model on 12 public benchmarks, analyzes the training factors that drive success (data quality, reward scaling, KL-control, and advantage normalization), and includes a domain case study in ad moderation. Its open-source release on HuggingFace aims to democratize access and spur further advances in domain-aware LLMs.

Abstract

Large Language Models (LLMs) such as ChatGPT and LLaMA encounter limitations in domain-specific tasks: they often lack depth and accuracy in specialized areas, and their general capabilities decrease when fine-tuned, particularly the analysis ability of small-sized models. To address these gaps, we introduce ICE-GRT, which uses Reinforcement Learning from Human Feedback (RLHF) grounded in Proximal Policy Optimization (PPO) and demonstrates remarkable ability in in-domain scenarios without compromising general task performance. Our exploration of ICE-GRT highlights its understanding and reasoning ability: it not only generates robust answers but also provides detailed analyses of the reasons behind them. This capability marks a significant progression beyond the scope of Supervised Fine-Tuning models. The success of ICE-GRT depends on several crucial factors, including Appropriate Data, Reward Size Scaling, KL-Control, and Advantage Normalization. ICE-GRT exhibits state-of-the-art performance in domain-specific tasks and across 12 general language tasks against LLMs of equivalent and even larger size, highlighting the effectiveness of our approach. We provide a comprehensive analysis of ICE-GRT, underscoring the significant advancements it brings to the field of LLMs.
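
Two of the factors named in the abstract, KL-Control and Advantage Normalization, can be illustrated with a minimal sketch. It assumes the formulation commonly used in PPO-based RLHF (a per-token KL penalty against the Reference model, a scalar Reward-model score added at the final token, GAE advantages computed from Critic values, then standardization). The abstract does not spell out ICE-GRT's exact equations, so the function names, `kl_coef`, `gamma`, and `lam` below are hypothetical and chosen only for illustration.

```python
from statistics import fmean, pstdev


def kl_shaped_rewards(reward_score, actor_logprobs, ref_logprobs, kl_coef=0.1):
    """KL-control (assumed form): penalize each token by
    kl_coef * (log pi_actor - log pi_ref), then add the scalar
    reward-model score to the last token of the response."""
    rewards = [-kl_coef * (a - r) for a, r in zip(actor_logprobs, ref_logprobs)]
    rewards[-1] += reward_score
    return rewards


def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Generalized Advantage Estimation over one response, using
    critic value estimates; the value after the last token is 0."""
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def normalize_advantages(advantages, eps=1e-8):
    """Advantage normalization: shift to zero mean and scale to unit
    standard deviation; eps guards against division by zero."""
    mean = fmean(advantages)
    std = pstdev(advantages)
    return [(a - mean) / (std + eps) for a in advantages]


# Toy two-token response with made-up log-probabilities and values.
rewards = kl_shaped_rewards(1.0, [-1.0, -2.0], [-1.0, -1.5])
advantages = normalize_advantages(gae_advantages(rewards, [0.5, 0.5]))
```

In this sketch the KL term keeps the Actor close to the Reference model, and normalization keeps the scale of the policy-gradient signal comparable across batches; whether ICE-GRT normalizes per batch or per sequence is not stated here.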

Paper Structure

This paper contains 30 sections, 5 equations, 4 figures, and 13 tables.

Figures (4)

  • Figure 1: ICE-GRT Model Architecture.
  • Figure 2: The influence of different training data.
  • Figure 3: Score Comparisons between different LLMs.
  • Figure 4: Comparative Analysis of ICE-GRT and ICE-GRT Advantage Normalization on the Natural Question (NQ) Benchmark. The x-axis represents different epochs, while the y-axis shows the NQ scores.