Trinity-RFT: A General-Purpose and Unified Framework for Reinforcement Fine-Tuning of Large Language Models
Xuchen Pan, Yanxi Chen, Yushuo Chen, Yuchang Sun, Daoyuan Chen, Wenhao Zhang, Yuexiang Xie, Yilun Huang, Yilei Zhang, Dawei Gao, Weijie Shi, Yaliang Li, Bolin Ding, Jingren Zhou
TL;DR
Trinity-RFT is a general-purpose, unified framework for reinforcement fine-tuning (RFT) of large language models. Its decoupled RFT-core supports diverse RL modes (synchronous/asynchronous, on-policy/off-policy, online/offline), and it integrates robust agent-environment interaction. Its data pipelines, experience buffers, and human-in-the-loop capabilities enable flexible curriculum learning, online reward shaping, and scalable experiments. The work demonstrates practical implementations, including integration of a MIX-style algorithm, and reports performance profiling and GRPO training runs on tasks such as GSM8k and ALFWorld. Together, these contributions offer a scalable, low-friction platform for exploring and deploying advanced reinforcement learning paradigms for LLMs, bridging research and real-world applications.
Abstract
Trinity-RFT is a general-purpose, unified and easy-to-use framework designed for reinforcement fine-tuning (RFT) of large language models. It is built with a modular and decoupled design, consisting of (1) an RFT-core that unifies and generalizes synchronous/asynchronous, on-policy/off-policy, and online/offline modes of RFT; (2) seamless integration for agent-environment interaction with high efficiency and robustness; and (3) systematic data pipelines optimized for RFT. Trinity-RFT can be easily adapted for diverse application scenarios, and serves as a unified platform for development and research of advanced reinforcement learning paradigms at both macroscopic and microscopic levels. This technical report outlines the vision, features, design and implementations of Trinity-RFT, accompanied by extensive examples, applications and experiments that demonstrate its functionalities and user-friendliness.
