InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models

Wenjun Wang, Shuo Cai, Congkai Xie, Mingfa Feng, Yiming Zhang, Zhen Li, Kejing Yang, Ming Li, Jiannong Cao, Hongxia Yang

TL;DR

This work targets the high computational cost of training large language models by proposing an end-to-end FP8 training recipe that combines continual pre-training and supervised fine-tuning (SFT). The core method is a hybrid-granularity quantization strategy that applies per-block quantization to weights and per-token quantization to activations, while keeping critical components in FP32 to preserve precision. Empirically, FP8 training remains stable and nearly lossless relative to BF16 across 160B-token continual pre-training and subsequent SFT: InfiR2-1.5B-FP8 and InfiR2-7B-FP8 achieve competitive or superior performance on reasoning benchmarks (e.g., AIME24, GPQA) while delivering substantial efficiency gains (up to 22% less training time, 14% lower peak memory, and 19% higher throughput). The results establish FP8 as a practical alternative to BF16 for scalable LLM training, and the authors publish their code and intermediate artifacts to democratize access to FP8 training.
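To make the hybrid-granularity idea concrete, here is a minimal NumPy simulation of per-token activation scaling and per-block weight scaling. It is a sketch under stated assumptions, not the authors' implementation: it assumes the OCP FP8 E4M3 format (3 mantissa bits, largest finite value 448), a hypothetical 128 x 128 weight block size, and one scale per token row; all function names are illustrative. FP8 values are emulated by rounding float64 numbers, so the example shows the numerics, not the speedup.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (assumed format)


def round_to_e4m3(x):
    """Emulate FP8 E4M3 rounding: 3 mantissa bits, round-half-to-even, saturate at +/-448."""
    x = np.clip(x, -E4M3_MAX, E4M3_MAX)
    mag = np.abs(x)
    # Binade exponent; magnitudes below the min normal 2**-6 share the subnormal spacing 2**-9.
    exp = np.floor(np.log2(np.maximum(mag, 2.0 ** -6)))
    step = 2.0 ** (exp - 3)
    return np.sign(x) * np.round(mag / step) * step


def quantize_per_token(act):
    """Activations (tokens, hidden): one scale per token row."""
    amax = np.abs(act).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / E4M3_MAX, 1.0)
    return round_to_e4m3(act / scale), scale


def quantize_per_block(w, block=128):
    """Weights (out, in): one scale per block x block tile (hypothetical block size)."""
    rows, cols = w.shape
    assert rows % block == 0 and cols % block == 0, "shape must be a multiple of block"
    tiles = w.reshape(rows // block, block, cols // block, block)
    amax = np.abs(tiles).max(axis=(1, 3), keepdims=True)
    scale = np.where(amax > 0, amax / E4M3_MAX, 1.0)
    q = round_to_e4m3(tiles / scale).reshape(rows, cols)
    return q, scale.reshape(rows // block, cols // block)


def dequantize_per_block(q, scale, block=128):
    """Expand each tile's scale back to full size and rescale the FP8 values."""
    return q * np.kron(scale, np.ones((block, block)))


rng = np.random.default_rng(0)
# Hypothetical inputs: tokens with very different magnitudes, small random weights.
act = rng.standard_normal((4, 256)) * np.array([[1.0], [10.0], [0.01], [100.0]])
w = rng.standard_normal((256, 256)) * 0.02

qa, sa = quantize_per_token(act)
qw, sw = quantize_per_block(w)

# Emulated FP8 GEMM: multiply the dequantized values in float64 (standing in for
# higher-precision accumulation) and compare with the full-precision product.
out_fp8 = (qa * sa) @ dequantize_per_block(qw, sw)
out_ref = act @ w
rel_err = np.abs(out_fp8 - out_ref).max() / np.abs(out_ref).max()
print(f"max relative error vs float64: {rel_err:.4f}")
```

Giving each token its own scale lets a large-magnitude token use the full FP8 range without pushing small tokens toward zero, while per-block weight scales confine the effect of a single large weight to its own tile rather than the whole matrix.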

Abstract

The immense computational cost of training Large Language Models (LLMs) presents a major barrier to innovation. While FP8 training offers a promising solution with significant theoretical efficiency gains, its widespread adoption has been hindered by the lack of a comprehensive, open-source training recipe. To bridge this gap, we introduce an end-to-end FP8 training recipe that seamlessly integrates continual pre-training and supervised fine-tuning. Our methodology employs a fine-grained, hybrid-granularity quantization strategy to maintain numerical fidelity while maximizing computational efficiency. Through extensive experiments, including continual pre-training on a 160B-token corpus, we demonstrate that our recipe is not only remarkably stable but also essentially lossless, matching the BF16 baseline across a suite of reasoning benchmarks. Crucially, this is achieved with substantial efficiency improvements, including up to a 22% reduction in training time, a 14% decrease in peak memory usage, and a 19% increase in throughput. Our results establish FP8 as a practical and robust alternative to BF16, and we will release the accompanying code to further democratize large-scale model training.
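Why fine granularity helps numerical fidelity can be illustrated with a small experiment. The sketch below is a NumPy emulation under the assumption of FP8 E4M3 rounding (3 mantissa bits, saturating at 448); it is not the paper's code, and the activation matrix is hypothetical. It quantizes a matrix containing one outlier token using either a single per-tensor scale or one scale per token, then reports the worst error on the ordinary tokens.

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite FP8 E4M3 value (assumed format)


def round_to_e4m3(x):
    """Emulated FP8 E4M3 rounding (3 mantissa bits, saturating at +/-448)."""
    x = np.clip(x, -E4M3_MAX, E4M3_MAX)
    mag = np.abs(x)
    step = 2.0 ** (np.floor(np.log2(np.maximum(mag, 2.0 ** -6))) - 3)
    return np.sign(x) * np.round(mag / step) * step


def fake_quant(x, axis=None):
    """Quantize-dequantize with one scale per tensor (axis=None) or per row (axis=1)."""
    amax = np.abs(x).max(axis=axis, keepdims=True)
    scale = np.where(amax > 0, amax / E4M3_MAX, 1.0)
    return round_to_e4m3(x / scale) * scale


rng = np.random.default_rng(0)
# Hypothetical activations: token 0 is an outlier, 10^5 times larger than the rest.
act = rng.standard_normal((8, 64))
act[0] *= 1e5

errors = {}
for name, axis in [("per-tensor", None), ("per-token", 1)]:
    # Worst error on the ordinary tokens (rows 1..7), relative to their largest magnitude.
    err = np.abs(fake_quant(act, axis) - act)[1:].max() / np.abs(act[1:]).max()
    errors[name] = err
    print(f"{name:10s} worst relative error on ordinary tokens: {err:.3g}")
```

With a single per-tensor scale, the outlier sets the scale, so the ordinary tokens land in E4M3's subnormal range and many of them round to zero; with per-token scaling, each row's error stays within the format's normal rounding error.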
