Flaming-hot Initiation with Regular Execution Sampling for Large Language Models

Weizhe Chen, Zhicheng Zhang, Guanlin Liu, Renjie Zheng, Wenlei Shi, Chen Dun, Zheng Wu, Xing Jin, Lin Yan

TL;DR

Flaming-hot Initiation with Regular Execution (FIRE) addresses data efficiency in large language model training and inference by sampling the initial token at an extreme temperature ($p \gg 1$) and then decoding the remaining tokens at the regular temperature, leveraging the attention-sink effect of early tokens to boost downstream reasoning diversity. Empirical results across open-source models and tasks (GSM8K, MATH, MBPP(+)) show that FIRE consistently raises pass rates when multiple samples are considered and improves reinforcement learning from human feedback (RLHF) training, without sacrificing single-shot accuracy. The method is differentiable, compatible with existing frameworks, and complementary to prompt-based diversification, so it can improve both inference-time quality and alignment. The findings suggest FIRE as a practical, model-agnostic technique to enhance sampling diversity, with implications for more data-efficient and robust LLM alignment and reasoning systems.
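The two-phase decoding described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `next_logits` callable (a stand-in for a language model that maps a token prefix to next-token logits), the function names, and the default temperatures are all hypothetical choices made for illustration. The only idea taken from the text is that the first generated token is sampled at a very high temperature and every later token at the regular one.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fire_sample(
    next_logits: Callable[[List[int]], Sequence[float]],  # hypothetical model interface
    prompt: Sequence[int],
    max_new_tokens: int,
    hot_temperature: float = 30.0,      # assumed value standing in for p >> 1
    regular_temperature: float = 1.0,   # assumed regular decoding temperature
    eos_token: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Sample the first new token at a very high temperature, the rest normally."""
    rng = rng or random.Random()
    context = list(prompt)
    generated: List[int] = []
    for step in range(max_new_tokens):
        # Only the initial token uses the extreme temperature.
        temperature = hot_temperature if step == 0 else regular_temperature
        probs = softmax_with_temperature(next_logits(context), temperature)
        token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        context.append(token)
        generated.append(token)
        if eos_token is not None and token == eos_token:
            break
    return generated


def toy_logits(context: List[int]) -> List[float]:
    """Toy stand-in for a model: always strongly prefers token 0."""
    return [5.0, 0.0, 0.0, 0.0]


if __name__ == "__main__":
    rng = random.Random(0)
    firsts = {fire_sample(toy_logits, [1], 3, rng=rng)[0] for _ in range(50)}
    print("distinct first tokens under FIRE:", sorted(firsts))
```

With a peaked toy distribution like the one above, regular-temperature sampling picks token 0 almost every time, whereas the high-temperature first step spreads probability nearly uniformly over the vocabulary. That flattening of the initial token is what the sketch is meant to show; the remaining tokens follow the model's usual distribution conditioned on whatever first token was drawn.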

Abstract

Since the release of ChatGPT, large language models (LLMs) have demonstrated remarkable capabilities across various domains. A key challenge in developing these general capabilities is efficiently sourcing diverse, high-quality data. This becomes especially critical in reasoning-related tasks with sandbox checkers, such as math or code, where the goal is to generate correct solutions to specific problems with higher probability. In this work, we introduce Flaming-hot Initiation with Regular Execution (FIRE) sampling, a simple yet highly effective method to efficiently find good responses. Our empirical findings show that FIRE sampling enhances inference-time generation quality and also benefits training in the alignment stage. Furthermore, we explore how FIRE sampling improves performance by promoting diversity and analyze the impact of employing FIRE at different positions within a response.

Paper Structure

This paper contains 14 sections, 1 figure, 9 tables.

Figures (1)

  • Figure 1: Curves for pass rate and number of effective answers with different numbers of samples on GSM8K.