STEPER: Step-wise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models

Kyumin Lee, Minjin Jeon, Sanghwan Jang, Hwanjo Yu

TL;DR

StepER uses step-wise supervision to match the information and reasoning demands that evolve across the stages of multi-step retrieval-augmented frameworks. It also adds difficulty-aware training that progressively optimizes learning by prioritizing the reasoning steps more suitable for learning.

Abstract

Answering complex real-world questions requires step-by-step retrieval and integration of relevant information to generate well-grounded responses. However, existing knowledge distillation methods overlook the fact that different steps demand different reasoning abilities, which hinders knowledge transfer in multi-step retrieval-augmented frameworks. To address this, we propose Step-wise Knowledge Distillation for Enhancing Reasoning Ability in Multi-Step Retrieval-Augmented Language Models (StepER). StepER uses step-wise supervision to match the information and reasoning demands that evolve across stages. It also incorporates difficulty-aware training, which progressively optimizes learning by prioritizing the reasoning steps more suitable for learning. Our method is adaptable to various multi-step retrieval-augmented language models, including those that use retrieval queries as reasoning paths and those that use decomposed questions. Extensive experiments show that StepER outperforms prior methods on multi-hop QA benchmarks, with an 8B model achieving performance comparable to a 70B teacher model.
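
The step-wise supervision idea can be illustrated with a minimal sketch, assuming a teacher trajectory is stored as a list of steps, each holding the passages retrieved at that step and the teacher's reasoning output for it. The `Step` and `StepExample` classes, the `build_step_examples` helper, and the "So the answer is:" answer template are all hypothetical. The sketch only shows how one trajectory can yield first-step, mid-step, and final-step training examples with accumulating context; it is not the paper's data format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    passages: List[str]  # passages retrieved at this step (hypothetical format)
    reasoning: str       # teacher's reasoning output (e.g. next query or thought) at this step


@dataclass
class StepExample:
    stage: str                   # "first", "mid", or "final"
    question: str
    context: List[str]           # all passages retrieved up to and including this step
    prior_reasoning: List[str]   # teacher reasoning from earlier steps
    target: str                  # teacher output the student should imitate here


def build_step_examples(question: str, steps: List[Step], final_answer: str) -> List[StepExample]:
    """Turn one teacher trajectory into one supervised example per reasoning step."""
    examples: List[StepExample] = []
    context: List[str] = []
    history: List[str] = []
    last = len(steps) - 1
    for i, step in enumerate(steps):
        context.extend(step.passages)  # information accumulates across steps
        if i == last:
            stage = "final"            # a single-step trajectory counts as final
        elif i == 0:
            stage = "first"
        else:
            stage = "mid"
        # Earlier steps imitate the teacher's intermediate reasoning;
        # only the final step also emits the answer (hypothetical template).
        target = step.reasoning if i < last else f"{step.reasoning} So the answer is: {final_answer}"
        examples.append(StepExample(stage, question, list(context), list(history), target))
        history.append(step.reasoning)
    return examples
```

For a three-step trajectory this yields one example per stage, and the final example's context holds every retrieved passage. Keeping only the examples with `stage == "final"` corresponds to the final-step-only data used by Vanilla-KD.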

Paper Structure

This paper contains 48 sections, 5 equations, 7 figures, 18 tables.

Figures (7)

  • Figure 1: Comparison of Vanilla-KD and StepER. (a) illustrates the conceptual differences in training data. Unlike Vanilla-KD, which uses only final-step data, StepER leverages data from all reasoning stages: first-step (initial reasoning based on the first retrieved passages), mid-step (intermediate reasoning with accumulated information), and final-step (complete reasoning with all retrieved passages). By training on every reasoning step, StepER learns reasoning abilities more effectively. (b) presents answer examples from both models: Vanilla-KD often fails in early reasoning stages and generates incorrect answers, whereas StepER reasons coherently throughout and reaches the correct answer.
  • Figure 2: Overview of the StepER framework. We use a teacher LM to construct the dataset via multi-step retrieval, and train the student model with a difficulty-aware strategy that prioritizes reasoning steps more suitable for learning.
  • Figure 3: GPT evaluation results on HotpotQA across three reasoning stages under different step data configurations. StepER, which utilizes all available step data, achieves the highest performance across all evaluation criteria, demonstrating the effectiveness of step-wise training for multi-step retrieval.
  • Figure 4: Model scalability of StepER on HotpotQA using Qwen2.5-Instruct. We compare models of varying sizes and demonstrate that StepER scales effectively with consistently strong multi-step reasoning performance.
  • Figure 5: Out-of-domain adaptation results for StepER versus Vanilla-KD across four domain transfer scenarios: HQ→2W, HQ→MQ, MQ→2W, and MQ→HQ. StepER consistently outperforms Vanilla-KD, demonstrating stronger cross-domain generalization.
  • ...and 2 more figures
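
The difficulty-aware strategy from Figure 2 can be illustrated with one plausible realization. This sketch swaps in homoscedastic-uncertainty loss weighting, a known multi-task learning technique, and is not necessarily the paper's own formulation. Each stage k gets a learnable log-variance s_k, and the combined loss is sum_k 0.5 * exp(-s_k) * L_k + 0.5 * s_k. At the optimum exp(-s_k) = 1/L_k, so stages with lower loss (easier ones) receive larger weights, and those weights shift as the per-stage losses change during training. Plain Python with a hand-derived gradient stands in for an autograd framework. The function names, stage keys, and loss values are illustrative assumptions.

```python
import math
from typing import Dict


def weighted_step_loss(step_losses: Dict[str, float], log_vars: Dict[str, float]) -> float:
    """Uncertainty-weighted sum of per-stage losses (illustrative, not the paper's equation).

    total = sum_k 0.5 * exp(-s_k) * L_k + 0.5 * s_k
    A larger s_k down-weights stage k; the 0.5 * s_k term keeps s_k from growing without bound.
    """
    return sum(0.5 * math.exp(-log_vars[k]) * loss + 0.5 * log_vars[k]
               for k, loss in step_losses.items())


def update_log_vars(step_losses: Dict[str, float], log_vars: Dict[str, float],
                    lr: float = 0.1) -> Dict[str, float]:
    """One gradient-descent step on each s_k, using d(total)/d(s_k) = 0.5 * (1 - exp(-s_k) * L_k)."""
    return {k: s - lr * 0.5 * (1.0 - math.exp(-s) * step_losses[k])
            for k, s in log_vars.items()}


def stage_weights(log_vars: Dict[str, float]) -> Dict[str, float]:
    """Effective multiplier currently applied to each stage's loss."""
    return {k: 0.5 * math.exp(-s) for k, s in log_vars.items()}
```

With fixed illustrative losses of 0.4, 0.9, and 1.6 for the first, mid, and final stages, repeated updates drive each s_k toward log L_k. The effective weights therefore approach 0.5 / L_k and rank first > mid > final. In real training the L_k change at every step, so the weights follow the student's current difficulty on each stage.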