Information-Preserving Reformulation of Reasoning Traces for Antidistillation

Jiayu Ding, Lei Cui, Li Dong, Nanning Zheng, Furu Wei

TL;DR

This work tackles the risk of unauthorized distillation from reasoning traces by introducing PART, an information-preserving antidistillation reformulation. PART combines token-level removal of self-talk with a structural shift to a conclusion-before-process order, and is implemented via a compact reformulation model trained with GPT-4o data. Empirical results show PART consistently degrades distillation across multiple benchmarks and student sizes, while preserving lexical and semantic information and remaining interpretable to humans. The approach also enables detectability and remains robust to data scale, offering a practical balance between interpretability and IP protection for reasoning traces.

Abstract

Recent advances in Large Language Models (LLMs) show that extending the length of reasoning chains significantly improves performance on complex tasks. While revealing these reasoning traces helps users better follow, verify, and learn from the model's problem-solving process, it also makes them highly vulnerable to unauthorized distillation. To mitigate this risk, proprietary model providers often adopt aggressive protection strategies, such as replacing detailed reasoning with brief summaries, which deprive users of valuable intermediate information. To address this trade-off, we propose PART, an information-preserving antidistillation reformulation of reasoning traces. Motivated by the difference between how humans understand reasoning traces and how LLMs exploit them for supervised fine-tuning, we design a simple but effective two-step reformulation: removing self-talk behaviors and reordering sub-conclusions. A small auxiliary model is trained to perform this reformulation, incurring minimal computational overhead. Extensive experiments demonstrate that PART consistently disrupts distillation across student models of different sizes and types on various reasoning benchmarks. For instance, when training on reformulated traces, even the performance of a large 32B student model decreases from 54.17 to 46.88 on AIME 2024, a drop of 7.29 points that corresponds to a 13.5% relative degradation.
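
The AIME 2024 figures quoted above can be checked with a short calculation. The sketch below is only an arithmetic illustration: the function name `relative_degradation` is hypothetical and not taken from the paper, and the only inputs are the two scores stated in the abstract.

```python
def relative_degradation(before: float, after: float) -> float:
    """Return the drop from `before` to `after` as a percentage of `before`."""
    if before == 0:
        raise ValueError("baseline score must be non-zero")
    return (before - after) / before * 100.0


# Scores reported in the abstract for the 32B student model on AIME 2024.
original_score = 54.17      # trained on original reasoning traces
reformulated_score = 46.88  # trained on PART-reformulated traces

absolute_drop = original_score - reformulated_score
relative_drop = relative_degradation(original_score, reformulated_score)

print(f"absolute drop: {absolute_drop:.2f} points")
print(f"relative drop: {relative_drop:.1f}%")
```

The 13.5% figure is therefore a relative change with respect to the 54.17 baseline, not a difference in percentage points.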

Paper Structure

This paper contains 27 sections, 5 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: Overview of PART. Directly exposing original reasoning traces leaves them vulnerable to unauthorized distillation, whereas providing only summaries deprives users of the information contained in the reasoning process. PART introduces an information-preserving antidistillation approach through reformulation at both the token level and the structural level.
  • Figure 2: Predicted probabilities of the student model on teacher-generated reasoning traces. (a) Visualization of token-level predicted probabilities, where deeper red indicates lower probabilities. Teacher-generated traces exhibit frequent self-talk behaviors, which convey little reasoning content yet receive low probabilities. (b) Tracking the probabilities of self-talk tokens across training stages reveals that they remain persistently lower than the average probability, suggesting that these semantically uninformative expressions exert a disproportionate influence on gradient updates.
  • Figure 3: (a) Match ratios under different lexical similarity score thresholds. PART achieves significantly higher match ratios than the summary-based method at both the step and sentence levels. (b) Human judgments of informativeness. Compared to the original reasoning traces, PART is judged similarly informative; compared to the summary-based reformulation, PART is clearly preferred in terms of informativeness.
  • Figure 4: Performance of distilled models trained on original versus reformulated traces (a) across different student model sizes and (b) across different data scales. Across both factors, PART leads to consistent performance degradation of the distilled models, demonstrating its effectiveness as an antidistillation approach.
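
The match ratio in Figure 3(a) can be pictured with a rough sketch. The captions above do not say which lexical similarity score the paper uses, so everything below is an assumption for illustration only: `difflib.SequenceMatcher` stands in for the similarity score, and the helper names `best_similarity` and `match_ratio`, as well as the toy sentences, are hypothetical.

```python
from difflib import SequenceMatcher


def best_similarity(unit: str, candidates: list[str]) -> float:
    """Highest character-level similarity between `unit` and any candidate.

    SequenceMatcher is an assumed stand-in; the paper's score may differ.
    """
    return max(
        (SequenceMatcher(None, unit.lower(), c.lower()).ratio() for c in candidates),
        default=0.0,
    )


def match_ratio(original_units: list[str], reformulated_units: list[str],
                threshold: float) -> float:
    """Fraction of original units with a counterpart scoring >= threshold."""
    if not original_units:
        return 0.0
    matched = sum(
        1 for unit in original_units
        if best_similarity(unit, reformulated_units) >= threshold
    )
    return matched / len(original_units)


# Toy sentence-level example (hypothetical data, not from the paper).
original = [
    "Compute x = 3.",
    "Then y = 2x = 6.",
    "Hmm, wait, let me double check.",
]
reformulated = [
    "Then y = 2x = 6.",
    "Compute x = 3.",
]

for threshold in (0.5, 0.7, 0.9, 1.0):
    ratio = match_ratio(original, reformulated, threshold)
    print(f"threshold={threshold:.1f} match_ratio={ratio:.2f}")
```

Under this reading, a reformulation that reorders sentences and drops self-talk can still match most original content at high thresholds, whereas a brief summary would be expected to match few original sentences. The same function could be applied to step-level units instead of sentences.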