Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression

Lingjie Zeng, Xiaofan Chen, Yanbo Wang, Xiuying Chen

Abstract

Long chain-of-thought (Long-CoT) reasoning models have motivated a growing body of work on compressing reasoning traces to reduce inference cost, yet existing evaluations focus almost exclusively on task accuracy and token savings. Trustworthiness properties, whether acquired or reinforced through post-training, are encoded in the same parameter space that compression modifies. This means preserving accuracy does not, a priori, guarantee preserving trustworthiness. We conduct the first systematic empirical study of how CoT compression affects model trustworthiness, evaluating multiple models of different scales along three dimensions: safety, hallucination resistance, and multilingual robustness. Under controlled comparisons, we find that CoT compression frequently introduces trustworthiness regressions and that different methods exhibit markedly different degradation profiles across dimensions. To enable fair comparison across bases, we propose a normalized efficiency score for each dimension that reveals how naïve scalar metrics can obscure trustworthiness trade-offs. As an existence proof, we further introduce an alignment-aware DPO variant that reduces CoT length by 19.3% on reasoning benchmarks with substantially smaller trustworthiness loss. Our findings suggest that CoT compression should be optimized not only for efficiency but also for trustworthiness, treating both as equally important design constraints.
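The paper's exact definition of the normalized efficiency score is not reproduced in this excerpt. As a purely illustrative sketch (the function name and the ratio-of-retentions form are assumptions, not the authors' formula), one plausible per-dimension normalization compares the fraction of trustworthiness retained after compression against the fraction of CoT tokens retained, so that a score above 1 means the model keeps trustworthiness faster than it keeps tokens:

```python
def normalized_efficiency(trust_compressed: float, trust_base: float,
                          tokens_compressed: float, tokens_base: float) -> float:
    """Hypothetical per-dimension score: trustworthiness retained per unit
    of CoT token budget retained, relative to the matched uncompressed base.

    trust_*  : trustworthiness metric (e.g. safety score) for the
               compressed model and its uncompressed baseline
    tokens_* : mean CoT token counts for the same pair
    """
    trust_retention = trust_compressed / trust_base    # fraction of trust kept
    token_retention = tokens_compressed / tokens_base  # fraction of tokens kept
    return trust_retention / token_retention


# Example: a method that keeps 90% of the safety score while emitting
# half the CoT tokens scores 1.8 on this (assumed) normalization.
score = normalized_efficiency(trust_compressed=0.90, trust_base=1.00,
                              tokens_compressed=500, tokens_base=1000)
```

Such a ratio makes methods comparable across base models of different scales, since both numerator and denominator are normalized to each model's own baseline.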

Paper Structure

This paper contains 38 sections, 1 equation, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: Overview of the evaluation framework. Each compressed model is compared against its matched uncompressed baseline under a uniform protocol across three trustworthiness dimensions.
  • Figure 2: Trustworthiness–length Pareto frontiers for the Qwen3-8B family. Each panel plots one trustworthiness metric against mean CoT token count. The black polyline marks the Pareto frontier.
  • Figure 3: Trustworthiness–length Pareto frontiers across all model groups. Colors denote compression methods, marker size denotes model scale. The black polyline marks the Pareto frontier in each panel.