Your Language Model May Think Too Rigidly: Achieving Reasoning Consistency with Symmetry-Enhanced Training

Yihang Yao; Zhepeng Cen; Miao Li; William Han; Yuyou Zhang; Emerson Liu; Zuxin Liu; Chuang Gan; Ding Zhao

TL;DR

This work tackles the brittleness of LLM reasoning to surface-form variations by introducing MEND, a symmetry-aware data augmentation method that augments post-training data with permutation and redundancy transformations to enforce invariant knowledge extraction. By formalizing reasoning on a directed acyclic graph (DAG) and defining reasoning consistency as stability across semantically equivalent queries, the authors show that the method improves data efficiency and strengthens out-of-distribution (OOD) generalization on logical and arithmetic reasoning tasks. Empirically, MEND outperforms both reasoning-chain augmentation baselines and inference-time paraphrasing baselines, and a probing tool confirms enhanced in-context knowledge extraction. The findings suggest that structured dataset curation focused on query symmetry can meaningfully improve LLM robustness in reasoning, with implications for more reliable deployment across diverse prompt settings.
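The permutation and redundancy transformations can be illustrated with a minimal sketch. This is not the authors' implementation; it assumes that a query is a list of independent premise sentences followed by a question, that premise order does not change the answer (permutation symmetry), and that the hypothetical `distractors` pool holds only facts irrelevant to the question (redundancy symmetry). The names `Query`, `permute`, `add_redundancy`, and `augment` are illustrative.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    """A reasoning query: premises (facts and rules) followed by a question."""

    premises: tuple[str, ...]
    question: str

    def render(self) -> str:
        return " ".join(self.premises) + " " + self.question


def permute(query: Query, rng: random.Random) -> Query:
    """Permutation transform: shuffle premise order; the answer is unchanged."""
    premises = list(query.premises)
    rng.shuffle(premises)
    return Query(tuple(premises), query.question)


def add_redundancy(query: Query, distractors: list[str], k: int, rng: random.Random) -> Query:
    """Redundancy transform: append k distractor premises, assumed irrelevant
    to the question, so the answer is unchanged."""
    extra = rng.sample(distractors, k)
    return Query(query.premises + tuple(extra), query.question)


def augment(
    query: Query, answer: str, distractors: list[str], n: int, k: int = 1, seed: int = 0
) -> list[tuple[str, str]]:
    """Generate n (prompt, answer) pairs semantically equivalent to the original."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # Add noise first, then shuffle, so distractors land at random positions.
        variant = permute(add_redundancy(query, distractors, k, rng), rng)
        pairs.append((variant.render(), answer))
    return pairs


base = Query(
    premises=("Alice is a cat.", "If X is a cat then X is an animal."),
    question="Is Alice an animal?",
)
for prompt, answer in augment(base, "yes", ["Bob is a dog.", "The sky is blue."], n=3):
    print(prompt, "->", answer)
```

Every augmented prompt keeps the original answer, so training on these pairs asks the model to locate the same relevant premises regardless of their position or the surrounding noise. The assumption that sentences can be freely reordered is a simplification made for this sketch.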

Abstract

Large Language Models (LLMs) have demonstrated strong reasoning capabilities across various tasks. However, even minor variations in query phrasing that preserve the underlying semantic meaning can significantly affect their performance. To address this, we focus on enhancing LLMs' awareness of symmetry in query variations and propose syMmetry-ENhanceD (MEND) Data Augmentation, a data-centric approach that improves the model's ability to extract useful information from context. Unlike existing methods that emphasize reasoning-chain augmentation, our approach improves model robustness at the knowledge extraction stage through query augmentations, enabling more data-efficient training and stronger generalization to out-of-distribution (OOD) settings. Extensive experiments on both logical and arithmetic reasoning tasks show that MEND enhances reasoning performance across diverse query variations, providing new insights into improving LLM robustness through structured dataset curation.
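Performance "across diverse query variations" can be made measurable in several ways. One plausible operationalization, assumed here rather than taken from the paper, evaluates a model on several semantically equivalent variants of the same query and reports both accuracy and agreement with the majority answer. The `model` callable and the toy `rigid_model` are hypothetical stand-ins for an LLM.

```python
from collections import Counter
from typing import Callable, Sequence


def evaluate_variants(
    model: Callable[[str], str], variants: Sequence[str], gold: str
) -> tuple[float, float]:
    """Score a model on semantically equivalent variants of one query.

    Returns (accuracy, consistency), where consistency is the fraction of
    variants that receive the most common answer (1.0 = fully invariant).
    """
    answers = [model(v) for v in variants]
    accuracy = sum(a == gold for a in answers) / len(answers)
    consistency = Counter(answers).most_common(1)[0][1] / len(answers)
    return accuracy, consistency


def rigid_model(prompt: str) -> str:
    # Toy stand-in for a brittle LLM: it answers correctly only when the
    # key fact appears first, mimicking sensitivity to premise order.
    return "yes" if prompt.startswith("Alice is a cat.") else "unknown"


variants = [
    "Alice is a cat. If X is a cat then X is an animal. Is Alice an animal?",
    "If X is a cat then X is an animal. Alice is a cat. Is Alice an animal?",
    "Bob is a dog. Alice is a cat. If X is a cat then X is an animal. Is Alice an animal?",
]

print(evaluate_variants(rigid_model, variants, gold="yes"))
```

The toy model is correct on one of the three variants, while two of the three receive the same wrong answer. Reporting both numbers shows that agreement alone does not imply correctness; a robust model should score high on both.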
