Expression Syntax Information Bottleneck for Math Word Problems
Jing Xiong, Chengming Li, Min Yang, Xiping Hu, Bin Hu
TL;DR
Math word problem (MWP) solvers are vulnerable to spurious correlations between surface cues in the problem text and solution expressions. ESIB applies the variational information bottleneck to learn a concise latent $z$ that preserves predictive information about the solution $y$ while minimizing the mutual information $I(x; z)$ with the input problem $x$, and it uses mutual learning between two representations of the same problem to enforce consistent expression-syntax information. A self-distillation loss $\mathcal{V}_{SDL}$ further promotes diverse, syntax-consistent solution expressions. On four benchmarks (Math23K, Ape210K, MAWPS, CM17K), ESIB achieves state-of-the-art accuracy and generates more diverse expressions, and robustness analyses show improved resistance to adversarial perturbations. The work also provides theoretical links between the information bottleneck, mutual learning, and generalization and robustness in MWP.
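For orientation, the generic information bottleneck trade-off described above is usually written as below. This is the standard textbook form, not necessarily the exact objective used by ESIB; the trade-off weight $\beta$, the decoder $q(y \mid z)$, and the prior $r(z)$ are notation assumed here for illustration.

$$\max_{p(z \mid x)} \; I(z; y) - \beta \, I(x; z)$$

In the variational treatment, this is commonly optimized through the upper bound

$$\mathcal{L}_{VIB} = \mathbb{E}_{p(x, y)} \, \mathbb{E}_{p(z \mid x)} \left[ -\log q(y \mid z) \right] + \beta \, \mathbb{E}_{p(x)} \left[ \mathrm{KL}\left( p(z \mid x) \,\|\, r(z) \right) \right],$$

where the first term rewards latents that predict $y$ and the second penalizes information that $z$ retains about $x$.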
Abstract
Math word problem (MWP) solving aims to automatically answer mathematical questions posed in text. Previous studies tend to design complex models that capture additional information from the original text so that the model learns more comprehensive features. In this paper, we turn our attention in the opposite direction and study how to discard redundant features that contain spurious correlations for MWP. To this end, we design an Expression Syntax Information Bottleneck method for MWP (called ESIB) based on the variational information bottleneck, which extracts the essential features of the expression syntax tree while filtering out latent-specific redundancy that contains syntax-irrelevant features. The key idea of ESIB is to use mutual learning to encourage multiple models to predict the same expression syntax tree from different representations of the same problem, so as to capture the consistent information of the expression syntax tree and discard latent-specific redundancy. To improve the generalization ability of the model and generate more diverse expressions, we design a self-distillation loss that encourages the model to rely more on the expression syntax information in the latent space. Experimental results on two large-scale benchmarks show that our model not only achieves state-of-the-art results but also generates more diverse solutions. The code is available at https://github.com/menik1126/math_ESIB.
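The mutual-learning idea in the abstract can be illustrated with a minimal sketch. It assumes that each of two encoders maps its problem representation to a diagonal Gaussian latent (mean and log-variance) and that agreement is encouraged with a symmetric KL term added to the two task losses. The function names, the Gaussian parameterization, and the weight `beta` are hypothetical choices for illustration and are not taken from the authors' code.

```python
import numpy as np


def gaussian_kl(mu_p, logvar_p, mu_q, logvar_q):
    """KL(p || q) between diagonal Gaussians, summed over the latent dimension."""
    var_p, var_q = np.exp(logvar_p), np.exp(logvar_q)
    per_dim = logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0
    return 0.5 * np.sum(per_dim, axis=-1)


def mutual_ib_loss(mu1, logvar1, mu2, logvar2, task_loss1, task_loss2, beta=0.01):
    """Hypothetical combined loss: per-example task losses plus a symmetric KL
    that pulls the two latent distributions of the same problem together."""
    sym_kl = (gaussian_kl(mu1, logvar1, mu2, logvar2)
              + gaussian_kl(mu2, logvar2, mu1, logvar1))
    return np.mean(task_loss1 + task_loss2 + beta * sym_kl)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch, dim = 2, 4
    # Stand-ins for the outputs of two encoders on the same batch of problems.
    mu1, mu2 = rng.normal(size=(batch, dim)), rng.normal(size=(batch, dim))
    logvar1, logvar2 = np.zeros((batch, dim)), np.zeros((batch, dim))
    # Stand-ins for per-example expression-decoding losses.
    task1, task2 = np.array([1.2, 0.8]), np.array([1.0, 0.9])
    print(mutual_ib_loss(mu1, logvar1, mu2, logvar2, task1, task2))
```

When the two latent distributions coincide, the symmetric KL term vanishes and only the task losses remain, which is the sense in which the term penalizes representation-specific information.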
