Learning to Disentangle Latent Reasoning Rules with Language VAEs: A Systematic Study
Yingji Zhang, Marco Valentino, Danilo S. Carvalho, André Freitas
TL;DR
This work tackles the gap between memorisation and rule-based inference in natural language inference by explicitly encoding reasoning rules in a language VAE's latent space. It presents an NTK-inspired framework that treats rules as distinct latent subspaces and shows that explicit supervision yields disentangled rule representations and rule-specific clustering. The authors implement an end-to-end Transformer-based VAE with three latent-injection strategies, finding that injecting latent information into the Query yields the best rule separation and that FFN components preserve rule separation better than attention components. The approach improves the interpretability and controllability of latent reasoning, with practical implications for safer and more auditable NLI systems, and points to diffusion-based extensions for future decoding control.
Abstract
Incorporating explicit reasoning rules within the latent space of language models (LMs) offers a promising pathway to enhance generalisation, interpretability, and controllability. While current Transformer-based language models have shown strong performance on Natural Language Inference (NLI) tasks, they often rely on memorisation rather than rule-based inference. This work investigates how reasoning rules can be explicitly embedded and memorised within LMs through Language Variational Autoencoders (VAEs). We propose a complete pipeline for learning reasoning rules within Transformer-based language VAEs. This pipeline encompasses three rule-based reasoning tasks, a supporting theoretical framework, and a practical end-to-end architecture. The experiments yield the following findings: (1) Disentangled reasoning: under explicit signal supervision, reasoning rules, viewed as functional mappings, can be disentangled within the encoder's parametric space. This separation results in distinct clustering of rules in the output feature space. (2) Prior knowledge injection: injecting reasoning information into the Query enables the model to more effectively retrieve the stored Value from memory based on the Key. This approach offers a simple method for integrating prior knowledge into decoder-only language models. (3) Performance bottleneck: in mathematical reasoning tasks using Qwen2.5 (0.5B), increasing the sample count does not improve performance beyond a certain point. Moreover, FFN layers are better than attention layers at preserving the separation of reasoning rules in the model's parameters.
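The Query-injection finding can be made concrete with a small sketch. The abstract does not specify the injection operator, so the code below assumes a simple additive form, q' = q + W_z z, applied to a single attention head; the names `attend`, `inject_into_query`, `W_z`, and `z` are hypothetical and chosen only for illustration, not taken from the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(queries, keys, values):
    """Single-head scaled dot-product attention (no masking)."""
    scale = math.sqrt(len(keys[0]))
    outputs, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        w = softmax(scores)
        weights.append(w)
        outputs.append([sum(wi * v[d] for wi, v in zip(w, values))
                        for d in range(len(values[0]))])
    return outputs, weights

def inject_into_query(queries, z, W_z):
    """ASSUMPTION: additive injection, q' = q + W_z z, at every position."""
    shift = [sum(w * x for w, x in zip(row, z)) for row in W_z]
    return [[qi + si for qi, si in zip(q, shift)] for q in queries]

# Toy memory: two Key/Value slots, one uninformative query.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0]]
queries = [[0.0, 0.0]]

# Hypothetical latent code and projection; z points towards the first Key.
W_z = [[1.0, 0.0], [0.0, 1.0]]
z = [2.0, 0.0]

base_out, base_w = attend(queries, keys, values)
inj_out, inj_w = attend(inject_into_query(queries, z, W_z), keys, values)

print("baseline weights:", base_w[0])
print("injected weights:", inj_w[0])
```

In this toy setting the baseline query attends uniformly to both memory slots, while the injected query shifts attention towards the slot whose Key aligns with the projected latent. Keys and Values are left untouched, which is the sense in which the latent only steers retrieval; other injection forms (for example concatenation or gating) would be equally valid readings of the abstract.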
