CARMA: Enhanced Compositionality in LLMs via Advanced Regularisation and Mutual Information Alignment

Nura Aljaafari, Danilo S. Carvalho, André Freitas

TL;DR

CARMA addresses compositional generalisation (CG) limitations in LLMs by introducing two non-architectural regularisers: mutual information regularisation across layers ($\mathcal{L}_{\text{MI}}$) and layer-wise stability regularisation ($\mathcal{L}_{\text{stability}}$). The losses combine into $\mathcal{L}_{\text{CARMA}}=\gamma\mathcal{L}_{\text{MI}}+\eta\mathcal{L}_{\text{stability}}$ and are integrated with the task objective as $\mathcal{L}_{\text{total}}=(1-\lambda)\mathcal{L}_{\text{task}}+\lambda\mathcal{L}_{\text{CARMA}}$, enabling improved structured representations without architectural changes. CARMA improves semantic consistency and stability on Inverse Dictionary Modelling and Sentiment Classification, though effects vary with model architecture and tokenisation. It introduces training-time overhead due to auxiliary losses but preserves inference costs and downstream task performance, making it a scalable tool for enhancing CG in real-world settings. Overall, CARMA demonstrates that reinforcing learned structures through regularisation can substantially improve compositional reasoning in LLMs, with practical implications for robust language understanding.
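
The two weighting formulas in the TL;DR can be illustrated with a minimal Python sketch. It shows only how the scalar losses are combined, not how the MI or stability terms themselves are computed. The function names `carma_loss` and `total_loss` and the numeric values are hypothetical, chosen for illustration; the restriction of $\lambda$ to $[0, 1]$ is an assumption suggested by the $(1-\lambda)$ weighting rather than something stated in the text.

```python
def carma_loss(l_mi, l_stability, gamma, eta):
    """Weighted sum of the two auxiliary losses: gamma * L_MI + eta * L_stability."""
    return gamma * l_mi + eta * l_stability


def total_loss(l_task, l_carma, lam):
    """Interpolate between task and auxiliary loss: (1 - lam) * L_task + lam * L_CARMA."""
    # Assumption: lam is an interpolation weight in [0, 1].
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is assumed to lie in [0, 1]")
    return (1.0 - lam) * l_task + lam * l_carma


# Hypothetical loss values and weights, for illustration only.
l_carma = carma_loss(l_mi=0.5, l_stability=0.2, gamma=1.0, eta=0.5)
l_total = total_loss(l_task=2.0, l_carma=l_carma, lam=0.25)
```

Because the functions use only arithmetic operators, they should also work unchanged on scalar tensors from an autodiff framework. Setting `lam=0` recovers plain fine-tuning on the task loss, which makes the auxiliary terms easy to ablate.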

Abstract

Large language models (LLMs) struggle with compositional generalisation, limiting their ability to systematically combine learned components to interpret novel inputs. While architectural modifications, fine-tuning, and data augmentation improve compositionality, they often have limited adaptability, face scalability constraints, or yield diminishing returns on real data. To address this, we propose CARMA, an intervention that enhances the stability and robustness of compositional reasoning in LLMs while preserving fine-tuned performance. CARMA employs mutual information regularisation and layer-wise stability constraints to mitigate feature fragmentation, ensuring structured representations persist across and within layers. We evaluate CARMA on inverse dictionary modelling and sentiment classification, measuring its impact on semantic consistency, performance stability, and robustness to lexical perturbations. Results show that CARMA reduces the variability introduced by fine-tuning, stabilises token representations, and improves compositional reasoning. While its effectiveness varies across architectures, CARMA's key strength lies in reinforcing learned structures rather than introducing new capabilities, making it a scalable auxiliary method. These findings suggest that integrating CARMA with fine-tuning can improve compositional generalisation while maintaining task-specific performance in LLMs.

Paper Structure

This paper contains 43 sections, 16 equations, 8 figures, and 7 tables.

Figures (8)

  • Figure 1: This diagram depicts the computation of the loss and illustrates the integration of the Mutual Information (MI) loss ($\mathcal{L}_{\text{MI}}$) and the Stability Loss ($\mathcal{L}_{\text{stability}}$) into the final optimisation process. Tokens $Tok_1$ and $Tok_2$ form the positive set ($H_{\text{pos}}$), while $Tok_3, Tok_4, Tok_5$ form the negative set ($H_{\text{neg}}$). The $\mathcal{L}_{\text{MI}}$ loss is computed vertically across layers ($l$ to $k$), maximising the similarity of tokens in $H_{\text{pos}}$ while contrasting them with tokens in $H_{\text{neg}}$. The $\mathcal{L}_{\text{stability}}$ loss is computed horizontally between consecutive layers, ensuring consistency in hidden state representations. Both auxiliary losses are combined with the task loss ($\mathcal{L}_{\text{task}}$) to form the total loss ($\mathcal{L}_{\text{total}}$). This integration improves token representations and enhances the model's overall optimisation.
  • Figure 2: Layer-wise performance comparison under CAP intervention, with performance averaged over three protocols (Mean CAP, Max CAP, Sum CAP) for Original, Fine-Tuned (FT), and CARMA (FT + CARMA) models. Layer numbers are normalised to their relative positions within each model to enable cross-architecture comparison. The IDM task (left) highlights CARMA's improvements in systematicity and stability, particularly in the early and middle layers. The SC task (right) demonstrates CARMA's ability to enhance robustness, though convergence with FT occurs in deeper layers.
  • Figure 3: Task performance in IDM across GPT2 (S, L), Gemma-2B, Llama (1B, 3B), and Qwen (0.5B, 3B).
  • Figure 4: Task performance in SC across GPT2 (S, L), Gemma-2B, Llama (1B, 3B) and Qwen (0.5B, 3B).
  • Figure 5: Illustration of compositional generalisation in Inverse Dictionary Modelling (IDM) and Sentiment Classification (SC). The figure highlights key compositional properties: systematicity ensures coherent meaning construction, substitutivity maintains meaning under lexical variations, robustness preserves intended outputs under perturbations, and over-generalisation leads to overly broad or semantically weak predictions (e.g., neuron misclassified as cell or positive reduced to neutral).
  • ...and 3 more figures