Can Large Language Models Learn Independent Causal Mechanisms?
Gaël Gendron, Bao Trung Nguyen, Alex Yuxuan Peng, Michael Witbrock, Gillian Dobbie
TL;DR
The paper investigates whether Large Language Models can learn Independent Causal Mechanisms by introducing Independent Causal Language Models (ICLM), which route inputs to domain-specific modules while sharing a domain-invariant module. It combines unsupervised routing via vector quantisation with an information-theoretic Mutual Information minimisation objective to encourage modular independence and abstraction. The approach is theoretically motivated and empirically evaluated on abstract and causal reasoning tasks (ACRE and RAVEN). The results show improved out-of-distribution generalisation and partial independence between modules: domain-invariant knowledge contributes broadly, while domain-specific modules specialise. The findings suggest that principled modularity can enhance robustness to distribution shifts in LLMs. However, complete independence is not achieved and the approach incurs substantial compute, motivating future work on richer causal graphs and more scalable routing strategies.
Abstract
Despite impressive performance on language modelling and complex reasoning tasks, Large Language Models (LLMs) fall short on the same tasks in uncommon settings or under distribution shifts, exhibiting a lack of generalisation ability. By contrast, systems such as causal models, which learn abstract variables and causal relationships, can demonstrate increased robustness against changes in the distribution. One reason for this success is the existence and use of Independent Causal Mechanisms (ICMs) representing high-level concepts that only sparsely interact. In this work, we apply two concepts from causality to learn ICMs within LLMs. We develop a new LLM architecture composed of multiple sparsely interacting language modelling modules. We show that such causal constraints can improve out-of-distribution performance on abstract and causal reasoning tasks. We also investigate the level of independence and domain specialisation and show that LLMs rely on pre-trained partially domain-invariant mechanisms resilient to fine-tuning.
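To make the routing idea in the TL;DR concrete, the following is a minimal sketch, not the authors' implementation. It assumes that vector-quantised routing amounts to assigning each input embedding to its nearest codebook vector, that each codebook entry owns one domain-specific module, and that the selected module's output is averaged with the domain-invariant module's output. The names `route_by_vq` and `iclm_forward`, the toy modules, and the averaging step are all hypothetical, and the Mutual Information minimisation term is omitted.

```python
import numpy as np


def route_by_vq(embeddings, codebook):
    """Return, for each embedding, the index of the nearest codebook vector."""
    # Broadcast (batch, 1, dim) against (1, k, dim) to get (batch, k) squared distances.
    diffs = embeddings[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    return np.argmin(dists, axis=1)


def iclm_forward(embeddings, codebook, specific_modules, invariant_module):
    """Combine a shared module with one routed domain-specific module per input.

    Assumption: outputs are combined by simple averaging; the paper may
    aggregate module outputs differently.
    """
    assert len(specific_modules) == len(codebook), "one module per codebook entry"
    idx = route_by_vq(embeddings, codebook)
    shared = invariant_module(embeddings)  # applied to every input
    outputs = np.empty_like(shared)
    for k, module in enumerate(specific_modules):
        mask = idx == k
        if mask.any():
            # Only the inputs routed to module k pass through it.
            outputs[mask] = 0.5 * (shared[mask] + module(embeddings[mask]))
    return outputs, idx


# Toy usage with two hypothetical "domains" and stand-in modules.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
specific_modules = [lambda x: 2.0 * x, lambda x: -x]
invariant_module = lambda x: x + 1.0

embeddings = np.array([[0.1, -0.2], [9.0, 11.0], [1.0, 1.0]])
outputs, idx = iclm_forward(embeddings, codebook, specific_modules, invariant_module)
```

In this toy setting the first and third inputs fall nearest the first codebook vector and the second input nearest the second, so each passes through a different domain-specific module while all three share the invariant one. A trained model would learn the codebook and modules jointly rather than fixing them by hand.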
