Interpreting Multi-Attribute Confounding through Numerical Attributes in Large Language Models

Hirohane Takagi; Gouki Minegishi; Shota Kizawa; Issey Sukeda; Hitomi Yanaka

TL;DR

This paper applies Partial Least Squares (PLS) regression to hidden states $X \in \mathbb{R}^{n\times h}$ to derive a low-dimensional subspace $Z = XW \in \mathbb{R}^{n\times k}$ and predictions $\hat{Y}$. With these probes and Spearman partial correlations, it examines how Large Language Models encode multiple numerical attributes and how they respond to irrelevant numerical context. Across four transformer LLMs, the authors show that LLMs preserve real-world numerical correlations but tend to amplify them, and that inter-attribute subspaces overlap and exhibit asymmetric interference. They also demonstrate that irrelevant numerical prompts cause internal representations to drift and that perturbations propagate differently by model size, with smaller models being more susceptible to prompt-induced bias. The work highlights a vulnerability in numerically sensitive decision making and provides a representation-aware framework for designing fairer prompts and mitigation strategies in numerically entangled contexts, guiding future efforts in robustness and interpretability of LLMs in high-stakes numerical tasks.
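The PLS projection mentioned in the TL;DR can be illustrated with a minimal NumPy sketch. This is a single-target PLS1 fit using the NIPALS procedure, not the authors' implementation; the function names (`pls1_fit`, `pls1_project`, `pls1_predict`) are hypothetical, and the sketch assumes hidden states are mean-centered before projection, so the matrix `R` below plays the role of $W$ in $Z = XW$.

```python
import numpy as np

def pls1_fit(X, y, k):
    """Fit a k-component PLS1 model (single numerical target) with NIPALS.

    Returns the rotation matrix R (h x k), the score-space regression
    coefficients q (k,), and the training means used for centering.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(k):
        w = Xk.T @ yc                 # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                    # scores along that direction
        tt = t @ t
        p = Xk.T @ t / tt             # X loadings for this component
        q.append(yc @ t / tt)         # regress y on the new score
        Xk = Xk - np.outer(t, p)      # deflate X before the next component
        W.append(w)
        P.append(p)
    W, P = np.column_stack(W), np.column_stack(P)
    R = W @ np.linalg.inv(P.T @ W)    # maps centered X directly to the scores
    return R, np.array(q), x_mean, y_mean

def pls1_project(X, R, x_mean):
    """Project hidden states (n x h) into the k-dimensional subspace Z."""
    return (X - x_mean) @ R

def pls1_predict(X, R, q, x_mean, y_mean):
    """Predict the numerical attribute from hidden states."""
    return pls1_project(X, R, x_mean) @ q + y_mean
```

Here `X` would hold one hidden-state vector per entity and `y` one numerical attribute per entity; a multi-attribute analysis could fit one such model per attribute and compare the resulting subspaces.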

Abstract

Although behavioral studies have documented numerical reasoning errors in large language models (LLMs), the underlying representational mechanisms remain unclear. We hypothesize that numerical attributes occupy shared latent subspaces and investigate two questions: (1) How do LLMs internally integrate multiple numerical attributes of a single entity? (2) How does irrelevant numerical context perturb these representations and their downstream outputs? To address these questions, we combine linear probing with partial correlation analysis and prompt-based vulnerability tests across models of varying sizes. Our results show that LLMs encode real-world numerical correlations but tend to systematically amplify them. Moreover, irrelevant context induces consistent shifts in magnitude representations, with downstream effects that vary by model size. These findings reveal a vulnerability in LLM decision-making and lay the groundwork for fairer, representation-aware control under multi-attribute entanglement.
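As a rough illustration of the partial correlation analysis mentioned above, the following NumPy sketch computes a Spearman partial correlation between two variables while controlling for a third. The function names and the choice of a single control variable `z` are assumptions made for illustration; the abstract does not specify which variables the authors condition on.

```python
import numpy as np

def average_ranks(a):
    """Rank values from 1..n, assigning tied values their average rank."""
    a = np.asarray(a, dtype=float).ravel()
    order = np.argsort(a, kind="mergesort")
    ranks = np.empty(a.size)
    ranks[order] = np.arange(1, a.size + 1)
    _, inverse, counts = np.unique(a, return_inverse=True, return_counts=True)
    tie_sums = np.bincount(inverse, weights=ranks)
    return tie_sums[inverse] / counts[inverse]

def partial_spearman(x, y, z):
    """Spearman correlation of x and y after removing the rank-linear effect of z."""
    rx, ry, rz = average_ranks(x), average_ranks(y), average_ranks(z)
    design = np.column_stack([np.ones_like(rz), rz])  # intercept + control ranks
    # Residualize the ranks of x and y on the ranks of z.
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    # Pearson correlation of the residuals is the partial Spearman coefficient.
    return float(np.corrcoef(res_x, res_y)[0, 1])
```

If `x` and `y` are correlated only because both track `z`, the raw Spearman correlation is high while the partial coefficient stays near zero; a partial coefficient that remains large suggests an association not explained by the control variable.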
