How does fine-tuning improve sensorimotor representations in large language models?

Minghua Wu; Javier Conde; Pedro Reviriego; Marc Brysbaert

TL;DR

Fine-tuning can steer the internal representations of LLMs toward more embodied, grounded patterns. The resulting sensorimotor improvements generalize robustly across languages and across related sensorimotor dimensions, but they are highly sensitive to the learning objective and fail to transfer across two disparate task formats.

Abstract

Large Language Models (LLMs) exhibit a significant "embodiment gap": their text-based representations fail to align with human sensorimotor experience. This study systematically investigates whether and how task-specific fine-tuning can bridge this gap. Using Representational Similarity Analysis (RSA) and dimension-specific correlation metrics, we demonstrate that the internal representations of LLMs can be steered toward more embodied, grounded patterns through fine-tuning. The results further show that sensorimotor improvements generalize robustly across languages and across related sensorimotor dimensions, but are highly sensitive to the learning objective and fail to transfer across two disparate task formats.
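To make the RSA step concrete, the following is a minimal sketch rather than the authors' implementation. It builds a representational distance matrix for the human ratings and for the model ratings, then correlates their upper triangles with Spearman's rank correlation. The use of Euclidean distance between rating vectors, the function names, and the toy rating values are all assumptions made for illustration; only the Python standard library is used.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rdm_upper_triangle(vectors):
    """Pairwise Euclidean distances for every unordered pair of concepts.

    Assumption: Euclidean distance is the RDM metric; the paper may differ.
    """
    return [math.dist(u, v) for u, v in combinations(vectors, 2)]


def rsa_score(human_vectors, model_vectors):
    """Spearman's rho between the human RDM and the model RDM."""
    return spearman(rdm_upper_triangle(human_vectors),
                    rdm_upper_triangle(model_vectors))


# Hypothetical toy ratings: four concepts rated on two dimensions.
human = [[4.5, 0.5], [4.0, 1.0], [0.5, 4.5], [1.0, 4.0]]
model = [[3.9, 1.2], [4.1, 0.8], [1.5, 3.0], [0.7, 4.4]]
score = rsa_score(human, model)
```

Because the score depends only on the rank order of pairwise distances, it compares the geometry of the two spaces rather than the absolute rating values, which is why a model can improve on this measure without matching human ratings exactly.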

Paper Structure (26 sections, 3 figures)

This paper contains 26 sections, 3 figures.

Introduction
Results
Overall Structural Alignment: Representational Similarity Analysis (RSA)
Dimension-Specific Improvements and Variances
Concept-Level Analysis: Successes and Failure Modes
Hierarchical Performance and Cross-Lingual Transfer
Correlational Evidence for Representational Reorganization and Convergence
Case Study: Sensorimotor Profile of an example Concept
In summary,
Discussion
Methods
Inclusion and ethics
Psycholinguistic Norms
English Sensorimotor Norms.
Dutch Sensory Norms and Bilingual Dataset Construction.
...and 11 more sections

Figures (3)

Figure 1: Structural Alignment of Sensorimotor Representations Before and After Fine-tuning. (a, b) Bar charts showing the overall Spearman’s $\rho$ correlation between model-derived and human-derived representational distance matrices (RDMs) for (a) all sensorimotor dimensions on the English evaluation set and (b) sensory dimensions on the Dutch evaluation set. (c) RDMs for human ratings and model representations, based on a representative subset of 40 English words (the full dataset contains 1572 concepts). (d) Density distributions of Spearman’s $\rho$ for motor dimensions on the English set, derived from 200 bootstrap resampling iterations.
Figure 2: Dimension-wise representational alignment and the distribution of human ratings. Bar charts display the Spearman’s rank correlation coefficients ($\rho$) between model embeddings and human sensorimotor norms across different dimensions for the (a) English and (b) Dutch evaluation sets. Red brackets labeled ‘ns’ denote non-significant differences ($p \geq 0.05$) from pairwise comparison tests; all other pairwise differences are significant ($p < 0.05$). (c) Distribution of human ratings for 2358 concepts across each sensorimotor dimension in the English training set.
Figure 3: Comparative analysis of word-wise model representations against human ratings. (a, b) Distribution of model-human similarity (calculated via Euclidean distance) for different models on the (a) English and (b) Dutch evaluation sets. (c) Heatmap of inter-model Spearman's $\rho$ correlation, derived from the similarity scores in (a) and (b). (d) Scatter plots contrasting specific model pairs (from left to right, top to bottom): Base vs. En_FT, Base vs. Nl_FT, Base vs. QA_FT, and En_FT vs. Nl_FT. (e) Sensorimotor rating profiles for the exemplar word "SHOUTER". The radar chart compares the predicted rating scores (on a scale of 0–5) for each of the 11 sensorimotor dimensions across the four models (colored lines) with the actual human ratings (gray line).
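The captions refer to dimension-wise Spearman correlations and to distributions obtained from 200 bootstrap resampling iterations. The sketch below illustrates one way such a distribution could be produced for a single dimension: concepts are resampled with replacement and Spearman's $\rho$ is recomputed on each resample. Resampling at the concept level, the function names, the seed, and the toy ratings are assumptions for illustration, not details taken from the paper.

```python
import math
import random


def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho; returns NaN when either ranking has zero variance."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var) if var > 0 else float("nan")


def bootstrap_spearman(human, model, n_iter=200, seed=0):
    """Recompute Spearman's rho on n_iter resamples of the concepts.

    Each resample draws len(human) concept indices with replacement.
    Degenerate resamples (zero variance) are skipped, so the returned
    list can be slightly shorter than n_iter.
    """
    rng = random.Random(seed)
    n = len(human)
    rhos = []
    for _ in range(n_iter):
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([human[i] for i in idx], [model[i] for i in idx])
        if not math.isnan(rho):
            rhos.append(rho)
    return rhos


# Hypothetical ratings for one dimension across six concepts (0-5 scale).
human_dim = [0.5, 1.0, 2.0, 3.5, 4.0, 4.5]
model_dim = [1.2, 0.8, 2.6, 2.9, 4.4, 3.7]
rhos = bootstrap_spearman(human_dim, model_dim)
mean_rho = sum(rhos) / len(rhos)
```

The spread of the resulting values gives a rough sense of how stable a correlation is under resampling, which is the kind of information a density plot of bootstrap estimates conveys.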
