How does fine-tuning improve sensorimotor representations in large language models?

Minghua Wu, Javier Conde, Pedro Reviriego, Marc Brysbaert

TL;DR

Fine-tuning can steer the internal representations of LLMs toward more embodied, grounded patterns. The resulting sensorimotor improvements generalize robustly across languages and related sensorimotor dimensions, but they are highly sensitive to the learning objective and fail to transfer across two disparate task formats.

Abstract

Large Language Models (LLMs) exhibit a significant "embodiment gap", where their text-based representations fail to align with human sensorimotor experiences. This study systematically investigates whether and how task-specific fine-tuning can bridge this gap. Utilizing Representational Similarity Analysis (RSA) and dimension-specific correlation metrics, we demonstrate that the internal representations of LLMs can be steered toward more embodied, grounded patterns through fine-tuning. Furthermore, the results show that while sensorimotor improvements generalize robustly across languages and related sensory-motor dimensions, they are highly sensitive to the learning objective, failing to transfer across two disparate task formats.
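
As an illustration of the RSA comparison mentioned in the abstract, the sketch below builds a pairwise-distance matrix for human ratings and for model-derived ratings, then correlates the two with Spearman's $\rho$. It is a minimal sketch under stated assumptions, not the authors' pipeline: the Euclidean distance metric, the function names, and the toy ratings are all hypothetical choices made for the example.

```python
import math
from itertools import combinations


def rdm_upper(vectors):
    """Upper triangle of the dissimilarity matrix (Euclidean distance is an assumption)."""
    return [math.dist(a, b) for a, b in combinations(vectors, 2)]


def average_ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def rsa_score(human_ratings, model_ratings):
    """Correlate the two dissimilarity structures; rows must describe the same concepts."""
    return spearman(rdm_upper(human_ratings), rdm_upper(model_ratings))


# Hypothetical toy data: four concepts rated on two dimensions (0-5 scale).
human = [[4.5, 0.5], [4.0, 1.0], [0.5, 4.0], [1.0, 4.5]]
model = [[3.9, 1.1], [4.2, 0.7], [1.2, 3.1], [0.8, 4.4]]
print(f"RSA alignment (Spearman's rho): {rsa_score(human, model):.3f}")
```

Because only the rank order of pairwise distances enters the score, a model whose ratings are uniformly rescaled versions of the human ratings would still reach $\rho = 1$; the measure reflects structural alignment rather than absolute agreement.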

Paper Structure (26 sections, 3 figures)

Figures (3)

  • Figure 1: Structural Alignment of Sensorimotor Representations Before and After Fine-tuning. (a, b) Bar charts showing the overall Spearman’s $\rho$ correlation between model-derived and human-derived representational dissimilarity matrices (RDMs) for (a) all sensorimotor dimensions on the English evaluation set and (b) sensory dimensions on the Dutch evaluation set. (c) RDMs for human ratings and model representations, based on a representative subset of 40 English words (the full dataset contains 1572 concepts). (d) Density distributions of Spearman’s $\rho$ for motor dimensions on the English set, derived from 200 bootstrap resampling iterations.
  • Figure 2: Dimension-wise representational alignment and the distribution of human ratings. (a, b) Bar charts of the Spearman’s rank correlation coefficients ($\rho$) between model embeddings and human sensorimotor norms across dimensions for the (a) English and (b) Dutch evaluation sets. Red brackets labeled ‘ns’ denote non-significant differences ($p \geq 0.05$) from pairwise comparison tests; all other pairwise differences are significant ($p < 0.05$). (c) Distribution of human ratings for 2358 concepts across each sensorimotor dimension in the English training set.
  • Figure 3: Comparative analysis of word-wise model representations against human ratings. (a, b) Distribution of model-human similarity (calculated via Euclidean distance) for different models on the (a) English and (b) Dutch evaluation sets. (c) Heatmap of inter-model Spearman's $\rho$ correlation, derived from the similarity scores in (a) and (b). (d) Scatter plots contrasting specific model pairs (left to right, top to bottom): Base vs. En_FT, Base vs. Nl_FT, Base vs. QA_FT, and En_FT vs. Nl_FT. (e) Sensorimotor rating profiles for the exemplar word "SHOUTER". The radar chart compares the predicted rating scores (on a scale of 0–5) for each of the 11 sensorimotor dimensions across the four models (colored lines) with the actual human ratings (gray line).
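
The bootstrap distributions in Figure 1(d) can be pictured with a short sketch: resample the concepts with replacement, recompute Spearman's $\rho$ on each resample, and collect the 200 values. This is only an illustration under assumptions. The statistic here is a plain per-dimension correlation standing in for the RDM-based one, and the function names, seed, and toy ratings are hypothetical.

```python
import random


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out


def spearman(x, y):
    """Spearman's rho, or None when either input is constant."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return None
    return cov / (vx * vy) ** 0.5


def bootstrap_rho(human, model, n_iter=200, seed=0):
    """Collect n_iter rho values from resamples of the concepts (with replacement)."""
    rng = random.Random(seed)
    n = len(human)
    samples = []
    while len(samples) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]
        rho = spearman([human[i] for i in idx], [model[i] for i in idx])
        if rho is not None:  # skip degenerate resamples where rho is undefined
            samples.append(rho)
    return samples


# Hypothetical ratings for eight concepts on one dimension (0-5 scale).
human = [0.4, 1.2, 2.5, 3.1, 3.8, 4.6, 0.9, 2.0]
model = [0.9, 1.0, 2.2, 3.5, 3.0, 4.1, 1.5, 1.8]
samples = sorted(bootstrap_rho(human, model))
lo, hi = samples[int(0.025 * len(samples))], samples[int(0.975 * len(samples))]
print(f"median rho: {samples[len(samples) // 2]:.3f}")
print(f"approximate 95% percentile interval: [{lo:.3f}, {hi:.3f}]")
```

Plotting a density estimate over `samples` for each model would give a panel of the kind the caption describes; non-overlapping distributions suggest that a difference between models is not an artifact of which concepts happened to be sampled.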