Metric Learning Encoding Models: A Multivariate Framework for Interpreting Neural Representations
Louis Jalouzot, Christophe Pallier, Emmanuel Chemla, Yair Lakretz
TL;DR
Metric Learning Encoding Models (MLEMs) recast the interpretation of neural representations as a metric-learning problem over a space of theoretical features. By learning a symmetric positive definite matrix $W$ that defines a weighted distance on feature distances, MLEMs capture both feature effects and their interactions, optimizing a rank-based alignment with neural distances. Across simulations and LLM-derived data, MLEMs show better weight recovery, greater robustness to noise, and faster convergence than FR-RSA-I, while yielding interpretable, layer-wise geometric structures in representations. The framework is modality-agnostic, scales through batch-based training, and is supported by open-source software for application to both neuroscience and AI settings.
Abstract
Understanding how explicit theoretical features are encoded in opaque neural systems is a central challenge now common to neuroscience and AI. We introduce Metric Learning Encoding Models (MLEMs), which address this challenge directly as a metric learning problem: we fit the distance in the space of theoretical features to match the distance in neural space. Our framework improves on univariate encoding and decoding methods by building on second-order isomorphism methods, such as Representational Similarity Analysis (RSA), and extends them by learning a metric that efficiently models features as well as the interactions between them. The effectiveness of MLEMs is validated through two sets of simulations. First, MLEMs recover ground-truth feature importance in synthetic datasets better than state-of-the-art methods such as Feature Reweighted RSA (FR-RSA). Second, we deploy MLEMs on real language data, where they show stronger robustness to noise when estimating the importance of linguistic features (gender, tense, etc.). MLEMs are applicable to any domain in which theoretical features can be identified, such as language, vision, or audition. We release optimized code for measuring feature importance in the representations of any artificial neural network or in empirical neural data at https://github.com/LouisJalouzot/MLEM.
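To make the objective concrete, the following is a minimal sketch of how a candidate metric could be scored against neural distances. It is not the released implementation. Several choices here are assumptions made for illustration only: features are binary, each pair of stimuli is encoded by a vector of indicators marking which features differ, the feature-space distance takes the Mahalanobis-like form sqrt(delta^T W delta), the rank-based alignment is measured with Spearman correlation, and the "neural" distances are simulated from a hypothetical ground-truth matrix `W_true` plus noise. The sketch only evaluates a given $W$; it does not show how $W$ is fitted.

```python
import numpy as np

def pairwise_feature_diffs(X):
    """One row per unordered stimulus pair: 1.0 where a feature differs, else 0.0."""
    i, j = np.triu_indices(X.shape[0], k=1)
    return (X[i] != X[j]).astype(float)

def metric_distance(D, W):
    """Distance sqrt(delta^T W delta) for every row delta of D (assumed form)."""
    q = np.einsum("pk,kl,pl->p", D, W, D)
    return np.sqrt(np.clip(q, 0.0, None))

def average_ranks(v):
    """Ranks of v, with tied values sharing their mean rank."""
    _, inverse, counts = np.unique(v, return_inverse=True, return_counts=True)
    ends = np.cumsum(counts)        # last 1-based rank of each tie group
    starts = ends - counts + 1      # first 1-based rank of each tie group
    return ((starts + ends) / 2.0)[inverse]

def spearman(a, b):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])

rng = np.random.default_rng(0)
n_stimuli, n_features = 60, 3
X = rng.integers(0, 2, size=(n_stimuli, n_features))  # toy binary features
D = pairwise_feature_diffs(X)

# Hypothetical ground truth: symmetric positive definite, with an
# off-diagonal term acting as an interaction between features 0 and 1.
W_true = np.array([[4.0, 1.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.25]])

# Simulated "neural" pairwise distances (an assumption of this toy setup).
neural_dist = metric_distance(D, W_true) + rng.normal(0.0, 0.05, size=D.shape[0])

score_true = spearman(metric_distance(D, W_true), neural_dist)
score_identity = spearman(metric_distance(D, np.eye(n_features)), neural_dist)
print(f"rank alignment with W_true:   {score_true:.3f}")
print(f"rank alignment with identity: {score_identity:.3f}")
```

In this toy setting the unweighted identity metric treats all features as equally important and ignores the interaction, so it should align less well with the simulated neural distances than `W_true` does. A fitting procedure would search over symmetric positive definite matrices to maximize such a score.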
