How Do LLMs Encode Scientific Quality? An Empirical Study Using Monosemantic Features from Sparse Autoencoders

Michael McCoubrey; Angelo Salatino; Francesco Osborne; Enrico Motta

TL;DR

The study investigates how LLMs encode scientific quality by extracting monosemantic features with sparse autoencoders (SAEs). It uses three bibliometric proxies (citation count quartile, SJR quartile, and journal h-index quartile) and trains interpretable decision trees on features derived from Gemma-family LLMs under different SAE configurations. Four recurring feature types emerge: methodology, publication type, high-impact fields or technologies, and specialized jargon. These findings offer a more transparent view of how LLMs represent research quality and support the use of interpretable features in evaluation tasks.

Abstract

In recent years, generative AI, and large language models (LLMs) in particular, have been increasingly used to support both the assessment and generation of scientific work. Although some studies have shown that LLMs can, to a certain extent, evaluate research according to perceived quality, our understanding of the internal mechanisms that enable this capability remains limited. This paper presents the first study that investigates how LLMs encode the concept of scientific quality through relevant monosemantic features extracted using sparse autoencoders. We derive such features under different experimental settings and assess their ability to serve as predictors across three tasks related to research quality: predicting citation count, journal SJR, and journal h-index. The results indicate that LLMs encode features associated with multiple dimensions of scientific quality. In particular, we identify four recurring types of features that capture key aspects of how research quality is represented: 1) features reflecting research methodologies; 2) features related to publication type, with literature reviews typically exhibiting higher impact; 3) features associated with high-impact research fields and technologies; and 4) features corresponding to specific scientific jargon. These findings represent an important step toward understanding how LLMs encapsulate concepts related to research quality.
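The setup summarized above (quartile labels derived from a bibliometric score, then an interpretable tree over feature activations) can be illustrated with a minimal sketch. This is not the authors' implementation: the activations are synthetic random numbers rather than SAE outputs, the "citation" scores are constructed so that feature 1 drives them, and a single Gini-based split (a depth-one stump) stands in for a full decision tree. All function and variable names are hypothetical.

```python
import random
from statistics import quantiles


def quartile_labels(values):
    """Map each value to a quartile index 0..3 (0 = lowest quarter)."""
    q1, q2, q3 = quantiles(values, n=4)
    return [sum(v > cut for cut in (q1, q2, q3)) for v in values]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_stump(rows, labels):
    """Return (feature index, threshold, weighted Gini) of the best single split."""
    best = (None, None, float("inf"))
    n = len(rows)
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must send samples to both sides
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# Synthetic stand-in data (assumption): 200 "papers", 3 feature activations each.
random.seed(0)
rows = [[random.random() for _ in range(3)] for _ in range(200)]

# Assumption: the citation score depends only on feature 1.
citations = [int(1000 * r[1]) for r in rows]
labels = quartile_labels(citations)

feature, threshold, score = best_stump(rows, labels)
print(f"split on activation_{feature} at {threshold:.3f} (weighted Gini {score:.3f})")
```

Because the split is a readable rule of the form "activation_i <= threshold", inspecting which feature index is chosen is what makes this kind of model interpretable. A real tree would apply the same search recursively to each side of the split.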

Paper Structure (8 sections, 9 figures, 4 tables)

Introduction
Related Work
Methodology
Results and Discussion
Task 1 - Predicting citation quartiles
Task 2 - Predicting SJR quartiles
Task 3 - Predicting journal h-index quartiles
Conclusions

Figures (9)

Figure 1: Decision tree obtained from Task 1 using the LLM & SAE combination 1. Feature definitions are provided in Table \ref{task_1_features}, where activation_i denotes the feature with index i.
Figure 2: Decision tree obtained from Task 1 using the LLM & SAE combination 2. Feature definitions are provided in Table \ref{task_1_features}, where activation_i denotes the feature with index i.
Figure 3: Decision tree obtained from Task 1 using the LLM & SAE combination 3. Feature definitions are provided in Table \ref{task_1_features}, where activation_i denotes the feature with index i.
Figure 4: Decision tree obtained from Task 2 using the LLM & SAE combination 1. Feature definitions are provided in Table \ref{task_2_features}, where activation_i denotes the feature with index i.
Figure 5: Decision tree obtained from Task 2 using the LLM & SAE combination 2. Feature definitions are provided in Table \ref{task_2_features}, where activation_i denotes the feature with index i.
...and 4 more figures
