Ranking Entities along Conceptual Space Dimensions with LLMs: An Analysis of Fine-Tuning Strategies
Nitesh Kumar, Usashi Chatterjee, Steven Schockaert
TL;DR
The paper tackles the problem of ranking entities along conceptual space dimensions when ground-truth rankings are scarce. It fine-tunes LLMs on more readily available features and compares two ranking paradigms: pointwise scoring and pairwise comparison, with SVM-based aggregation of the pairwise judgments to produce full rankings. Across the Wikidata, Taste, Rocks, Tag Genome, and Physical Properties datasets, the authors find that ranking ability transfers across domains to some extent, but that including at least some perceptual or subjective features in the training data is important for the best results. Pairwise methods generally perform strongly, though well-tuned pointwise models can match or exceed the baselines in some cases. The findings suggest that open-source LLMs can be used to construct high-quality conceptual space representations, and they offer practical guidance on training data selection and ranking strategies for modelling perceptual knowledge.
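The summary does not spell out how the SVM turns pairwise judgments into a full ranking. A minimal sketch, assuming a RankSVM-style approach, is given here: each entity has a feature vector (for example an embedding), every judgment "a ranks above b" becomes a difference vector x_a - x_b, and a linear SVM objective is minimised so that w · (x_a - x_b) is positive. The function names, toy features, and hyperparameters are hypothetical illustrations, not the authors' implementation.

```python
import random


def rank_svm(features, pairs, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Learn a linear scoring vector w from pairwise preferences (RankSVM-style sketch).

    features: dict mapping entity -> list of floats (hypothetical entity features)
    pairs: list of (a, b) tuples meaning "a ranks above b"
    Objective: lam/2 * ||w||^2 + mean over pairs of max(0, 1 - w . (x_a - x_b)),
    minimised by stochastic subgradient descent.
    """
    rng = random.Random(seed)
    dim = len(next(iter(features.values())))
    w = [0.0] * dim
    # Each preference "a above b" becomes a difference vector x_a - x_b.
    diffs = [[xa - xb for xa, xb in zip(features[a], features[b])] for a, b in pairs]
    for _ in range(epochs):
        rng.shuffle(diffs)
        for d in diffs:
            margin = sum(wi * di for wi, di in zip(w, d))
            # The hinge term contributes a subgradient only when the margin is below 1.
            active = margin < 1.0
            for i in range(dim):
                grad = lam * w[i] - (d[i] if active else 0.0)
                w[i] -= lr * grad
    return w


def rank_entities(features, w):
    """Full ranking: sort entities by their linear score w . x, highest first."""
    def score(entity):
        return sum(wi * xi for wi, xi in zip(w, features[entity]))
    return sorted(features, key=score, reverse=True)


# Toy, made-up features and preferences (not data from the paper).
features = {
    "A": [3.0, 0.0],
    "B": [2.0, 0.5],
    "C": [1.0, 0.2],
    "D": [0.0, 0.4],
}
# Hypothetically, these judgments would come from a fine-tuned pairwise LLM ranker.
pairs = [("A", "B"), ("B", "C"), ("C", "D")]

w = rank_svm(features, pairs)
print(rank_entities(features, w))  # ['A', 'B', 'C', 'D']
```

Because the learned scorer is linear, it assigns a score to every entity with a feature vector, including entities that appear in few or no comparisons. A pointwise model, by contrast, outputs a score for each entity directly, so ranking reduces to sorting those scores.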
Abstract
Conceptual spaces represent entities in terms of their primitive semantic features. Such representations are highly valuable but they are notoriously difficult to learn, especially when it comes to modelling perceptual and subjective features. Distilling conceptual spaces from Large Language Models (LLMs) has recently emerged as a promising strategy, but existing work has been limited to probing pre-trained LLMs using relatively simple zero-shot strategies. We focus in particular on the task of ranking entities according to a given conceptual space dimension. Unfortunately, we cannot directly fine-tune LLMs on this task, because ground truth rankings for conceptual space dimensions are rare. We therefore use more readily available features as training data and analyse whether the ranking capabilities of the resulting models transfer to perceptual and subjective features. We find that this is indeed the case, to some extent, but having at least some perceptual and subjective features in the training data seems essential for achieving the best results.
