A Socratic RAG Approach to Connect Natural Language Queries on Research Topics with Knowledge Organization Systems
Lew Lefton, Kexin Rong, Chinar Dankhara, Lila Ghemri, Firdous Kausar, A. Hannibal Hamdallahi
TL;DR
The paper addresses the problem of mapping natural-language research-topic queries to machine-interpretable Knowledge Organization Systems (KOS). It proposes a Socratic, multi-round Retrieval Augmented Generation (RAG) agent grounded in open KOS data to disambiguate topics and produce semantically registered entities, with a two-stage hierarchical topic retrieval that includes a retrieval score $\mathrm{score}(t) = \mathrm{sim}(q, t) + \alpha \sum_{a \in \mathrm{ancestors}(t)} \beta^{d} \times \mathrm{sim}(q, a)$ and a temperature-controlled initial search. The CollabNext demonstration provides a knowledge graph connecting people, organizations, and topics, emphasizing visibility for emerging researchers at historically underrepresented institutions. The approach aims to bridge little semantics (domain-specific KOS) with big semantics (bibliometric repositories), offering explainable topic grounding and broad applicability across research artifacts. If successful, it could enable transparent topic disambiguation, improved search interfaces, and richer integration of researchers and their outputs across disciplines.
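To make the retrieval score in the summary above concrete, here is a minimal Python sketch. It rests on several assumptions that the summary does not state: sim is taken to be cosine similarity over embedding vectors, d is taken to be the number of levels between topic t and ancestor a (d = 1 for the direct parent), the hierarchy is a single-parent tree, and the values of α and β are placeholders. The names `cosine`, `ancestors_with_depth`, and `hierarchical_score`, as well as the toy vectors, are hypothetical and are not taken from the paper.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity; stands in for sim(q, t) (an assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def ancestors_with_depth(topic, parent):
    """Yield (ancestor, d) pairs walking up the tree; d = 1 is the direct parent."""
    seen = {topic}  # guard against accidental cycles in the parent map
    node, d = parent.get(topic), 1
    while node is not None and node not in seen:
        yield node, d
        seen.add(node)
        node, d = parent.get(node), d + 1


def hierarchical_score(query_vec, topic, vectors, parent, alpha=0.5, beta=0.5):
    """score(t) = sim(q, t) + alpha * sum over ancestors a of beta**d * sim(q, a)."""
    base = cosine(query_vec, vectors[topic])
    ancestor_term = sum(
        beta ** d * cosine(query_vec, vectors[a])
        for a, d in ancestors_with_depth(topic, parent)
    )
    return base + alpha * ancestor_term


# Toy three-level hierarchy: root -> mid -> leaf (hypothetical data).
vectors = {"root": (1.0, 0.0), "mid": (0.0, 1.0), "leaf": (1.0, 0.0)}
parent = {"leaf": "mid", "mid": "root"}
query = (1.0, 0.0)

ranked = sorted(
    vectors,
    key=lambda t: hierarchical_score(query, t, vectors, parent),
    reverse=True,
)
print(ranked)
```

Under these assumptions, a topic whose ancestors also match the query receives a bonus that decays geometrically with distance in the hierarchy, so β controls how quickly distant ancestors stop mattering and α controls how much the hierarchy counts relative to the direct match.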
Abstract
In this paper, we propose a Retrieval Augmented Generation (RAG) agent that maps natural language queries about research topics to precise, machine-interpretable semantic entities. Our approach combines RAG with Socratic dialogue to align a user's intuitive understanding of research topics with established Knowledge Organization Systems (KOSs). The proposed approach bridges "little semantics" (domain-specific KOS structures) with "big semantics" (broad bibliometric repositories), making complex academic taxonomies more accessible. Such agents have the potential for broad use. We illustrate the approach with a sample application, CollabNext, a person-centric knowledge graph connecting people, organizations, and research topics. We further describe how the application design intentionally focuses on Historically Black Colleges and Universities (HBCUs) and emerging researchers, to raise the visibility of people historically rendered invisible in the current science system.
