UniRank: A Multi-Agent Calibration Pipeline for Estimating University Rankings from Anonymized Bibliometric Signals

Pedram Riyazimehr; Seyyed Ehsan Mahmoudi

Abstract

We present UniRank, a multi-agent LLM pipeline that estimates university positions across global ranking systems using only publicly available bibliometric data from OpenAlex and Semantic Scholar. The system employs a three-stage architecture: (a) zero-shot estimation from anonymized institutional metrics, (b) per-system tool-augmented calibration against real ranked universities, and (c) final synthesis. Critically, institutions are anonymized (names, countries, DOIs, paper titles, and collaboration countries are all redacted), and each target institution's actual ranks are hidden from the calibration tools during evaluation, which prevents LLM memorization from confounding the results. On the Times Higher Education (THE) World University Rankings ($n=352$), the system achieves MAE = 251.5 rank positions, Median AE = 131.5, PNMAE = 12.03%, Spearman $\rho = 0.769$, Kendall $\tau = 0.591$, hit rate @50 = 20.7%, hit rate @100 = 39.8%, and a Memorization Index of exactly zero (no exact-match zero-width predictions among all 352 universities). The systematic positive signed error (+190.1 positions, meaning the system consistently predicts worse ranks than the actual ones) and the monotonic performance degradation from the elite tier (MAE = 60.5, hit@100 = 90.5%) to the tail tier (MAE = 328.2, hit@100 = 20.8%) provide strong evidence that the pipeline performs genuine analytical reasoning rather than recalling memorized rankings. A live demo is available at https://unirank.scinito.ai.
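The abstract reports several rank-error and rank-correlation metrics. As a minimal sketch, one plausible way to compute them from paired actual and predicted rank positions in pure Python follows. Several details are assumptions rather than definitions taken from the paper: signed error is taken as predicted minus actual (so a positive value means a worse predicted rank, matching the abstract's sign convention), a hit@k counts predictions within k positions inclusive, Kendall's $\tau$ is the tau-b variant, and the tier bounds passed to the hypothetical `metrics_by_tier` helper are illustrative. PNMAE and the Memorization Index are omitted because this section does not define them.

```python
from statistics import mean, median


def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; NaN if either input is constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b (assumed variant) via an O(n^2) scan over all pairs."""
    conc = disc = ties_x_only = ties_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both: excluded from numerator and denominator
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                conc += 1
            else:
                disc += 1
    denom = ((conc + disc + ties_x_only) * (conc + disc + ties_y_only)) ** 0.5
    return (conc - disc) / denom if denom else float("nan")


def rank_metrics(actual, predicted, ks=(25, 50, 100)):
    """Error and agreement metrics for paired actual/predicted rank positions."""
    signed = [p - a for a, p in zip(actual, predicted)]  # > 0: predicted a worse rank
    abs_err = [abs(e) for e in signed]
    out = {
        "MAE": mean(abs_err),
        "MedianAE": median(abs_err),
        "MeanSignedError": mean(signed),
        "Spearman": spearman_rho(actual, predicted),
        "Kendall": kendall_tau_b(actual, predicted),
    }
    for k in ks:
        # Assumed definition: a hit is |error| <= k (inclusive).
        out[f"hit@{k}"] = sum(e <= k for e in abs_err) / len(abs_err)
    return out


def metrics_by_tier(actual, predicted, tiers):
    """Hypothetical helper: per-tier metrics for (name, lo, hi) bounds on actual rank."""
    out = {}
    for name, lo, hi in tiers:
        idx = [i for i, a in enumerate(actual) if lo <= a <= hi]
        if idx:
            out[name] = rank_metrics([actual[i] for i in idx], [predicted[i] for i in idx])
    return out


if __name__ == "__main__":
    # Toy data, not from the paper: actual vs. predicted rank positions.
    actual = [1, 5, 20, 150, 400, 900]
    predicted = [3, 12, 60, 210, 700, 1200]
    print(rank_metrics(actual, predicted))
    print(metrics_by_tier(actual, predicted, [("top-100", 1, 100), ("rest", 101, 10_000)]))
```

Because Spearman and Kendall depend only on ordering, a predictor that is consistently too pessimistic can still correlate perfectly with the truth while having a large MAE; this is the pattern the abstract describes (high $\rho$, large positive signed error).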

Paper Structure

This paper contains 51 sections, 6 equations, 10 figures, 9 tables, and 1 algorithm.

Introduction
Problem Statement
Research Question
Why This Is Not Simple LLM Memorization
Contributions
Paper Organization
Related Work
University Ranking Systems
Bibliometric Analysis and Scientometrics
LLMs for Knowledge-Intensive Tasks
LLM Evaluation and Decontamination
System Architecture
Architecture Overview
Formal Problem Definition
Data Sources
...and 36 more sections

Figures (10)

Figure 1: UniRank system architecture. Data from OpenAlex and Semantic Scholar is aggregated, normalized, and anonymized before entering the three-stage LLM pipeline. During evaluation, the target university is hidden from the ranking store (dashed line) to prevent data leakage.
Figure 2: Three-stage pipeline. Stage 1 produces coarse zero-shot estimates from anonymized metrics. Stage 2 refines the estimate for each ranking system through tool-augmented calibration, with the systems processed in parallel. Stage 3 synthesizes the final report.
Figure 3: Anonymization before/after comparison. All identifying information (name, country, DOIs, paper titles, collaboration countries) is redacted; only numeric metrics are preserved.
Figure 4: Error distribution by evaluation tier (THE). Box plots show median (line), IQR (box), and outliers. Performance degrades monotonically from elite to tail.
Figure 5: Hit rate comparison across tiers at three thresholds (@25, @50, @100). The elite tier achieves 90.5% hit@100.
...and 5 more figures
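Figures 1 and 3 describe two safeguards: identifying fields are redacted so that only numeric metrics remain, and the target university is hidden from the ranking store during evaluation. The following Python sketch shows one way the two could fit together. Everything in it is an assumption for illustration: the field names in `IDENTIFYING_FIELDS`, the `RankingStore` class, and the log-scaled nearest-neighbour rule used to pick calibration peers are hypothetical, not the paper's implementation.

```python
import math
from dataclasses import dataclass, field

# Hypothetical field names for the identifiers the paper redacts: institution
# name, country, DOIs, paper titles, and collaboration countries.
IDENTIFYING_FIELDS = {"name", "country", "dois", "paper_titles", "collaboration_countries"}


def anonymize(record: dict) -> dict:
    """Drop identifying fields and keep only numeric (non-boolean) metrics."""
    return {
        key: value
        for key, value in record.items()
        if key not in IDENTIFYING_FIELDS
        and isinstance(value, (int, float))
        and not isinstance(value, bool)
    }


def metric_distance(a: dict, b: dict) -> float:
    """Assumed similarity: Euclidean distance over shared metrics on a log1p scale."""
    shared = a.keys() & b.keys()
    return math.sqrt(
        sum((math.log1p(max(a[k], 0)) - math.log1p(max(b[k], 0))) ** 2 for k in shared)
    )


@dataclass
class RankingStore:
    """Hypothetical store: opaque id -> (anonymized metrics, {system: rank})."""
    entries: dict = field(default_factory=dict)

    def add(self, inst_id: str, raw_record: dict, ranks: dict) -> None:
        self.entries[inst_id] = (anonymize(raw_record), ranks)

    def calibration_peers(self, system: str, target_id: str, target_metrics: dict, k: int = 5):
        """Return up to k (metrics, rank) pairs for the nearest ranked peers.

        The target is filtered out, so its own rank can never be returned
        as calibration evidence (the leakage guard shown in Figure 1).
        """
        candidates = [
            (metric_distance(target_metrics, metrics), metrics, ranks[system])
            for inst_id, (metrics, ranks) in self.entries.items()
            if inst_id != target_id and system in ranks
        ]
        candidates.sort(key=lambda c: c[0])
        # Peer ids are not returned: the calibration step sees only metrics and ranks.
        return [(metrics, rank) for _, metrics, rank in candidates[:k]]
```

Excluding the target by an opaque id rather than by name keeps the guard working after names have been redacted, and returning peers without their ids avoids reintroducing identifying handles into the calibration context.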
