UniRank: A Multi-Agent Calibration Pipeline for Estimating University Rankings from Anonymized Bibliometric Signals

Pedram Riyazimehr, Seyyed Ehsan Mahmoudi

Abstract

We present UniRank, a multi-agent LLM pipeline that estimates university positions across global ranking systems using only publicly available bibliometric data from OpenAlex and Semantic Scholar. The system employs a three-stage architecture: (a) zero-shot estimation from anonymized institutional metrics, (b) per-system tool-augmented calibration against real ranked universities, and (c) final synthesis. Critically, institutions are anonymized -- names, countries, DOIs, paper titles, and collaboration countries are all redacted -- and their actual ranks are hidden from the calibration tools during evaluation, preventing LLM memorization from confounding results. On the Times Higher Education (THE) World University Rankings ($n = 352$), the system achieves MAE = 251.5 rank positions, Median AE = 131.5, PNMAE = 12.03%, Spearman $\rho = 0.769$, Kendall $\tau = 0.591$, hit rate @50 = 20.7%, hit rate @100 = 39.8%, and a Memorization Index of exactly zero (no exact-match zero-width predictions among all 352 universities). The systematic positive-signed error (+190.1 positions, indicating the system consistently predicts worse ranks than actual) and monotonic performance degradation from elite tier (MAE = 60.5, hit@100 = 90.5%) to tail tier (MAE = 328.2, hit@100 = 20.8%) provide strong evidence that the pipeline performs genuine analytical reasoning rather than recalling memorized rankings. A live demo is available at https://unirank.scinito.ai.
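
As a reading aid for the error metrics quoted above, the following minimal sketch computes MAE, median absolute error, mean signed error, and hit rate @k on toy data. It is not the authors' evaluation code. It assumes that the signed error is predicted rank minus actual rank (so a positive value means a worse predicted rank) and that a "hit @k" means the point prediction lies within k positions of the actual rank; the function name `rank_error_metrics` and the sample ranks are hypothetical. PNMAE, the rank correlations, and the Memorization Index are left out because the abstract does not define them.

```python
from statistics import mean, median


def rank_error_metrics(actual, predicted, thresholds=(50, 100)):
    """Summarize rank-prediction errors (a larger rank number is a worse rank)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and of equal length")

    # Assumption: signed error = predicted - actual, so > 0 means predicted worse.
    signed = [p - a for a, p in zip(actual, predicted)]
    abs_err = [abs(e) for e in signed]

    metrics = {
        "mae": mean(abs_err),
        "median_ae": median(abs_err),
        "mean_signed_error": mean(signed),
    }
    # Assumption: a hit @k is an absolute error of at most k positions.
    for k in thresholds:
        metrics[f"hit@{k}"] = sum(e <= k for e in abs_err) / len(abs_err)
    return metrics


# Toy ranks, invented purely for illustration (not data from the paper).
actual = [10, 200, 450, 900]
predicted = [40, 260, 430, 1300]
metrics = rank_error_metrics(actual, predicted)
print(metrics)
```

On the toy data a single large miss (400 positions) pulls the MAE well above the median absolute error, the same kind of gap the abstract reports between MAE = 251.5 and Median AE = 131.5.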

Paper Structure

This paper contains 51 sections, 6 equations, 10 figures, 9 tables, and 1 algorithm.

Figures (10)

  • Figure 1: UniRank system architecture. Data from OpenAlex and Semantic Scholar is aggregated, normalized, and anonymized before entering the three-stage LLM pipeline. During evaluation, the target university is hidden from the ranking store (dashed line) to prevent data leakage.
  • Figure 2: Three-stage pipeline: Stage 1 produces coarse zero-shot estimates from anonymized metrics. Stage 2 refines per-system with tool-augmented calibration (parallel). Stage 3 synthesizes the final report.
  • Figure 3: Anonymization before/after comparison. All identifying information (name, country, DOIs, paper titles, collaboration countries) is redacted; only numeric metrics are preserved.
  • Figure 4: Error distribution by evaluation tier (THE). Box plots show median (line), IQR (box), and outliers. Performance degrades monotonically from elite to tail.
  • Figure 5: Hit rate comparison across tiers at three thresholds (@25, @50, @100). Elite tier achieves 90.5% hit@100.
  • ...and 5 more figures
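
The anonymization step described in the Figure 3 caption can be illustrated with a small sketch. It assumes an institution record is a flat dictionary; the field names (`name`, `country`, `dois`, `paper_titles`, `collaboration_countries`, and the numeric metric keys) are hypothetical stand-ins, not the schema used by UniRank, OpenAlex, or Semantic Scholar. The sketch drops the identifying fields listed in the caption and keeps only numeric values.

```python
# Hypothetical field names for the identifying information listed in Figure 3.
REDACTED_FIELDS = {
    "name",
    "country",
    "dois",
    "paper_titles",
    "collaboration_countries",
}


def anonymize(record):
    """Return a copy of `record` holding only non-identifying numeric metrics."""
    anonymized = {}
    for key, value in record.items():
        if key in REDACTED_FIELDS:
            continue  # explicitly redacted identifying field
        # Keep numeric metrics only; bool is excluded because it subclasses int.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            anonymized[key] = value
    return anonymized


# Invented example record, for illustration only.
record = {
    "name": "Example University",
    "country": "XX",
    "dois": ["10.0000/example"],
    "paper_titles": ["An Example Paper"],
    "collaboration_countries": ["YY", "ZZ"],
    "works_count": 12000,
    "cited_by_count": 340000,
    "h_index": 210,
    "homepage": "https://example.org",
}
anonymized = anonymize(record)
print(anonymized)
```

Keeping only numeric values, rather than only removing a fixed list of fields, is a deliberately conservative choice in this sketch: a free-text field that was not anticipated (such as `homepage` here) is dropped as well.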