ASPIRE: Assistive System for Performance Evaluation in IR

Georgios Peikos, Wojciech Kusa, Symeon Symeonidis

TL;DR

ASPIRE addresses the complex task of IR evaluation by providing a web-based visual analytics platform that enables in-depth, multi-faceted analysis of IR experiments beyond traditional metrics. Built with Python and Streamlit, it integrates standard IR tools and statistical methods to support single- and multi-run comparisons, query-level and query-characteristics analyses, and collection-level retrieval insights, demonstrated on the TREC Clinical Trials corpus. Its modular architecture, input validation, and exportable outputs facilitate reproducibility and easy adoption by researchers, organizers, and practitioners, both online and locally. By linking retrieval results with publication data, ASPIRE promotes transparency and deeper engagement with experimental evidence, with ongoing work to extend its capabilities and adoption in the IR community.

Abstract

Information Retrieval (IR) evaluation involves far more complexity than merely presenting performance measures in a table. Researchers often need to compare multiple models across various dimensions, such as the Precision-Recall trade-off and response time, to understand the reasons behind the varying performance of specific queries for different models. We introduce ASPIRE (Assistive System for Performance Evaluation in IR), a visual analytics tool designed to address these complexities by providing an extensive and user-friendly interface for in-depth analysis of IR experiments. ASPIRE supports four key aspects of IR experiment evaluation and analysis: single/multi-experiment comparisons, query-level analysis, query characteristics-performance interplay, and collection-based retrieval analysis. We showcase the functionality of ASPIRE using the TREC Clinical Trials collection. ASPIRE is an open-source toolkit available online: https://github.com/GiorgosPeikos/ASPIRE
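To make the idea of query-level analysis concrete, here is a minimal sketch of comparing two runs per query rather than only by their averaged score. It is not taken from ASPIRE's code base: the function names, the toy `qrels`, and the two runs are hypothetical, and average precision is used only as a familiar example measure.

```python
from statistics import mean


def average_precision(ranking, relevant):
    """Average precision for one query.

    ranking: ordered list of retrieved document ids.
    relevant: set of document ids judged relevant for the query.
    """
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def per_query_ap(run, qrels):
    """Map each judged query id to the run's average precision on it."""
    return {qid: average_precision(run.get(qid, []), rel) for qid, rel in qrels.items()}


# Hypothetical toy data: two queries, two runs.
qrels = {"q1": {"d1", "d3"}, "q2": {"d2"}}
run_a = {"q1": ["d1", "d2", "d3"], "q2": ["d1", "d2"]}
run_b = {"q1": ["d2", "d1", "d3"], "q2": ["d2", "d1"]}

ap_a = per_query_ap(run_a, qrels)
ap_b = per_query_ap(run_b, qrels)

# Per-query difference: positive values mean run_b did better on that query.
delta = {qid: ap_b[qid] - ap_a[qid] for qid in qrels}

map_a = mean(ap_a.values())
map_b = mean(ap_b.values())
```

In this toy case `run_b` has the higher mean score, yet `delta` shows it is worse on `q1` and better on `q2`. This kind of per-query disagreement is invisible in a table of averaged measures.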

Paper Structure

This paper contains 6 sections, 2 figures.

Figures (2)

  • Figure 1: ASPIRE's functionalities. Each column is a different web page, and each block is a section of a web page. Red blocks indicate user actions, green blocks represent performance evaluation result tables, purple blocks represent plots and analysis, and yellow blocks show analysis.
  • Figure 2: ASPIRE's user interface and examples of its functionality, using runs submitted to TREC Clinical Trials 2021 [peikos2022unimib].