
AIANO: Enhancing Information Retrieval with AI-Augmented Annotation

Sameh Khattab, Marie Bauer, Lukas Heine, Till Rostalski, Jens Kleesiek, Julian Friedrich

TL;DR

This paper tackles the challenge of efficiently producing high-quality information retrieval (IR) datasets for retrieval-augmented generation (RAG) by introducing AIANO, a specialized, AI-augmented annotation tool. AIANO combines configurable annotation blocks with three collaboration modes and full-text search, enabling a human-in-the-loop workflow that leverages LLM assistance without sacrificing control. In a within-subject study with $n = 15$ participants, AIANO outperformed a baseline tool, nearly doubling annotation speed and improving retrieval metrics (e.g., recall from $0.78$ to $0.88$ and F1 from $0.79$ to $0.86$) while reducing cognitive workload and increasing usability. The results demonstrate that tightly integrated AI-assisted annotation and search can accelerate IR dataset creation and enhance data quality, offering a practical path toward more effective IR system evaluation and RAG applications in retrieval-intensive domains.

Abstract

The rise of Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) has rapidly increased the need for high-quality, curated information retrieval datasets. These datasets, however, are currently created with off-the-shelf annotation tools that make the annotation process complex and inefficient. To streamline this process, we developed a specialized annotation tool, AIANO. By adopting an AI-augmented annotation workflow that tightly integrates human expertise with LLM assistance, AIANO enables annotators to leverage AI suggestions while retaining full control over annotation decisions. In a within-subject user study ($n = 15$), participants created question-answering datasets using both a baseline tool and AIANO. AIANO nearly doubled annotation speed compared to the baseline while being easier to use and improving retrieval accuracy. These results demonstrate that AIANO's AI-augmented approach accelerates and enhances dataset creation for information retrieval tasks, advancing annotation capabilities in retrieval-intensive domains.
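The retrieval gains reported above (recall $0.78 \to 0.88$, F1 $0.79 \to 0.86$) compare the passages annotators marked as relevant against a reference set. This section does not state exactly how those metrics were computed, so the sketch below is only an illustration under stated assumptions: each question has a set of gold passage IDs, metrics are computed set-wise per question, and scores are macro-averaged across questions. The function names `prf1` and `macro_average` and the passage IDs are hypothetical.

```python
from typing import Dict, List, Set

METRICS = ("precision", "recall", "f1")


def prf1(selected: Set[str], gold: Set[str]) -> Dict[str, float]:
    """Set-based precision, recall and F1 for a single question.

    Assumption: `selected` holds passage IDs an annotator marked as relevant,
    `gold` holds the reference passage IDs for the same question.
    """
    true_positives = len(selected & gold)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    # Harmonic mean of precision and recall; define F1 as 0 when both are 0.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def macro_average(per_question: List[Dict[str, float]]) -> Dict[str, float]:
    """Unweighted mean of each metric over questions (assumed aggregation)."""
    n = len(per_question)
    if n == 0:
        raise ValueError("need at least one question")
    return {m: sum(q[m] for q in per_question) / n for m in METRICS}


if __name__ == "__main__":
    # Hypothetical annotations for two questions.
    q1 = prf1({"p1", "p2", "p3"}, {"p1", "p2"})  # one extra passage selected
    q2 = prf1({"p4"}, {"p4", "p5"})              # one gold passage missed
    print(macro_average([q1, q2]))
```

In this toy example, selecting an extra passage lowers precision for the first question, while missing a gold passage lowers recall for the second. Both effects feed into F1, which is why recall and F1 move together in the reported results.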

Paper Structure

This paper contains 33 sections, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Workflow of the AIANO annotation system. (i) Project Creation Phase: configure project metadata, input/output schemas, annotation levels, and AIANO Blocks. (ii) Project Configuration Phase: configure annotation blocks with an LLM provider and upload documents for annotation. (iii) Annotation Phase: annotators highlight text, trigger AI-assisted content generation, then review, edit, and export the dataset. The cycle icon indicates iterative refinement.
  • Figure 2: NASA-TLX workload assessment. Subscale scores across six dimensions and overall workload. Lower scores indicate lower workload. $^{*}p < 0.05$, $^{**}p < 0.01$, $^{***}p < 0.001$.
  • Figure 3: Usability questionnaire ratings. Likert-scale ratings (1 to 5) across eight usability dimensions, plus a composite score. Higher scores indicate better user experience. $^{**}p < 0.01$, $^{***}p < 0.001$.
  • Figure 4: Task completion time. Individual participant task durations (in minutes) are shown as overlaid circles ($n = 15$).
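Figure 2 reports six NASA-TLX subscale scores together with an overall workload score. The caption does not say whether the overall score is the pairwise-weighted TLX or the unweighted "Raw TLX", so the sketch below assumes Raw TLX: each of the six standard subscales is rated from 0 to 100 (lower means less workload; for Performance, low means "perfect"), and the overall score is their plain mean. The function name `raw_tlx`, the short subscale keys, and the example ratings are hypothetical.

```python
from typing import Mapping

# The six standard NASA-TLX subscales (short keys are our own labels).
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings: Mapping[str, float]) -> float:
    """Overall workload as the unweighted mean of the six subscales (Raw TLX).

    Assumption: every rating lies on a 0-100 scale where lower = less workload.
    """
    missing = [s for s in SUBSCALES if s not in ratings]
    if missing:
        raise ValueError(f"missing subscales: {missing}")
    for s in SUBSCALES:
        if not 0 <= ratings[s] <= 100:
            raise ValueError(f"rating for {s!r} outside 0-100: {ratings[s]}")
    return sum(float(ratings[s]) for s in SUBSCALES) / len(SUBSCALES)


if __name__ == "__main__":
    # Hypothetical ratings from one participant for one tool.
    participant = {
        "mental": 60, "physical": 10, "temporal": 40,
        "performance": 30, "effort": 50, "frustration": 20,
    }
    print(raw_tlx(participant))
```

In a within-subject design like the one described here, each participant would receive one such score per tool, and the two tools would then be compared on those paired scores.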