An Evaluation Framework for Attributed Information Retrieval using Large Language Models

Hanane Djeddal; Pierre Erbacher; Raouf Toukal; Laure Soulier; Karen Pinel-Sauvagnat; Sophia Katrenko; Lynda Tamine

TL;DR

A reproducible framework to evaluate and benchmark attributed information seeking, using any backbone LLM and three architectural designs: (1) Generate, (2) Retrieve then Generate, and (3) Generate then Retrieve.

Abstract

With the growing success of Large Language Models (LLMs) in information-seeking scenarios, search engines are now adopting generative approaches to provide answers along with in-line citations as attribution. While existing work focuses mainly on attributed question answering, in this paper we target information-seeking scenarios, which are often more challenging due to the open-ended nature of the queries and the size of the label space in terms of the diversity of candidate-attributed answers per query. We propose a reproducible framework to evaluate and benchmark attributed information seeking, using any backbone LLM and different architectural designs: (1) Generate, (2) Retrieve then Generate, and (3) Generate then Retrieve. Experiments using HAGRID, an attributed information-seeking dataset, show the impact of different scenarios on both the correctness and attributability of answers.
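The three architectural designs named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, prompt wording, and the `toy_llm` and `toy_retriever` stand-ins are all hypothetical, and using the whole generated answer as the retrieval query in the third scenario is an assumption made only for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
# an LLM maps a prompt to text; a retriever returns the top-k passages.
LLM = Callable[[str], str]
Retriever = Callable[[str, int], List[str]]


def generate(query: str, llm: LLM) -> Tuple[str, List[str]]:
    """(1) Generate: the LLM answers directly, with no retrieval step."""
    return llm(f"Question: {query}\nAnswer:"), []


def retrieve_then_generate(
    query: str, llm: LLM, retriever: Retriever, k: int = 3
) -> Tuple[str, List[str]]:
    """(2) Retrieve then Generate: fetch passages, then answer while citing them."""
    docs = retriever(query, k)
    context = "\n".join(f"[{i + 1}] {d}" for i, d in enumerate(docs))
    prompt = f"{context}\nQuestion: {query}\nAnswer with citations like [1]:"
    return llm(prompt), docs


def generate_then_retrieve(
    query: str, llm: LLM, retriever: Retriever, k: int = 1
) -> Tuple[str, List[str]]:
    """(3) Generate then Retrieve: answer first, then look up support post hoc.

    Assumption: the generated answer itself is used as the retrieval query.
    """
    answer = llm(f"Question: {query}\nAnswer:")
    return answer, retriever(answer, k)


# Toy stand-ins so the sketch runs end to end (purely illustrative).
CORPUS = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "The Seine flows through Paris.",
]


def toy_retriever(query: str, k: int) -> List[str]:
    # Rank passages by word overlap with the query (stable sort keeps ties in order).
    q = set(query.lower().split())
    ranked = sorted(CORPUS, key=lambda d: -len(q & set(d.lower().split())))
    return ranked[:k]


def toy_llm(prompt: str) -> str:
    # A fixed reply; a real backbone LLM would condition on the prompt.
    return "Paris is the capital of France."
```

Each function returns the answer together with the passages available for attribution, so the same correctness and citation metrics could in principle be applied to the output of any scenario.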

Paper Structure (17 sections, 3 equations, 2 figures, 3 tables)

Introduction
Experimental design
Task formulation
Dataset
Scenarios
Variants and baselines
Evaluation metrics
Answer correctness.
Citation quality.
Experiments
Effectiveness results
Complementary analysis
Impact of the number of supporting documents
Impact of Retrieval
Correlation between NLI and Annotations
...and 2 more sections

Figures (2)

Figure 1: LLM-based scenarios for IR with attribution.
Figure 2: Citation and correctness metrics for varying numbers of supporting documents in the RTG-user-query scenario.
