
ExaRanker-Open: Synthetic Explanation for IR using Open-Source LLMs

Fernando Ferraretto, Thiago Laitz, Roberto Lotufo, Rodrigo Nogueira

TL;DR

ExaRanker-Open extends ExaRanker by using open-source LLMs to generate natural language explanations for IR training data, avoiding the cost and privacy constraints of proprietary models. It systematically evaluates multiple training dataset sizes and two Llama-2 models. It compares them against a categorical-label baseline and a GPT-3.5-based variant, using fine-tuned T5-base rankers evaluated on the BEIR benchmark. The findings show three things: explanations improve neural rankers, larger LLMs deliver greater gains, and data augmentation remains beneficial even on large datasets, including a 0.6 nDCG@10-point improvement over monoT5-400k. In practice, the work shows that open models can rival and even surpass strong baselines while preserving data privacy, and it releases code and data for community use.
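The gains above are reported in nDCG@10. A minimal sketch of that metric follows, assuming linear gains (the relevance grade itself) and a log2(rank + 1) discount. Other formulations use 2^rel - 1 as the gain, and this is not necessarily the exact implementation used in the paper's BEIR evaluation. The function names are illustrative.

```python
import math
from typing import Dict, List


def dcg_at_k(gains: List[float], k: int = 10) -> float:
    # Sum gain / log2(rank + 1) over the top-k positions (ranks start at 1).
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranking: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    # Gains of the retrieved documents in ranked order; unjudged documents count as 0.
    gains = [qrels.get(doc_id, 0) for doc_id in ranking]
    # Ideal DCG: every judged relevance grade, sorted from most to least relevant.
    ideal = sorted(qrels.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    # Queries with no relevant documents score 0 by convention in this sketch.
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0
```

For example, with qrels `{"d1": 2, "d3": 1}` and the ranking `["d3", "d1", "d2"]`, the score is about 0.86. Placing `d1` first yields 1.0. Benchmark-level figures such as those in Figure 1 average this per-query score over queries and then over datasets.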

Abstract

ExaRanker recently introduced an approach to training information retrieval (IR) models, incorporating natural language explanations as additional labels. The method addresses the challenge of limited labeled examples, leading to improvements in the effectiveness of IR models. However, the initial results were based on proprietary language models such as GPT-3.5, whose cost and data-privacy implications constrained the dataset size. In this paper, we introduce ExaRanker-Open, where we adapt and explore the use of open-source language models to generate explanations. The method has been tested using different LLMs and dataset sizes to better understand the effective contribution of data augmentation. Our findings reveal that incorporating explanations consistently enhances neural rankers, with benefits increasing as the LLM size grows. Notably, the data augmentation method proves advantageous even with large datasets, as evidenced by ExaRanker surpassing the target baseline by 0.6 nDCG@10 points in our study. To encourage further advancements by the research community, we have open-sourced both the code and datasets at https://github.com/unicamp-dl/ExaRanker.
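To illustrate what "explanations as additional labels" can look like for a seq2seq ranker such as T5, the sketch below turns a (query, passage, label, explanation) tuple into an input/target text pair. The prompt wording and the "true/false, then `Explanation:`" target layout are assumptions modeled loosely on monoT5-style targets. They are not the exact templates from the released repository, and `Example` and `to_seq2seq` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Example:
    query: str
    passage: str
    relevant: bool
    explanation: str  # rationale generated by an LLM (e.g., an open-source Llama-2 model)


def to_seq2seq(ex: Example) -> Tuple[str, str]:
    # Hypothetical input template; consult the released code for the real one.
    source = (
        f"Is the question: {ex.query} answered by the document: "
        f"{ex.passage}? Give an explanation."
    )
    # The categorical label comes first, followed by the explanation as extra supervision.
    label = "true" if ex.relevant else "false"
    target = f"{label}. Explanation: {ex.explanation}"
    return source, target
```

Putting the label before the explanation is a design choice that lets a ranker score a passage from the first decoded token alone, without generating the full explanation at inference time. The categorical-label baseline would simply drop the `Explanation:` suffix from the target.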

Paper Structure (4 sections, 2 figures, 3 tables)

Figures (2)

  • Figure 1: Average zero-shot results on 6 datasets of the BEIR benchmark with respect to training dataset size, comparing the four evaluated models.
  • Figure 2: Average zero-shot results on 6 datasets of the BEIR benchmark. monoT5-400k is fine-tuned on the 400k relevant query-passage pairs from MS MARCO without explanations. Note the logarithmic scale on the horizontal axis.