Overview of the TREC 2023 NeuCLIR Track

Dawn Lawrie, Sean MacAvaney, James Mayfield, Paul McNamee, Douglas W. Oard, Luca Soldaini, Eugene Yang

TL;DR

NeuCLIR 2023 extends the prior year's track by adding a multilingual information retrieval task and a pilot technical documents task, while reaffirming the cross-language news retrieval framework across Chinese, Persian, and Russian collections. The track demonstrates notable gains in cross-language retrieval, with GPT-4 featuring prominently in top runs, and highlights both the stability of CLIR pooling and the challenges of MLIR fairness across languages. A detailed analysis of topic development, relevance judgments, and collection reuse informs methodological robustness and guides 2024 directions, including expanding the Chinese technical document task and launching cross-language report generation. The work underscores the practical value of multilingual neural IR while acknowledging ongoing gaps in cross-language technical domains and the need for broader annotator expertise and topic translations for future benchmarks.

Abstract

The principal goal of the TREC Neural Cross-Language Information Retrieval (NeuCLIR) track is to study the impact of neural approaches to cross-language information retrieval. The track has created four collections: large collections of Chinese, Persian, and Russian newswire and a smaller collection of Chinese scientific abstracts. The principal tasks are ranked retrieval of news in one of the three languages, using English topics. Results for a multilingual task, also with English topics but with documents from all three newswire collections, are also reported. New in this second year of the track is a pilot technical documents CLIR task for ranked retrieval of Chinese technical documents using English topics. A total of 220 runs across all tasks were submitted by six participating teams and, as baselines, by track coordinators. Task descriptions and results are presented.

Paper Structure (33 sections, 17 figures, 9 tables)

Figures (17)

  • Figure 1: CLIR nDCG@20.
  • Figure 2: MLIR nDCG@20.
  • Figure 3: MLIR Target Exposure and Fairness. X-axes of Figures (a) and (b) indicate topics sorted by the range of median per-language fairness among the three languages (range in Figure (b)). The X-axis of Figure (c) indicates runs.
  • Figure 4: nDCG@20 boxplots (left) and KDE graphs on the median scores (right) of the news task comparing topic developers. Topics are ordered by the median of the scores.
  • Figure 5: Sampled nDCG@20 boxplots of the news task comparing the CLIR and MLIR tasks. The number to the left of each boxplot is the number of relevant documents for the corresponding task and topic.
  • ...and 12 more figures
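
Several of the figures above report nDCG@20. As a reading aid, the following is a minimal sketch of one common formulation of that metric, assuming linear gains equal to the non-negative relevance grade, a log2(rank + 1) discount, and a gain of zero for unjudged documents. The function names and the toy judgments are hypothetical, and the track's own evaluation tooling may differ in details such as gain mapping.

```python
import math


def dcg_at_k(gains, k=20):
    """Discounted cumulative gain over the first k gains (ranks start at 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))


def ndcg_at_k(ranked_doc_ids, qrels, k=20):
    """nDCG@k for one topic.

    ranked_doc_ids: document ids in the order the system returned them.
    qrels: dict mapping document id to a non-negative relevance grade
           (assumption: documents missing from qrels count as grade 0).
    """
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_doc_ids]
    # The ideal ranking places the highest-graded judged documents first.
    ideal_gains = sorted(qrels.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_gains, k)
    if ideal_dcg == 0:
        # No relevant documents for this topic: define the score as 0.
        return 0.0
    return dcg_at_k(gains, k) / ideal_dcg


# Toy judgments for a single hypothetical topic.
toy_qrels = {"d1": 3, "d2": 1, "d3": 0}
perfect = ndcg_at_k(["d1", "d2", "d3"], toy_qrels)
reversed_run = ndcg_at_k(["d3", "d2", "d1"], toy_qrels)
```

In this toy case the ranking that matches the ideal order scores 1.0, and the reversed ranking scores lower because the most relevant document is discounted at rank 3.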