
STCALIR: Semi-Synthetic Test Collection for Algerian Legal Information Retrieval

M'hamed Amine Hatem, Sofiane Batata, Amine Mammasse, Faiçal Azouaou

Abstract

Test collections are essential for evaluating retrieval and re-ranking models. However, constructing such collections is challenging due to the high cost of manual annotation, particularly in specialized domains like Algerian legal texts, where high-quality corpora and relevance judgments are scarce. To address this limitation, we propose STCALIR, a framework for generating semi-synthetic test collections directly from raw legal documents. The pipeline follows the Cranfield paradigm, maintaining its core components of topics, corpus, and relevance judgments, while significantly reducing manual effort through automated multi-stage retrieval and filtering, achieving a 99% reduction in annotation workload. We validate STCALIR using the Mr. TyDi benchmark, demonstrating that the resulting semi-synthetic relevance judgments yield retrieval effectiveness comparable to human-annotated evaluations (Hit@10 ≈ 0.785). Furthermore, system-level rankings derived from these labels exhibit strong concordance with human-based evaluations, as measured by Kendall's τ (0.89) and Spearman's ρ (0.92). Overall, STCALIR offers a reproducible and cost-efficient solution for constructing reliable test collections in low-resource legal domains.

Paper Structure

This paper contains 21 sections, 6 figures, 4 tables.

Figures (6)

  • Figure 1: Five-phase methodology for building the semi-synthetic test collection
  • Figure 2: Side-by-Side Display of Original Algerian Arabic Legal Text and Its English Translation from the Algerian Official Gazette (Issue 71, 10 November 2004, Page 12)
  • Figure 3: Screenshot showing the Web Topics annotation interface
  • Figure 4: Screenshot showing the Web Interface for human relevance assessment
  • Figure 5: Screenshot showing the Web Topics annotation interface
  • ...and 1 more figure