CuSINeS: Curriculum-driven Structure Induced Negative Sampling for Statutory Article Retrieval

T. Y. S. S Santosh, Kristina Kaiser, Matthias Grabmair

TL;DR

This work tackles negative sampling in Statutory Article Retrieval by introducing CuSINeS, a curriculum-driven method that integrates structure-aware negative mining, dynamic model-driven semantic difficulty, and a progressive training schedule. By combining semantic and structure-based difficulty rankings through Reciprocal Rank Fusion and exposing negatives in an easy-to-hard sequence, CuSINeS yields consistent gains across multiple dense retrieval baselines on the BSARD dataset. The results highlight the value of leveraging the hierarchical and sequential organization of statutes to mine informative negatives and of adapting difficulty to the model's current competence. The approach is model-agnostic and holds promise for other legally structured corpora and retrieval tasks beyond SAR.
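As a rough illustration of how two difficulty rankings might be merged with Reciprocal Rank Fusion, the sketch below fuses a semantic ranking and a structure-based ranking of candidate negatives. It is a minimal example under stated assumptions, not the authors' implementation: the function name, the article identifiers, and the smoothing constant `k = 60` (a commonly used default for RRF) are all hypothetical choices, since the summary above does not specify them.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several rankings (each ordered hardest-first) into one list.

    Each item scores sum(1 / (k + rank)) over the rankings it appears in,
    with ranks starting at 1. `k` is an assumed smoothing constant.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical candidate negatives for one query, hardest first.
semantic_ranking = ["art_12", "art_7", "art_3"]    # e.g. by model similarity
structural_ranking = ["art_7", "art_3", "art_12"]  # e.g. by proximity in the statute

fused = reciprocal_rank_fusion([semantic_ranking, structural_ranking])
print(fused)
```

Because RRF only uses rank positions, the two difficulty signals do not need to be on a comparable scale before they are combined.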

Abstract

In this paper, we introduce CuSINeS, a negative sampling approach to enhance the performance of Statutory Article Retrieval (SAR). CuSINeS offers three key contributions. Firstly, it employs a curriculum-based negative sampling strategy that guides the model to focus on easier negatives initially and to progressively tackle more difficult ones. Secondly, it leverages the hierarchical and sequential information derived from the structural organization of statutes to evaluate the difficulty of samples. Lastly, it introduces a dynamic semantic difficulty assessment that uses the model being trained, surpassing conventional static methods such as BM25 and adapting the negatives to the model's evolving competence. Experimental results on a real-world expert-annotated SAR dataset validate the effectiveness of CuSINeS across four different baselines, demonstrating its versatility.
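The easy-to-hard idea in the first contribution can be sketched as a sampling pool that grows with training progress. This is only one possible reading, offered as an assumption: the abstract does not state the schedule, so the linear growth rule, the function name `curriculum_negatives`, and the `progress` parameter are hypothetical.

```python
import random


def curriculum_negatives(ranked_hard_to_easy, progress, n_samples, rng=None):
    """Sample negatives from a pool that widens from easy to hard.

    ranked_hard_to_easy: candidate negatives, hardest first.
    progress: fraction of training completed, in [0, 1].
    Assumed schedule: the pool holds the easiest `progress` share of the
    candidates, and never fewer than `n_samples` of them.
    """
    if not 0.0 <= progress <= 1.0:
        raise ValueError("progress must lie in [0, 1]")
    rng = rng or random.Random()
    easy_to_hard = list(reversed(ranked_hard_to_easy))
    min_pool = min(n_samples, len(easy_to_hard))
    pool_size = max(min_pool, round(progress * len(easy_to_hard)))
    pool = easy_to_hard[:pool_size]
    return rng.sample(pool, min(n_samples, len(pool)))


# Hypothetical candidates: "n1" is the hardest negative, "n10" the easiest.
candidates = [f"n{i}" for i in range(1, 11)]
rng = random.Random(0)
early = curriculum_negatives(candidates, progress=0.0, n_samples=2, rng=rng)
middle = curriculum_negatives(candidates, progress=0.5, n_samples=2, rng=rng)
late = curriculum_negatives(candidates, progress=1.0, n_samples=2, rng=rng)
```

Under this schedule, early batches draw only from the easiest candidates, while late batches may include the hardest ones. To reflect the dynamic difficulty described above, the ranking would be recomputed during training rather than fixed once.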

Paper Structure (16 sections, 1 equation, 1 figure, 4 tables)