ChronosLex: Time-aware Incremental Training for Temporal Generalization of Legal Classification Tasks
T. Y. S. S Santosh, Tuan-Quang Vuong, Matthias Grabmair
TL;DR
ChronosLex addresses temporal drift in legal multi-label text classification by fine-tuning models incrementally over chronologically ordered data and evaluating robustness with both fixed and streaming time splits. To balance past knowledge against new information, the approach is combined with continual learning and temporal-invariant methods; continual learning generally improves temporal generalization, while temporal-invariant methods often underperform. Across six datasets, streaming evaluation shows consistent gains for continual methods, underscoring the importance of time-aware evaluation. The work advocates broader adoption of streaming protocols and sets the stage for applying incremental training to other temporally sensitive legal tasks.
Abstract
This study investigates the challenges posed by the dynamic nature of legal multi-label text classification tasks, where legal concepts evolve over time. Existing models often overlook the temporal dimension during training: they treat the training data as a single homogeneous block, which leads to suboptimal performance as time passes. To address this, we introduce ChronosLex, an incremental training paradigm that trains models on chronological splits, preserving the temporal order of the data. However, this incremental approach raises concerns about overfitting to recent data, prompting an assessment of mitigation strategies using continual learning and temporal-invariant methods. Our experimental results on six legal multi-label text classification datasets reveal that continual learning methods are effective in preventing overfitting, thereby enhancing temporal generalizability, while temporal-invariant methods struggle to capture the dynamics of temporal shifts.
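As a rough illustration of the incremental paradigm described above, the following Python sketch orders examples by time, cuts them into contiguous chronological splits, and fine-tunes on each split in turn. It is not the authors' implementation. The record layout `(year, text, labels)`, the equal-size splitting rule, and the `fine_tune` and `evaluate` callables are all assumptions made for illustration, and `streaming_eval` shows only one plausible reading of a streaming protocol: train up to split t, then score on split t+1.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Hypothetical record layout: (year, text, labels).
Example = Tuple[int, str, List[str]]


def chronological_splits(examples: Sequence[Example], n_splits: int) -> List[List[Example]]:
    """Sort examples by year and cut them into n_splits contiguous chunks.

    Assumption: chunks are balanced by example count, so one year may
    straddle two chunks. Splitting on fixed date boundaries is equally valid.
    """
    if n_splits < 1:
        raise ValueError("n_splits must be at least 1")
    ordered = sorted(examples, key=lambda ex: ex[0])
    base, extra = divmod(len(ordered), n_splits)
    splits: List[List[Example]] = []
    start = 0
    for i in range(n_splits):
        end = start + base + (1 if i < extra else 0)
        splits.append(ordered[start:end])
        start = end
    return splits


def incremental_train(model: Any, splits: Sequence[List[Example]],
                      fine_tune: Callable[[Any, List[Example]], Any]) -> Any:
    """Fine-tune on each split in temporal order, oldest first."""
    for split in splits:
        model = fine_tune(model, split)
    return model


def streaming_eval(model: Any, splits: Sequence[List[Example]],
                   fine_tune: Callable[[Any, List[Example]], Any],
                   evaluate: Callable[[Any, List[Example]], Any]) -> List[Any]:
    """After training on split t, score the model on the unseen split t+1."""
    scores = []
    for t in range(len(splits) - 1):
        model = fine_tune(model, splits[t])
        scores.append(evaluate(model, splits[t + 1]))
    return scores
```

In practice `fine_tune` would wrap a gradient-based update of a text classifier, optionally with a continual learning regularizer to limit overfitting to the most recent split, and `evaluate` would compute a multi-label metric such as micro- or macro-F1.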
