No Argument Left Behind: Overlapping Chunks for Faster Processing of Arbitrarily Long Legal Texts
Israel Fama, Bárbara Bueno, Alexandre Alcoforado, Thomas Palmeira Ferraz, Arnold Moya, Anna Helena Reali Costa
TL;DR
This work tackles the challenge of analyzing arbitrarily long legal documents for Legal Judgment Prediction in the Brazilian context. It introduces uBERT, a Transformer-RNN hybrid that processes full texts by splitting them into overlapping chunks, extracting chunk representations from the final Transformer layers, and integrating them with an RNN. Results show that overlapping chunking yields improvements over BERT+LSTM and that uBERT outperforms ULMFiT in many long-text scenarios while remaining significantly faster; however, ULMFiT can still outperform uBERT on the very longest texts, highlighting a trade-off between accuracy and efficiency. In practice, uBERT offers a scalable approach to full-document analysis in legal NLP and motivates further work on chunking strategies and cross-language validation.
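As a minimal sketch of the overlapping-chunk step, assume the document has already been tokenized into a list of token IDs. The hypothetical helper `overlapping_chunks` below slides a window of `chunk_size` tokens across the sequence so that neighboring windows share `overlap` tokens. The function name, parameter names, and any concrete values are illustrative assumptions, not the settings used for uBERT.

```python
from typing import List, Sequence


def overlapping_chunks(tokens: Sequence[int], chunk_size: int, overlap: int) -> List[List[int]]:
    """Hypothetical helper: split a token sequence into windows of at most
    `chunk_size` tokens, where consecutive windows share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not tokens:
        return []
    stride = chunk_size - overlap  # how far each window advances
    chunks: List[List[int]] = []
    start = 0
    while True:
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once the current window reaches the end of the sequence.
        if start + chunk_size >= len(tokens):
            break
        start += stride
    return chunks
```

For example, `overlapping_chunks(list(range(10)), chunk_size=4, overlap=2)` yields `[[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]`. Tokens near a chunk boundary appear in two windows, so an argument split at one boundary is seen whole in the neighboring chunk. With a BERT encoder, `chunk_size` is bounded by the model's maximum input length (512 positions for standard BERT checkpoints, including special tokens).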
Abstract
In a context where the Brazilian judiciary system, the largest in the world, faces a crisis due to the slow processing of millions of cases, it becomes imperative to develop efficient methods for analyzing legal texts. We introduce uBERT, a hybrid model that combines Transformer and Recurrent Neural Network architectures to effectively handle long legal texts. Our approach processes the full text regardless of its length while keeping computational overhead reasonable. Our experiments demonstrate that uBERT outperforms BERT+LSTM when overlapping input is used and is significantly faster than ULMFiT for processing long legal documents.
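The recurrent side of the hybrid can likewise be sketched, under two loudly stated assumptions. First, each chunk's representation is taken as the element-wise mean of one vector per layer over the last `k` Transformer layers; the text only says representations come from the final layers, so this pooling rule and `k` are hypothetical. Second, a plain Elman RNN cell stands in for the recurrent network, whose exact cell type is not specified here. Weights are passed in rather than learned, so the sketch shows only the data flow: a variable number of chunk vectors is folded into one fixed-size document vector.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def pool_last_layers(layer_vectors: Sequence[Vector], k: int) -> Vector:
    """Hypothetical chunk representation: element-wise mean of one vector
    (e.g. the [CLS] position) taken from each of the last k layers."""
    if not 1 <= k <= len(layer_vectors):
        raise ValueError("k must be between 1 and the number of layers")
    last = layer_vectors[-k:]
    dim = len(last[0])
    return [sum(v[i] for v in last) / len(last) for i in range(dim)]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_encode(chunk_vectors: Sequence[Vector], w_x: Matrix, w_h: Matrix, b: Vector) -> Vector:
    """Stand-in recurrent encoder (plain Elman cell, an assumption):
    h_t = tanh(W_x x_t + W_h h_{t-1} + b), applied to the chunk vectors
    in document order; returns the final hidden state."""
    h = [0.0] * len(b)  # initial hidden state
    for x in chunk_vectors:
        pre = [a + c + d for a, c, d in zip(matvec(w_x, x), matvec(w_h, h), b)]
        h = [math.tanh(z) for z in pre]
    return h
```

Because the recurrence consumes one chunk vector per step, the size of the output does not depend on how many chunks a document produced, which is what allows the full text to be read regardless of its length.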
