Explainable machine learning multi-label classification of Spanish legal judgements
Francisco de Arriba-Pérez, Silvia García-Méndez, Francisco J. González-Castaño, Jaime González-González
TL;DR
This work tackles the dual challenge of multi-label classification of Spanish legal judgements and the need for explainability. It presents a hybrid pipeline with seven modules (text processing, legal entity detection, anonymisation, data processing, classification, evaluation, and explainability) that jointly deliver predictions and human-readable justifications, including natural language templates and visual explanations. Using a large, lawyer-annotated dataset (106,806 judgements) and a Random Forest with a multi-class transformation strategy, the approach achieves around $74\%$ accuracy and over $85\%$ micro precision, while providing interpretable outputs via LIME and tree visualisations. The results demonstrate practical viability for explainable legal AI, with thorough evaluation across the processing, classification, and explanation components, and show potential for future improvements with extreme multi-label representations and transformer-based models.
Abstract
Artificial Intelligence techniques such as Machine Learning (ML) have not been exploited to their maximum potential in the legal domain. This is partly because they provide insufficient explanations of their decisions. Automatic expert systems with explanatory capabilities can be especially useful when legal practitioners search jurisprudence to gather contextual knowledge for their cases. Therefore, we propose a hybrid system that applies ML for multi-label classification of judgements (sentences) and visual and natural language descriptions for explanation purposes, boosted by Natural Language Processing techniques and deep legal reasoning to identify the entities involved, such as the parties. We are not aware of any prior work on automatic multi-label classification of legal judgements that also provides natural language explanations to end users with comparable overall quality. Our solution achieves over 85% micro precision on a labelled data set annotated by legal experts. This supports its usefulness for relieving human experts of monotonous, labour-intensive legal classification tasks.
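For readers unfamiliar with the reported metric, the following is a minimal sketch of how micro-averaged precision is commonly computed in a multi-label setting: true positives and predicted labels are pooled over all documents and labels before dividing. The function name `micro_precision` and the toy label sets are hypothetical illustrations and are not taken from the authors' system or data.

```python
from typing import List, Set


def micro_precision(true_labels: List[Set[str]],
                    predicted_labels: List[Set[str]]) -> float:
    """Micro-averaged precision: pooled true positives / pooled predictions."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("Both lists must contain one label set per document.")

    true_positives = 0
    total_predicted = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        # A predicted label counts as a true positive if it is in the gold set.
        true_positives += len(truth & predicted)
        total_predicted += len(predicted)

    # Convention assumed here: return 0.0 when nothing was predicted.
    return true_positives / total_predicted if total_predicted else 0.0


# Hypothetical toy example: three judgements, each with a set of labels.
y_true = [{"civil", "contract"}, {"criminal"}, {"labour", "dismissal"}]
y_pred = [{"civil", "contract"}, {"criminal", "civil"}, {"labour"}]

score = micro_precision(y_true, y_pred)
print(f"micro precision = {score:.2f}")  # 4 correct out of 5 predicted -> 0.80
```

Because counts are pooled before dividing, frequent labels weigh more heavily than rare ones, which is one reason micro and macro averages can differ on imbalanced label sets.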
