Computational Law: Datasets, Benchmarks, and Ontologies
Dilek Küçük, Fazli Can
TL;DR
The paper surveys the growing field of computational law, focusing on the datasets, benchmarks, and ontologies that support AI in legal tasks. It provides a comprehensive inventory of datasets across languages and tasks (NER, summarization, QA, judgment prediction), catalogs benchmarks (Swiss Judgment Prediction, LexGLUE, LegalBench, FEDLEGAL, and more) that evaluate legal reasoning and retrieval in multilingual and cross-domain settings, and traces the development of legal ontologies from foundational to domain-specific resources (LKIF, CLO, PrOnto, ViLO). By synthesizing these resources, the work offers a practical reference for researchers and practitioners aiming to train, evaluate, and deploy interoperable legal AI systems. The analysis underscores the importance of combining rich datasets, rigorous benchmarks, and semantic ontologies to build robust and scalable computational-law applications.
Abstract
Recent developments in computer science and artificial intelligence have contributed to the legal domain, as shown by the number and range of related publications and applications. To attain high performance in the legal domain, machine learning and deep learning models require a considerable amount of domain-specific data for training and comparison. Semantic resources such as ontologies are also valuable for building large-scale computational legal systems and for ensuring the interoperability of such systems. Considering these aspects, we present an up-to-date review of the literature on datasets, benchmarks, and ontologies proposed for computational law. We believe that this comprehensive and recent review will help researchers and practitioners develop and test approaches and systems for computational law.
