NESTLE: a No-Code Tool for Statistical Analysis of Legal Corpus

Kyoungyeon Cho, Seungkum Han, Young Rok Choi, Wonseok Hwang

TL;DR

NESTLE is a no-code tool that unifies retrieval, information extraction (IE), and statistical analysis for large-scale legal corpora by coupling a chat-driven LLM interface with a custom end-to-end IE module. It supports user-defined ontologies and needs as few as four seed examples to train the IE model via few-shot learning, distilling into an open-source backbone for scalability. Across 15 Korean KorPrec-IE tasks and 3 English LexGLUE classification tasks, NESTLE reaches accuracy comparable to GPT-4 at significantly lower cost and with faster inference than purely LLM-based approaches, which makes it practical for industrial-scale legal analytics. The framework generalizes to new datasets (LBoxOpen, LexGLUE) and offers fine-grained control through a GUI, enabling scalable, customizable statistical analysis without coding.

Abstract

The statistical analysis of a large-scale legal corpus can provide valuable legal insights. Such analysis requires one to (1) select a subset of the corpus using document retrieval tools, (2) structure the text using information extraction (IE) systems, and (3) visualize the data for statistical analysis. Each step demands either specialized tools or programming skills, yet no comprehensive, unified "no-code" tool has been available. Here we present NESTLE, a no-code tool for large-scale statistical analysis of legal corpora. Powered by a Large Language Model (LLM) and an internal custom end-to-end IE system, NESTLE can extract any type of information, even types that have not been predefined in the IE system, opening up the possibility of unlimited customizable statistical analysis of the corpus without writing a single line of code. We validate our system on 15 Korean precedent IE tasks and 3 legal text classification tasks from LexGLUE. Comprehensive experiments show that NESTLE can achieve performance comparable to GPT-4 by training the internal IE module with 4 human-labeled and 192 LLM-labeled examples.

Paper Structure (19 sections, 3 figures, 4 tables)

Figures (3)

  • Figure 1: Illustration of NESTLE.
  • Figure 2: The workflow of NESTLE.
  • Figure 3: Trade-off analysis on the Fraud task, focusing on three real-world metrics: (a) accuracy, (b) cost, and (c) time.