LAPIS: Language Model-Augmented Police Investigation System

Heedou Kim, Dain Kim, Jiwoo Lee, Chanwoong Yoon, Donghee Choi, Mogan Gim, Jaewoo Kang

TL;DR

LAPIS tackles the challenge of providing legally sound, timely guidance to police investigators by finetuning a Korean small language model (SLM) on a Crime Investigation Legal Reasoning (CIRL) dataset and augmenting it with a domain-specific Crime Investigation Knowledgebase (CIKB) via retrieval-augmented generation. The system comprises an evaluator $f$, which outputs a True/False assessment $y_a$ and a rationale $y_r$, and a retriever $g$, which supplies premises $\mathbb{P}$ from the knowledgebase $\mathcal{K}$; together they produce the output $\mathbf{y} = f(h, \mathcal{C}, \mathbb{P})$ for a hypothesis $h$ and context $\mathcal{C}$. Key contributions include the CIKB, the CIRL dataset with expert-curated rationales, and a finetuned SLM that outperforms a GPT-4 baseline on hypothesis assessment accuracy and F1 while relying on domain-specific retrieval rather than proprietary models. The results demonstrate improved legal reasoning accuracy and explainability in crime investigations, with practical implications for police workflows and data privacy. Future work aims to extend LAPIS to multiple languages, expand the training data, and enable explicit investigative-action recommendations.
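The retrieve-then-evaluate flow described above, $\mathbf{y} = f(h, \mathcal{C}, \mathbb{P})$ with $\mathbb{P}$ supplied by $g$ from $\mathcal{K}$, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word-overlap scoring in `retrieve`, the `stub_evaluator` placeholder, and all function and field names are hypothetical. In LAPIS the evaluator is the finetuned SLM, and the summary does not specify how the retriever ranks premises.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Assessment:
    answer: bool     # y_a: True/False assessment of the hypothesis
    rationale: str   # y_r: free-text rationale


def tokenize(text: str) -> set:
    """Lowercase whitespace tokenization (a deliberately crude assumption)."""
    return set(text.lower().split())


def retrieve(hypothesis: str, context: str,
             knowledgebase: List[str], k: int = 2) -> List[str]:
    """Retriever g: pick k premises P from knowledgebase K.

    Scoring by word overlap is an assumption made for this sketch only.
    """
    query = tokenize(hypothesis + " " + context)
    ranked = sorted(knowledgebase,
                    key=lambda premise: len(query & tokenize(premise)),
                    reverse=True)
    return ranked[:k]


Evaluator = Callable[[str, str, List[str]], Tuple[bool, str]]


def assess(hypothesis: str, context: str, knowledgebase: List[str],
           evaluator: Evaluator, k: int = 2) -> Assessment:
    """Compute y = f(h, C, P), where P = g(h, C; K)."""
    premises = retrieve(hypothesis, context, knowledgebase, k)
    answer, rationale = evaluator(hypothesis, context, premises)
    return Assessment(answer=answer, rationale=rationale)


def stub_evaluator(hypothesis: str, context: str,
                   premises: List[str]) -> Tuple[bool, str]:
    """Placeholder for the finetuned SLM; it makes no real legal judgment."""
    return bool(premises), "Premises consulted: " + " | ".join(premises)


# Hypothetical toy knowledgebase entries, not taken from CIKB.
toy_kb = [
    "a warrant is required to search a residence",
    "an arrest without a warrant is allowed in urgent cases",
    "traffic fines are paid within thirty days",
]
result = assess("the officer may search the residence without a warrant",
                "the suspect fled into a residence",
                toy_kb, stub_evaluator)
print(result.answer, result.rationale)
```

Passing the evaluator in as a callable keeps the retrieval step independent of the model, so a finetuned SLM or any other evaluator could be swapped in without touching `retrieve`.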

Abstract

Crime situations are a race against time. Police officers need an AI-assisted criminal investigation system that provides prompt yet precise legal counsel. We introduce LAPIS (Language Model Augmented Police Investigation System), an automated system that assists police officers in performing rational and legal investigative actions. We constructed a finetuning dataset and a retrieval knowledgebase specialized for the crime investigation legal reasoning task. We improved the dataset's quality through manual curation by a group of domain experts. We then finetuned the pretrained weights of a smaller Korean language model on the newly constructed dataset and integrated it with the crime investigation knowledgebase retrieval approach. Experimental results show LAPIS' potential to provide reliable legal guidance for police officers, even outperforming the proprietary GPT-4 model. Qualitative analysis of the rationales generated by LAPIS demonstrates the model's ability to leverage the premises and derive legally correct conclusions.

Paper Structure

This paper contains 20 sections, 1 equation, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: In a crime investigation scenario, outcomes differ depending on the hypothesis assessment: a non-domain-specific LLM struggles to answer accurately, whereas LAPIS provides a precise response with a relevant premise.
  • Figure 2: Illustrative workflow of the LAPIS development process.
  • Figure 3: Use of LAPIS responses in a hypothetical murder case. LAPIS appropriately references the provided CI Knowledge and assists in the process of criminal investigation.