LAPIS: Language Model-Augmented Police Investigation System
Heedou Kim, Dain Kim, Jiwoo Lee, Chanwoong Yoon, Donghee Choi, Mogan Gim, Jaewoo Kang
TL;DR
LAPIS tackles the challenge of providing legally sound, timely guidance to police investigators by finetuning a Korean small language model (SLM) on a Crime Investigation Legal Reasoning (CIRL) dataset and augmenting it with a domain-specific knowledgebase (CIKB) via retrieval-augmented generation. The system comprises an evaluator $f$, which outputs a True/False assessment $y_a$ and a rationale $y_r$, and a retriever $g$, which supplies premises $\mathbb{P}$ from the knowledgebase $\mathcal{K}$; together they produce the output $\mathbf{y}=f(h,\mathcal{C},\mathbb{P})$ for a hypothesis $h$ and context $\mathcal{C}$. Key contributions include the Crime Investigation Knowledgebase (CIKB), the CIRL dataset with expert-curated rationales, and a finetuned SLM that outperforms a baseline GPT-4 on hypothesis assessment accuracy and F1 while relying on domain-specific retrieval rather than proprietary models. The results demonstrate improved legal reasoning accuracy and explainability in crime investigations, with practical implications for police workflows and data privacy. Future work aims to extend LAPIS to multiple languages, expand the training data, and enable explicit investigative-action recommendations.
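The retrieve-then-evaluate flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word-overlap retriever, the `stub_evaluator` placeholder, the `Output` type, and the choice to query the knowledgebase with the hypothesis and context concatenated are all assumptions made for illustration. In LAPIS the evaluator $f$ is a finetuned SLM and the retriever $g$ searches CIKB.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Output:
    assessment: bool  # y_a: True/False judgment on the hypothesis
    rationale: str    # y_r: explanation supporting the judgment


def retrieve(query: str, knowledgebase: List[str], top_k: int = 2) -> List[str]:
    """Toy stand-in for the retriever g: rank entries by word overlap with the query."""
    query_tokens = set(query.lower().split())
    scored = [
        (len(query_tokens & set(entry.lower().split())), entry)
        for entry in knowledgebase
    ]
    # Stable sort by overlap count, highest first; ties keep knowledgebase order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:top_k] if score > 0]


def stub_evaluator(hypothesis: str, context: str, premises: List[str]) -> Output:
    """Hypothetical placeholder for the evaluator f (a finetuned SLM in the paper)."""
    return Output(
        assessment=bool(premises),
        rationale=f"Placeholder rationale based on {len(premises)} retrieved premise(s).",
    )


def run_pipeline(
    hypothesis: str,
    context: str,
    knowledgebase: List[str],
    evaluator: Callable[[str, str, List[str]], Output] = stub_evaluator,
    top_k: int = 2,
) -> Output:
    """Compute y = f(h, C, P), where P is retrieved from the knowledgebase K."""
    # Assumption: the retrieval query is the hypothesis and context joined together.
    premises = retrieve(f"{hypothesis} {context}", knowledgebase, top_k)
    return evaluator(hypothesis, context, premises)
```

Passing the evaluator as a callable keeps the retrieval step independent of the model, so a real language-model call could replace `stub_evaluator` without changing `run_pipeline`.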
Abstract
Crime situations are a race against time. Police officers need an AI-assisted criminal investigation system that provides prompt yet precise legal counsel. We introduce LAPIS (Language Model Augmented Police Investigation System), an automated system that assists police officers in performing rational and legal investigative actions. We constructed a finetuning dataset and a retrieval knowledgebase specialized for the crime investigation legal reasoning task. We improved the dataset's quality through manual curation by a group of domain experts. We then finetuned a smaller pretrained Korean language model on the newly constructed dataset and integrated it with retrieval from the crime investigation knowledgebase. Experimental results show LAPIS's potential to provide reliable legal guidance for police officers, even outperforming the proprietary GPT-4 model. Qualitative analysis of the rationales generated by LAPIS demonstrates the model's ability to leverage the premises and derive legally correct conclusions.
