Can LLMs Create Legally Relevant Summaries and Analyses of Videos?
Lyra Hoeben-Kuil, Gijs van Dijck, Jaromir Savelka, Johanna Gunawan, Konrad Kollnig, Marta Kolacz, Mindy Duffourc, Shashank Chakravarthy, Hannes Westermann
TL;DR
This study investigates whether multimodal LLMs can extract legally relevant facts from videos and draft Dutch-law legal letters, addressing access-to-justice barriers. Using a dataset of 120 YouTube clips across housing defects, traffic, property damage, and product malfunctions, the authors implement a pipeline with Gemini 2.5 Flash to generate video summaries (E1) and Dutch-law complaint letters (E2). Results show that 71.7% of summaries are rated high or medium in quality, with 64.2% completeness and 55% factuality, but performance varies by domain, modality, and video complexity, and letters exhibit notable factual and legal-context limitations. The work demonstrates promising potential for AI-assisted legal intake and forms, while underscoring the need for human oversight and targeted improvements before deployment in real-world decision-support tools.
Abstract
Understanding the legally relevant factual basis of an event and conveying it through text is a key skill of legal professionals. This skill is important for preparing forms (e.g., insurance claims) or other legal documents (e.g., court claims), but often presents a challenge for laypeople. Current AI approaches aim to bridge this gap, but mostly rely on the user to articulate in text what has happened, which many users may find difficult. Here, we investigate the capability of large language models (LLMs) to understand and summarize events occurring in videos. We ask an LLM to summarize 120 YouTube videos showing legal issues in various domains and to draft legal letters based on them. Overall, 71.7% of the summaries were rated as being of high or medium quality, a promising result that opens the door to a number of applications, for example in access to justice.
