Designing Multi-Robot Ground Video Sensemaking with Public Safety Professionals
Puqi Zhou, Ali Asgarov, Aafiya Hussain, Wonjoon Park, Amit Paudyal, Sameep Shrestha, Chia-wei Tang, Michael F. Lighthiser, Michael R. Hieb, Xuesu Xiao, Chris Thomas, Sungsoo Ray Hong
TL;DR
The paper introduces MRVS, a human–AI system for multi-robot ground video sensemaking designed with public safety professionals. It presents a testbed comprising 38 events of interest (EoI), a dataset of 20 robot patrol videos, and six design requirements. It then implements MRVS with a multimodal backend and an interactive frontend, and evaluates the system through algorithmic benchmarks and expert interviews. The results indicate that MRVS can improve recall and overall usefulness in public-safety workflows, while surfacing concerns about false alarms, privacy, and governance. The study argues that configurable, explainable AI coupled with collaboration-centric interfaces can meaningfully scale situational awareness in resource-constrained policing environments, with implications for broader adoption and responsible deployment.
Abstract
Videos from fleets of ground robots can advance public safety by providing scalable situational awareness and reducing professionals' burden. Yet little is known about how to design and integrate multi-robot videos into public safety workflows. Collaborating with six police agencies, we examined how such videos could be made practical. In Study 1, we presented the first testbed for multi-robot ground video sensemaking. The testbed includes 38 events-of-interest (EoI) relevant to public safety, a dataset of 20 robot patrol videos (10 day/night pairs) covering EoI types, and 6 design requirements aimed at improving current video sensemaking practices. In Study 2, we built MRVS, a tool that augments multi-robot patrol video streams with a prompt-engineered video understanding model. Participants reported reduced manual workload and greater confidence with LLM-based explanations, while noting concerns about false alarms and privacy. We conclude with implications for designing future multi-robot video sensemaking tools.
