MULTI-CASE: A Transformer-based Ethics-aware Multimodal Investigative Intelligence Framework
Maximilian T. Fischer, Yannick Metz, Lucas Joos, Matthias Miller, Daniel A. Keim
TL;DR
MULTI-CASE tackles ethics and privacy challenges in multimodal intelligence analytics by delivering a holistic, visual-analytics framework that couples a fully integrated graph data model with modular, plug-in analytics and a GPU-accelerated knowledge graph UI. It combines transformer-based NER for domain-specific entity extraction with an ontology-driven search to enable cross-modal linking, explanation, and provenance, supporting equal human–AI agency. The paper presents a war-crimes case study and a formative expert evaluation showing improved human agency, transparency, and accountability, while acknowledging limitations in language generalization and potential over-reliance on automation. Overall, it offers a principled blueprint for accountable, multimodal intelligence exploration that can be extended with LLMs and multilingual capabilities while maintaining oversight and privacy safeguards.
Abstract
AI-driven models are increasingly deployed in operational analytics solutions, for instance, in investigative journalism or the intelligence community. Current approaches face two primary challenges: ethical and privacy concerns, as well as difficulties in efficiently combining heterogeneous data sources for multimodal analytics. To tackle the challenge of multimodal analytics, we present MULTI-CASE, a holistic visual analytics framework tailored towards ethics-aware and multimodal intelligence exploration, designed in collaboration with domain experts. It leverages equal joint agency between human and AI to explore and assess heterogeneous information spaces, checking and balancing automation through Visual Analytics. MULTI-CASE operates on a fully integrated data model and features type-specific analysis with multiple linked components, including a combined search, an annotated text view, and graph-based analysis. Parts of the underlying entity detection rely on a RoBERTa-based language model, which we fine-tuned to meet user requirements. An overarching knowledge exploration graph combines all information streams, provides in-situ explanations and transparent source attribution, and facilitates effective exploration. To assess our approach, we conducted a comprehensive set of evaluations: we benchmarked the underlying language model on relevant NER tasks, achieving state-of-the-art performance. The demonstrator was assessed according to intelligence capability assessments, while the methodology was evaluated according to ethics design guidelines. As a case study, we present our framework in an investigative journalism setting, supporting war crime investigations. Finally, we conducted a formative user evaluation with domain experts in law enforcement. Our evaluations confirm that our framework facilitates human agency and steering in security-sensitive applications.
