SIGMUS: Semantic Integration for Knowledge Graphs in Multimodal Urban Spaces

Brian Wang, Mani Srivastava

TL;DR

SIGMUS addresses fragmentation of urban sensor data by leveraging LLM-based world knowledge to create a live, semantically rich knowledge graph linking incidents with multimodal observations. The approach ingests data from text, images, and tabular sensors, processes them with LLMs and vision-language models to extract actors, events, and observations, and then links them across modalities using cross-modal reasoning and a RAG-based incident merge workflow. The case study on the 2025 Los Angeles wildfires demonstrates plausible connections between CCTV, air quality, weather, and news reports, validating the potential for real-time monitoring and historical analysis. The work contributes an end-to-end pipeline, an ontology aligned with existing standards, and practical insights into latency, model choice, and future evaluation for urban knowledge systems.

Abstract

Modern urban spaces are equipped with an increasingly diverse set of sensors, all producing an abundance of multimodal data. Such multimodal data can be used to identify and reason about important incidents occurring in urban landscapes, such as major emergencies, cultural and social events, and natural disasters. However, such data may be fragmented across several sources and difficult to integrate, because identifying the relationships between the multimodal data corresponding to an incident, and understanding the different components that define an incident, currently relies on human-driven reasoning. Such relationships and components are critical to identifying the causes of such incidents, as well as forecasting the scale and intensity of future incidents as they begin to develop. In this work, we create SIGMUS, a system for Semantic Integration for Knowledge Graphs in Multimodal Urban Spaces. SIGMUS uses Large Language Models (LLMs) to produce the necessary world knowledge for identifying relationships between incidents occurring in urban spaces and data from different modalities, allowing us to organize evidence and observations relevant to an incident without relying on human-encoded rules for relating multimodal sensory data with incidents. This organized knowledge is represented as a knowledge graph that captures incidents, observations, and much more. We find that our system is able to produce reasonable connections between five different data sources (news article text, CCTV images, air quality, weather, and traffic measurements) and relevant incidents occurring at the same time and location.
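
To make the notion of "incidents occurring at the same time and location" concrete, the following is a minimal sketch of a spatiotemporal candidate filter. It is not SIGMUS's method: the abstract describes LLM-driven linking, whereas this sketch only shows how observations might be shortlisted before any such reasoning. The `Record` type, the `candidate_observations` function, and the distance and time thresholds are all illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class Record:
    """Hypothetical timestamped, geolocated item (an incident or an observation)."""
    label: str
    time: datetime
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def candidate_observations(incident, observations,
                           max_km=10.0, max_gap=timedelta(hours=2)):
    """Return observations near the incident in both space and time.

    The thresholds are arbitrary placeholders chosen for illustration.
    """
    return [
        obs for obs in observations
        if abs(obs.time - incident.time) <= max_gap
        and haversine_km(incident.lat, incident.lon, obs.lat, obs.lon) <= max_km
    ]
```

In a pipeline of the kind described above, a filter like this would at most narrow the set of observations passed to a language model; deciding whether an observation is actually relevant to an incident is the part the paper attributes to LLM world knowledge.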

Paper Structure

This paper contains 19 sections, 8 figures, 3 tables.

Figures (8)

  • Figure 1: High level overview of SIGMUS, which aims to identify incidents in urban spaces and draw connections to sensory data collected in those spaces.
  • Figure 2: The SIGMUS ontology
  • Figure 3: Technical architecture for ingesting knowledge into a live knowledge graph for urban analysis.
  • Figure 4: Example of actor and event parsing
  • Figure 5: Example of visual event classification
  • ...and 3 more figures