Unsupervised Multimodal Graph-based Model for Geo-social Analysis

Ehsaneddin Jalilian, Bernd Resch

TL;DR

This work addresses fragmentation in multimodal geo-social analysis by introducing an unsupervised graph-based framework that jointly embeds semantic and geographic information. It presents two architectures, MonoGraph and MultiGraph, employing SBERT-derived text embeddings and graph neural encoders to create a unified representation space, guided by a composite loss of contrastive, coherence, and alignment terms. Evaluations on four disaster datasets show improved topic quality, spatial coherence, and interpretability, with MultiGraph often outperforming MonoGraph and baselines. The approach is domain-agnostic and extensible to additional modalities and tasks, offering practical value for disaster response and geospatial analytics.

Abstract

The systematic analysis of user-generated social media content, especially when enriched with geospatial context, plays a vital role in domains such as disaster management and public opinion monitoring. Although multimodal approaches have made significant progress, most existing models remain fragmented, processing each modality separately rather than integrating them into a unified end-to-end model. To address this, we propose an unsupervised, multimodal graph-based methodology that jointly embeds semantic and geographic information into a shared representation space. The proposed methodology comprises two architectural paradigms: a mono-graph (MonoGraph) model that jointly encodes both modalities, and a multi-graph (MultiGraph) model that models semantic and geographic relationships separately and then integrates them through multi-head attention mechanisms. A composite loss, combining contrastive, coherence, and alignment objectives, guides the learning process to produce semantically coherent and spatially compact clusters. Experiments on four real-world disaster datasets demonstrate that our models consistently outperform existing baselines in topic quality, spatial coherence, and interpretability. Inherently domain-independent, the framework can be readily extended to diverse forms of multimodal data and a wide range of downstream analysis tasks.
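
As a reading aid, the composite objective mentioned in the abstract can be sketched as follows. The abstract only states that contrastive, coherence, and alignment objectives are combined; the weighted-sum form and the weights $\lambda_1, \lambda_2, \lambda_3$ are assumptions made here for illustration, not notation confirmed by the text above.

```latex
\mathcal{L}_{\text{total}}
  = \lambda_{1}\,\mathcal{L}_{\text{contrastive}}
  + \lambda_{2}\,\mathcal{L}_{\text{coherence}}
  + \lambda_{3}\,\mathcal{L}_{\text{alignment}},
  \qquad \lambda_{1}, \lambda_{2}, \lambda_{3} \ge 0
```

Under this assumed form, raising one weight would shift the learned clusters toward the property that term rewards, for example semantic coherence or spatial compactness.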

Paper Structure

This paper contains 20 sections, 18 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Layout of the two models: MonoGraph (a), and MultiGraph (b)
  • Figure 2: Similarity matrices and their corresponding 3D clustering visualizations (after dimensionality reduction) for a subset of the Napa dataset. The first row (a, b) shows the input embeddings, while the second row (c, d) and third row (e, f) display the results for MonoGraph and MultiGraph models, respectively.
  • Figure 3: Topic-word distributions for a subset of the Hurricane Harvey dataset, for the MonoGraph (top) and MultiGraph (bottom) models.
  • Figure 4: Spatial distribution (KDE) of the topic words for a subset of the Hurricane Harvey dataset, for the MonoGraph (top) and MultiGraph (bottom) models.
  • Figure 5: Moran's I plots for MonoGraph (top) and MultiGraph (bottom) models.
  • ...and 2 more figures
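
Figure 5 refers to Moran's I, a standard measure of spatial autocorrelation. For readers unfamiliar with it, the sketch below computes the textbook global Moran's I in plain Python. The function name `morans_i`, the dense weight-matrix input, and the toy data are illustrative assumptions; nothing here is taken from the paper's implementation or shows how its plots were produced.

```python
def morans_i(values, weights):
    """Global Moran's I for observations `values` and a spatial weight matrix.

    weights[i][j] is the spatial weight between locations i and j
    (for example 1 for neighbours and 0 otherwise). Hypothetical helper.
    """
    n = len(values)
    if n == 0 or len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix matching values")

    mean = sum(values) / n
    dev = [v - mean for v in values]            # deviations from the mean

    total_w = sum(sum(row) for row in weights)  # sum of all spatial weights
    denom = sum(d * d for d in dev)             # total squared deviation
    if total_w == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")

    # Weighted cross-products of deviations over all pairs of locations
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    return (n / total_w) * (num / denom)


# Four locations on a line; each location neighbours the adjacent ones.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], chain)    # similar values sit together
alternating = morans_i([1, 5, 1, 5], chain)  # neighbours always differ
print(clustered, alternating)
```

Positive values indicate that similar values cluster in space, and negative values indicate that neighbouring locations tend to differ, which is the sense in which the statistic relates to spatial coherence.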