Multimodal Object Detection via Probabilistic a priori Information Integration

Hafsa El Hafyani; Bastien Pasdeloup; Camille Yver; Pierre Romenteau

TL;DR

This work addresses multimodal object detection in remote sensing under misalignment, where objects may appear in only one modality. It introduces probability maps derived from binary contextual masks to enable early fusion with RGB imagery, and evaluates two distance-based schemes within an end-to-end Faster R-CNN pipeline. Experiments on a DOTA subset with simulated misalignment show that probability-map fusion improves detection performance over unimodal baselines, with indirect context providing robust class discrimination in many cases. The approach offers a practical, alignment-robust path for leveraging contextual information in geospatial object detection, with future avenues including mid-fusion at multiple network layers and broader dataset validation.

Abstract

Multimodal object detection has shown promise in remote sensing. However, multimodal data are frequently of low quality: the modalities lack strict cell-to-cell alignment, which leads to mismatches between them. In this paper, we investigate multimodal object detection where only one modality contains the target object and the others provide crucial contextual information. We propose to resolve the alignment problem by converting the contextual binary information into probability maps. We then propose an early fusion architecture that we validate with extensive experiments on the DOTA dataset.
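The idea of softening a binary contextual mask into a probability map and fusing it early with the RGB image can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's formulation: the Gaussian decay of probability with Euclidean distance, the `sigma` parameter, and the function names `probability_map` and `early_fuse` are all hypothetical, and the paper's two distance-based schemes are not reproduced here. Plain nested lists are used to keep the sketch dependency-free; a real pipeline would operate on image tensors.

```python
import math


def probability_map(mask, sigma=2.0):
    """Turn a binary mask (rows of 0/1) into a soft probability map.

    Assumption: probability decays as a Gaussian of the Euclidean distance
    to the nearest mask pixel. This is an illustrative choice only.
    """
    # Coordinates of every foreground pixel in the mask.
    fg = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not fg:
        # No contextual information: the map is zero everywhere.
        return [[0.0] * len(row) for row in mask]
    out = []
    for r, row in enumerate(mask):
        out_row = []
        for c in range(len(row)):
            # Brute-force distance to the nearest foreground pixel.
            d = min(math.hypot(r - fr, c - fc) for fr, fc in fg)
            out_row.append(math.exp(-(d * d) / (2.0 * sigma * sigma)))
        out.append(out_row)
    return out


def early_fuse(rgb, prob):
    """Append the probability as a fourth channel to each RGB pixel."""
    return [
        [tuple(px) + (p,) for px, p in zip(rgb_row, prob_row)]
        for rgb_row, prob_row in zip(rgb, prob)
    ]


# Usage: a single-pixel mask on a 3x3 image.
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
prob = probability_map(mask, sigma=1.0)
rgb = [[(10, 20, 30)] * 3 for _ in range(3)]
fused = early_fuse(rgb, prob)  # each pixel now carries 4 channels
```

Under this assumption, pixels near the mask still receive a non-zero probability, so a mask shifted by a few pixels relative to the RGB image continues to overlap the true object location in the fused input, which is the intuition behind using probability maps instead of hard binary masks.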

Paper Structure (16 sections, 2 equations, 3 figures, 2 tables)

Introduction
Related work
Generic Object Detection Methods
Deep Multimodal Fusion
Image Alignment
Contributions
Problem Statement
Probability Maps
Proposed Method
Experiments
Dataset
Experimental Setup
Experimental Results
Does fusion improve detection performance?
Which contextual information improves class discrimination?
...and 1 more section

Figures (3)

Figure 1: Example of misalignment of contextual data with an acquired RGB image. On the left, the mask of roundabout does not match the RGB image (center). On the right, the mask of vehicles shows a shift w.r.t. the RGB image.
Figure 2: Example of the construction of probability maps from a binary mask (a), based on Eq. 1 (b) and Eq. 2 (c).
Figure 3: The pipeline of our framework.
