Multimodal Object Detection via Probabilistic a priori Information Integration
Hafsa El Hafyani, Bastien Pasdeloup, Camille Yver, Pierre Romenteau
TL;DR
This work addresses multimodal object detection in remote sensing under misalignment, where objects may appear in only one modality. It introduces probability maps derived from binary contextual masks to enable early fusion with RGB imagery, and evaluates two distance-based schemes within an end-to-end Faster R-CNN pipeline. Experiments on a DOTA subset with simulated misalignment show that probability-map fusion improves detection performance over unimodal baselines, with indirect context providing robust class discrimination in many cases. The approach offers a practical, alignment-robust path for leveraging contextual information in geospatial object detection, with future avenues including mid-fusion at multiple network layers and broader dataset validation.
Abstract
Multimodal object detection has shown promise in remote sensing. However, multimodal data are frequently of low quality: the modalities lack strict cell-to-cell alignment, which leads to mismatches between them. In this paper, we investigate multimodal object detection in which only one modality contains the target object and the others provide crucial contextual information. We propose to resolve the alignment problem by converting the binary contextual information into probability maps. We then propose an early fusion architecture, which we validate with extensive experiments on the DOTA dataset.
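
To make the idea of a distance-based probability map concrete, the following is a minimal sketch rather than the authors' implementation. The abstract does not specify the distance scheme, so the Gaussian decay, the `sigma` parameter, and the function names `mask_to_probability_map` and `early_fusion` are all assumptions introduced here for illustration. The sketch softens a binary context mask so that pixels near the mask still receive a non-zero score, which is what makes the fused input tolerant to small misalignments, and then stacks the result with the RGB image as a fourth channel.

```python
import numpy as np


def mask_to_probability_map(mask: np.ndarray, sigma: float = 2.0) -> np.ndarray:
    """Turn a binary context mask of shape (H, W) into a soft map in [0, 1].

    Assumption (not from the paper): the score decays as
    exp(-d^2 / (2 * sigma^2)), where d is the Euclidean distance in pixels
    to the nearest mask pixel.
    """
    mask = mask.astype(bool)
    if not mask.any():
        # No context available: every pixel gets probability 0.
        return np.zeros(mask.shape, dtype=np.float32)

    h, w = mask.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pixels = np.stack([ys.ravel(), xs.ravel()], axis=1)   # (H*W, 2)
    mask_pixels = np.argwhere(mask)                        # (K, 2)

    # Brute-force squared distance from every pixel to every mask pixel.
    # Fine for small tiles; memory grows as H*W*K.
    diff = pixels[:, None, :] - mask_pixels[None, :, :]    # (H*W, K, 2)
    d2 = (diff ** 2).sum(axis=2).min(axis=1).reshape(h, w)

    return np.exp(-d2 / (2.0 * sigma ** 2)).astype(np.float32)


def early_fusion(rgb: np.ndarray, prob_map: np.ndarray) -> np.ndarray:
    """Stack an (H, W, 3) RGB image and an (H, W) map into (H, W, 4)."""
    if rgb.shape[:2] != prob_map.shape:
        raise ValueError("RGB image and probability map must share H and W")
    return np.concatenate(
        [rgb.astype(np.float32), prob_map[..., None].astype(np.float32)],
        axis=2,
    )


# Usage: a 5x5 tile whose context mask contains a single centre pixel.
mask = np.zeros((5, 5), dtype=np.uint8)
mask[2, 2] = 1
prob = mask_to_probability_map(mask, sigma=2.0)
fused = early_fusion(np.zeros((5, 5, 3), dtype=np.uint8), prob)
```

The brute-force distance computation keeps the sketch dependency-light; for full-size tiles, a Euclidean distance transform such as `scipy.ndimage.distance_transform_edt` applied to the inverted mask would be the usual choice. A detector consuming the fused tensor would also need its first convolution adapted to four input channels.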
