Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection

Taeheon Kim, Sebin Shin, Youngjoon Yu, Hak Gu Kim, Yong Man Ro

TL;DR

This work tackles modality bias in multispectral pedestrian detection by explicitly modeling causality between the RGB and thermal inputs and the prediction. The proposed Causal Mode Multiplexer (CMM) learns data-type–specific causal graphs: a common mode captures the total effect on day data, and a differential mode extracts the total indirect effect on ROTX data, thereby pruning the thermal direct effect that skews predictions. A new dataset, ROTX-MP, is introduced to stress-test ROTX scenarios, and extensive experiments show improved generalization across KAIST, CVC-14, FLIR, and ROTX-MP, with AP gains up to 70.4 on ROTX-MP and notable improvements on existing benchmarks. The approach offers a principled, counterfactual-based debiasing framework intended to improve the reliability of multispectral detectors in safety-critical applications. The authors plan to release ROTX-MP publicly to support further research into modality bias and causal reasoning in multimodal perception.
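
For readers unfamiliar with the terms, the total effect and total indirect effect mentioned above can be written with the standard causal-mediation definitions. The notation here is an assumption for illustration and is not taken from the paper: $X$ is a treatment (for example, the thermal input) with factual value $x$ and reference value $x^*$, $M$ is a mediator on the path $X \rightarrow M \rightarrow Y$, $M_x$ is the mediator value obtained under $X = x$, and $Y_{x, m}$ is the outcome under $X = x$ and $M = m$.

$$
\begin{aligned}
TE  &= Y_{x, M_x} - Y_{x^*, M_{x^*}} \\
NDE &= Y_{x, M_{x^*}} - Y_{x^*, M_{x^*}} \\
TIE &= TE - NDE = Y_{x, M_x} - Y_{x, M_{x^*}}
\end{aligned}
$$

Under these definitions, subtracting the natural direct effect (NDE) from the total effect (TE) leaves only the contribution that flows through the mediator, which is the sense in which a direct effect is "pruned".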

Abstract

RGBT multispectral pedestrian detection has emerged as a promising solution for safety-critical applications that require day/night operations. However, the modality bias problem remains unsolved as multispectral pedestrian detectors learn the statistical bias in datasets. Specifically, datasets in multispectral pedestrian detection are mainly distributed between ROTO (day) and RXTO (night) data; the majority of the pedestrian labels statistically co-occur with their thermal features. As a result, multispectral pedestrian detectors show poor generalization ability on examples beyond this statistical correlation, such as ROTX data. To address this problem, we propose a novel Causal Mode Multiplexer (CMM) framework that effectively learns the causalities between multispectral inputs and predictions. Moreover, we construct a new dataset (ROTX-MP) to evaluate modality bias in multispectral pedestrian detection. ROTX-MP mainly includes ROTX examples not present in previous datasets. Extensive experiments demonstrate that our proposed CMM framework generalizes well on existing datasets (KAIST, CVC-14, FLIR) and the new ROTX-MP. We will release our new dataset to the public for future research.
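
The dataset skew described in the abstract can be illustrated with a small tally. This is a minimal sketch, not the authors' tooling: it assumes the acronyms read as R (RGB) and T (thermal), each followed by O (pedestrian visible) or X (not visible), and it uses hypothetical per-pedestrian visibility flags rather than real annotations. The function names `data_type`, `distribution`, and `thermal_cooccurrence` are made up for this example.

```python
from collections import Counter


def data_type(rgb_visible: bool, thermal_visible: bool) -> str:
    """Map visibility flags to a tag such as ROTO, RXTO, or ROTX (assumed acronym reading)."""
    r = "O" if rgb_visible else "X"
    t = "O" if thermal_visible else "X"
    return f"R{r}T{t}"


def distribution(samples):
    """Return the fraction of each data type among (rgb_visible, thermal_visible) pairs."""
    counts = Counter(data_type(r, t) for r, t in samples)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def thermal_cooccurrence(samples):
    """Fraction of pedestrian labels that come with a visible thermal signature."""
    return sum(1 for _, t in samples if t) / len(samples)


# Hypothetical annotations: mostly day (ROTO) and night (RXTO), one ROTX case.
samples = [(True, True)] * 6 + [(False, True)] * 3 + [(True, False)] * 1

print(distribution(samples))
print(thermal_cooccurrence(samples))
```

When nearly every label co-occurs with a thermal signature, as in this made-up sample, a detector can score well by relying on thermal features alone, which is the shortcut the abstract describes as modality bias.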

Paper Structure (23 sections, 16 equations, 7 figures, 4 tables)

Figures (7)

  • Figure 1: (a) ROTO, RXTO, and ROTX distribution in existing multispectral pedestrian datasets. (b) In the existing datasets, a high statistical correlation exists between pedestrian labels and their thermal features. (c) Example of a ROTX image. (d) Current multispectral pedestrian detection models fail to detect the person on the right in the ROTX image because they learn the statistical bias in the datasets.
  • Figure 2: Structural Causal Models (SCMs) of (a) factual, (b) counterfactual, and (c) total indirect effect scenarios. The direct effect of $X \rightarrow Y$ can be eliminated through counterfactual intervention.
  • Figure 3: Structural Causal Models (SCMs) of multispectral pedestrian detection on (a) ROTO and (b) RXTO data. (b) The thermal direct effect is considered in RXTO scenarios as models rely heavily on the thermal features for making predictions in the nighttime.
  • Figure 4: Our Structural Causal Model (SCM) formulations on (a) ROTO (day) and (b) RXTO (night) data. (a) We add direct links ($X_{R} \rightarrow Y$, $X_{T} \rightarrow Y$) to the conventional causal graph to determine modes and measure the thermal direct link. (b) The thermal direct link (red) hinders the model from learning causality. (c) implies the total indirect effect on RXTO and ROTX for which the thermal direct link is pruned. (d) We propose a Causal Mode Multiplexer framework that learns causality both on ROTO and RXTO data and thus generalizes well on all ROTO, RXTO, and ROTX.
  • Figure 5: ROTO, RXTO, and ROTX distribution of popular multispectral pedestrian datasets: (a) KAIST [hwang2015multispectral], (b) CVC-14 [gonzalez2016pedestrian], and (c) FLIR [c:25]. Images from all train/test sets are counted.
  • ...and 2 more figures
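
The idea in the Figure 2 caption, that a direct effect can be removed by counterfactual intervention, can be checked numerically on a toy model. The sketch below is an illustration under stated assumptions, not the paper's detector: the averaging `fuse` mediator, the linear `outcome` score, and the weights `w_m` and `w_t` are all hypothetical. It computes the total effect (TE), natural direct effect (NDE), and total indirect effect (TIE = TE - NDE) of a thermal input on a score.

```python
def fuse(x_r: float, x_t: float) -> float:
    """Toy mediator M: averaged evidence from both modalities (hypothetical choice)."""
    return 0.5 * (x_r + x_t)


def outcome(x_t: float, m: float, w_m: float, w_t: float) -> float:
    """Toy linear score Y: mediated path (w_m * m) plus thermal direct link (w_t * x_t)."""
    return w_m * m + w_t * x_t


def effects(x_r, x_t, x_t_ref, w_m, w_t):
    """Return (TE, NDE, TIE) of the thermal input relative to a reference value."""
    m = fuse(x_r, x_t)          # mediator under the factual thermal input
    m_ref = fuse(x_r, x_t_ref)  # mediator under the reference thermal input
    baseline = outcome(x_t_ref, m_ref, w_m, w_t)
    te = outcome(x_t, m, w_m, w_t) - baseline       # factual minus reference
    nde = outcome(x_t, m_ref, w_m, w_t) - baseline  # mediator held at its reference value
    tie = te - nde                                  # what remains flows through M only
    return te, nde, tie


# Vary the strength of the thermal direct link and compare the three effects.
for w_t in (3.0, 10.0):
    te, nde, tie = effects(x_r=1.0, x_t=1.0, x_t_ref=0.0, w_m=2.0, w_t=w_t)
    print(f"w_t={w_t}: TE={te}, NDE={nde}, TIE={tie}")
```

In this toy model the TE and NDE grow with the direct-link weight `w_t`, while the TIE stays the same, because the direct term cancels in the subtraction. A real detector is nonlinear, so the cancellation there would not be this clean.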