MIFI: MultI-camera Feature Integration for Robust 3D Distracted Driver Activity Recognition

Jian Kuang, Wenjing Li, Fang Li, Jun Zhang, Zhongcheng Wu

TL;DR

The paper tackles distracted driver activity recognition, a fine-grained task hampered by the limited perspective of a single camera view and by uneven sample difficulty. It introduces MultI-camera Feature Integration (MIFI), a late-fusion framework that integrates features from two views through one of three fusion schemes (summation, channel concatenation, and temporal concatenation), coupled with a Cyclical Focal Loss (CASL) to address difficulty-inconsistent samples. Experiments on the 3MDAD dataset show consistent performance gains over single-view baselines across backbones (notably I3D) and fusion variants, with CASL contributing substantial improvements and multi-view fusion providing complementary information. The approach offers a practical path toward robust, multi-view driver monitoring and can be extended to other same-modality multi-view tasks; future work includes lightweight architectures and nighttime generalization.

Abstract

Distracted driver activity recognition plays a critical role in risk aversion and is particularly beneficial in intelligent transportation systems. However, most existing methods use video from only a single view and neglect the difficulty-inconsistent issue. In contrast, we propose a novel MultI-camera Feature Integration (MIFI) approach for 3D distracted driver activity recognition that jointly models the data from different camera views and explicitly re-weights examples based on their degree of difficulty. Our contributions are two-fold: (1) We propose a simple but effective multi-camera feature integration framework and provide three types of feature fusion techniques. (2) To address the difficulty-inconsistent problem in distracted driver activity recognition, we present a periodic learning method, named example re-weighting, that can jointly learn the easy and hard samples. The experimental results on the 3MDAD dataset demonstrate that the proposed MIFI consistently boosts performance compared to single-view models.
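
To make the three fusion types concrete, here is a minimal sketch, not the authors' implementation. It assumes each view's backbone yields a feature map of shape (C, T, H, W) and that the three fusion types are element-wise summation, concatenation along the channel axis, and concatenation along the temporal axis, as the TL;DR names them. The function name `fuse_views`, the axis layout, and the toy shapes are all hypothetical.

```python
import numpy as np


def fuse_views(feat_a, feat_b, mode):
    """Fuse two per-view feature maps of assumed shape (C, T, H, W).

    Hypothetical helper for illustration; the axis layout is an assumption.
    """
    if feat_a.shape != feat_b.shape:
        raise ValueError("both views must produce features of the same shape")
    if mode == "sum":
        # Element-wise summation: output shape is unchanged.
        return feat_a + feat_b
    if mode == "channel":
        # Concatenate along the channel axis: C doubles.
        return np.concatenate([feat_a, feat_b], axis=0)
    if mode == "temporal":
        # Concatenate along the temporal axis: T doubles.
        return np.concatenate([feat_a, feat_b], axis=1)
    raise ValueError(f"unknown fusion mode: {mode!r}")


# Toy features standing in for the outputs of two camera views.
rng = np.random.default_rng(0)
view_a = rng.standard_normal((8, 4, 7, 7))
view_b = rng.standard_normal((8, 4, 7, 7))

shapes = {m: fuse_views(view_a, view_b, m).shape
          for m in ("sum", "channel", "temporal")}
print(shapes)
```

Under these assumptions, summation keeps the feature size fixed, while the two concatenation variants double one axis, so whatever classifier head follows would need to accept the larger input.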

Paper Structure (25 sections, 11 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Three types of distracted behaviors.
  • Figure 2: The information from a single-view video is often limited, making the distraction hard to recognize.
  • Figure 3: The framework of the proposed MIFI method.
  • Figure 4: The comparison of the loss weighting factors for $L_{h}=-(1-p)^{\lambda_{1}}\log(p)-p^{\lambda_{2}}\log(1-p)$.
  • Figure 5: (a) The effect of the number of frames. (b) The effect of the cyclical factor $\beta$.
  • ...and 3 more figures
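
The expression in the Figure 4 caption can be evaluated directly to see how the weighting factors behave. The sketch below is illustrative only: the caption does not define its symbols, so it assumes $p$ is a predicted probability in (0, 1) and that $\lambda_{1}$ and $\lambda_{2}$ are non-negative exponents. The function names are hypothetical, and the cyclical factor $\beta$ from Figure 5 is not modeled because its definition is not given here.

```python
import math


def caption_loss(p, lam1, lam2, eps=1e-12):
    """Evaluate L_h = -(1-p)^lam1 * log(p) - p^lam2 * log(1-p) as captioned.

    Assumes p is a probability; it is clamped away from 0 and 1 so the
    logarithms stay finite.
    """
    p = min(max(p, eps), 1.0 - eps)
    return -((1.0 - p) ** lam1) * math.log(p) - (p ** lam2) * math.log(1.0 - p)


def first_term_weight(p, lam1):
    """Weighting factor (1-p)^lam1 that multiplies the -log(p) term."""
    return (1.0 - p) ** lam1


# Compare the first-term weight for a low and a high predicted probability.
for p in (0.1, 0.5, 0.9):
    print(p, first_term_weight(p, lam1=2.0), caption_loss(p, lam1=2.0, lam2=2.0))
```

With both exponents set to zero the weights equal one and the expression reduces to $-\log(p)-\log(1-p)$. A larger $\lambda_{1}$ shrinks the weight on the $-\log(p)$ term as $p$ approaches one, which is the usual focal-style way of down-weighting examples the model already handles well.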