MIFI: MultI-camera Feature Integration for Robust 3D Distracted Driver Activity Recognition
Jian Kuang, Wenjing Li, Fang Li, Jun Zhang, Zhongcheng Wu
TL;DR
The paper tackles distracted driver activity recognition, a fine-grained task hampered by the limited perspective of a single camera view and by uneven sample difficulty. It introduces MultI-camera Feature Integration (MIFI), a late-fusion framework that integrates features from two views through summation, channel concatenation, or temporal concatenation, coupled with a Cyclical Focal Loss (CASL) to address difficulty-inconsistent samples. Experiments on the 3MDAD dataset show consistent performance gains over single-view baselines across backbones (notably I3D) and fusion variants, with CASL contributing substantial improvements and multi-view fusion providing complementary information. The approach offers a practical path toward robust, multi-view driver monitoring and can be extended to other same-modality multi-view tasks; future work includes lightweight architectures and nighttime generalization.
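To make difficulty-based re-weighting concrete, the following is a minimal sketch of the standard focal loss term for a single example, which scales the cross-entropy by how poorly the true class is predicted. It is not the paper's cyclical variant, whose schedule is not described here; the function name `focal_term` and the default `gamma=2.0` are assumptions chosen for illustration.

```python
import math


def focal_term(p_true, gamma=2.0):
    """Standard focal loss for one example (illustrative, not the paper's CASL).

    p_true: predicted probability of the correct class, in (0, 1].
    gamma:  focusing parameter (assumed value); gamma = 0 gives plain cross-entropy.
    """
    # Clamp to avoid log(0) on degenerate predictions.
    p = min(max(p_true, 1e-12), 1.0)
    # (1 - p)^gamma shrinks the loss of confidently correct (easy) examples.
    return -((1.0 - p) ** gamma) * math.log(p)


easy = focal_term(0.9)  # well-classified example, strongly down-weighted
hard = focal_term(0.3)  # poorly classified example, keeps most of its loss
```

With `gamma` above zero, the easy example contributes far less to the total loss than the hard one, which is the behavior that difficulty-aware training relies on.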
Abstract
Distracted driver activity recognition plays a critical role in risk aversion, which is particularly beneficial in intelligent transportation systems. However, most existing methods use video from only a single view and neglect the difficulty-inconsistent issue. In contrast, we propose a novel MultI-camera Feature Integration (MIFI) approach for 3D distracted driver activity recognition that jointly models the data from different camera views and explicitly re-weights examples based on their degree of difficulty. Our contributions are two-fold: (1) We propose a simple but effective multi-camera feature integration framework and provide three types of feature fusion techniques. (2) To address the difficulty-inconsistent problem in distracted driver activity recognition, we present a periodic learning method, named example re-weighting, that can jointly learn easy and hard samples. The experimental results on the 3MDAD dataset demonstrate that the proposed MIFI consistently boosts performance compared to single-view models.
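The three fusion types named in the TL;DR can be illustrated with a minimal sketch. It assumes each view yields a feature map of shape (channels, time), stored as a nested list; the real framework operates on backbone feature tensors, and the function names here are hypothetical.

```python
def sum_fusion(a, b):
    """Element-wise sum of two (C, T) feature maps; output stays (C, T)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def channel_concat(a, b):
    """Stack the channels of both views; output is (2C, T)."""
    return [row[:] for row in a] + [row[:] for row in b]


def temporal_concat(a, b):
    """Append the second view's time steps to the first; output is (C, 2T)."""
    return [row_a + row_b for row_a, row_b in zip(a, b)]


# Toy features for two camera views: 2 channels, 2 time steps each.
view_a = [[1.0, 2.0], [3.0, 4.0]]
view_b = [[10.0, 20.0], [30.0, 40.0]]

fused_sum = sum_fusion(view_a, view_b)
fused_channel = channel_concat(view_a, view_b)
fused_temporal = temporal_concat(view_a, view_b)
```

Sum fusion keeps the feature size unchanged, while the two concatenation variants double one dimension, so any classifier head placed after them would need to accept the larger input.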
