C-DIRA: Computationally Efficient Dynamic ROI Routing and Domain-Invariant Adversarial Learning for Lightweight Driver Behavior Recognition
Keito Inoshita
TL;DR
C-DIRA targets real-time driver distraction recognition on edge devices with a lightweight dual-path architecture that fuses global context with saliency-driven ROI cues. Dynamic ROI routing allocates the extra ROI computation only to difficult samples, while pseudo-domain labeling with adversarial learning yields domain-invariant representations that generalize better to unseen drivers and environments. On the State Farm dataset, the method reports competitive accuracy with significantly fewer FLOPs and lower latency, along with improved robustness under visual degradation and stronger domain generalization, supporting its practicality for edge deployment. The work illustrates how targeted local feature extraction and domain suppression can reconcile efficiency and performance in driver monitoring.
Abstract
Driver distraction behavior recognition using in-vehicle cameras demands real-time inference on edge devices. However, lightweight models often fail to capture fine-grained behavioral cues, resulting in reduced performance on unseen drivers or under varying conditions. Region-of-interest (ROI) based methods also increase computational cost, making it difficult to balance efficiency and accuracy. This work addresses the need for a lightweight architecture that overcomes these constraints. We propose Computationally efficient Dynamic region of Interest Routing and domain-invariant Adversarial learning for lightweight driver behavior recognition (C-DIRA). The framework combines saliency-driven Top-K ROI pooling and fused classification for local feature extraction and integration. Dynamic ROI routing enables selective computation by applying ROI inference only to high-difficulty samples. Moreover, pseudo-domain labeling and adversarial learning are used to learn domain-invariant features robust to driver and background variation. Experiments on the State Farm Distracted Driver Detection Dataset show that C-DIRA maintains high accuracy with significantly fewer FLOPs and lower latency than prior lightweight models. It also demonstrates robustness under visual degradation such as blur and low light, and stable performance across unseen domains. These results confirm C-DIRA's effectiveness in achieving compactness, efficiency, and generalization.
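To make the idea of saliency-driven Top-K ROI selection and dynamic ROI routing concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation. The abstract does not state how sample difficulty is measured or how the two paths are fused, so this sketch assumes that difficulty is judged by the maximum softmax probability of the global path and that fusion is a weighted average of class probabilities. The names `top_k_rois`, `route`, `threshold`, and `alpha` are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_rois(saliency, k):
    """Return the (row, col) positions of the k most salient grid cells.

    `saliency` is a 2D list of scores; in a real model each cell would
    correspond to a region of the feature map to be pooled.
    """
    cells = [
        (value, r, c)
        for r, row in enumerate(saliency)
        for c, value in enumerate(row)
    ]
    cells.sort(key=lambda cell: cell[0], reverse=True)
    return [(r, c) for _, r, c in cells[:k]]


def route(global_logits, roi_branch, threshold=0.8, alpha=0.5):
    """Run the ROI branch only when the global path is not confident.

    Assumption: a sample is "difficult" when the global path's top
    probability is below `threshold`. `roi_branch` is a zero-argument
    callable returning ROI-path logits, so its cost is paid only when
    the sample is actually routed.
    Returns (class probabilities, whether the ROI branch was used).
    """
    global_probs = softmax(global_logits)
    if max(global_probs) >= threshold:
        return global_probs, False
    roi_probs = softmax(roi_branch())
    # Assumed fusion rule: convex combination of the two paths.
    fused = [alpha * g + (1 - alpha) * r for g, r in zip(global_probs, roi_probs)]
    return fused, True
```

In this sketch an easy sample such as `route([5.0, 0.0, 0.0], roi_branch)` returns early without invoking `roi_branch`, which is where the compute saving would come from; an ambiguous sample falls through to the ROI path and the fused prediction. A learned gate or an entropy-based criterion would fit the same structure.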
