SNE-RoadSegV2: Advancing Heterogeneous Feature Fusion and Fallibility Awareness for Freespace Detection
Yi Feng, Yu Ma, Qijun Chen, Ioannis Pitas, Rui Fan
TL;DR
This work tackles freespace detection for autonomous driving by addressing two bottlenecks: discriminative fusion of heterogeneous features, and supervision that accounts for model fallibility. It introduces SNE-RoadSegV2, built around a heterogeneous feature fusion block (HF^2B) that comprises a Holistic Attention Module, a Heterogeneous Feature Contrast Descriptor, and an Affinity-Weighted Feature Recalibrator. The block is paired with a lightweight decoder that uses both inter-scale and intra-scale skip connections while eliminating redundant ones. The model is trained with two fallibility-aware losses, a Semantic Transition-Aware Loss and a Depth Inconsistency-Aware Loss, combined with binary cross-entropy into a single objective $L = L_{BCE} + \lambda_S L_{STA} + \lambda_D L_{DIA}$, where $\lambda_S$ and $\lambda_D$ weight the two fallibility-aware terms. Experiments on KITTI Road, Cityscapes, vKITTI2, and KITTI Semantics demonstrate state-of-the-art performance: the method ranks 1st on the KITTI Road benchmark and shows robust improvements near semantic-transition and depth-inconsistent regions. In practice, the approach delivers more coherent and reliable freespace detection under challenging conditions, and it paves the way for extending heterogeneous feature fusion and fallibility-aware supervision to broader semantic segmentation tasks.
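As a minimal sketch of how the unified objective above combines its terms, the following Python snippet computes a mean binary cross-entropy and adds the two weighted fallibility-aware terms. The function names `bce_loss` and `total_loss`, the default weights of 1.0, and the flattened-list inputs are assumptions made for illustration. The definitions of $L_{STA}$ and $L_{DIA}$ are not given here, so they are passed in as precomputed scalars rather than implemented.

```python
import math


def bce_loss(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over flattened per-pixel freespace probabilities."""
    total = 0.0
    for p, y in zip(probs, labels):
        # Clamp the probability to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def total_loss(l_bce, l_sta, l_dia, lambda_s=1.0, lambda_d=1.0):
    """L = L_BCE + lambda_S * L_STA + lambda_D * L_DIA.

    l_sta and l_dia are placeholders for the semantic transition-aware and
    depth inconsistency-aware losses; the weights are hypothetical defaults.
    """
    return l_bce + lambda_s * l_sta + lambda_d * l_dia


# Usage: four pixels, with placeholder values for the two auxiliary terms.
l_bce = bce_loss([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
loss = total_loss(l_bce, l_sta=0.05, l_dia=0.02, lambda_s=0.5, lambda_d=0.5)
```

Keeping the weighting in a separate function makes it easy to ablate either auxiliary term by setting its weight to zero.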
Abstract
Feature-fusion networks with duplex encoders have proven to be an effective technique for solving the freespace detection problem. However, despite the compelling results achieved by previous research efforts, the exploration of adequate and discriminative heterogeneous feature fusion, as well as the development of fallibility-aware loss functions, remains relatively scarce. This paper makes several significant contributions to address these limitations: (1) it presents a novel heterogeneous feature fusion block, comprising a holistic attention module, a heterogeneous feature contrast descriptor, and an affinity-weighted feature recalibrator, enabling a more in-depth exploitation of the inherent characteristics of the extracted features; (2) it incorporates both inter-scale and intra-scale skip connections into the decoder architecture while eliminating redundant ones, leading to both improved accuracy and computational efficiency; and (3) it introduces two fallibility-aware loss functions that separately focus on semantic-transition and depth-inconsistent regions, collectively providing stronger supervision during model training. Our proposed heterogeneous feature fusion network (SNE-RoadSegV2), which incorporates all these components, demonstrates superior performance compared with all other freespace detection algorithms across multiple public datasets. Notably, it ranks 1st on the official KITTI Road benchmark.
