Table of Contents
Fetching ...

BoMuDANet: Unsupervised Adaptation for Visual Scene Understanding in Unstructured Driving Environments

Divya Kothandaraman, Rohan Chandra, Dinesh Manocha

TL;DR

BoMuDANet addresses semantic scene understanding in unstructured driving by formulating unsupervised multi-source boundless domain adaptation for segmentation. It introduces Alt-Inc, an alternating training scheme that distills knowledge across multiple sources while adapting from a chosen best source, plus a pseudo-labeling strategy to recognize open-set objects during testing. The method achieves state-of-the-art or competitive performance on the India Driving Dataset and CityScapes in multi-source settings, with substantial gains over single-source baselines and reasonable model efficiency. The work is validated through quantitative experiments and a user study, demonstrating improved perception in unconstrained traffic and practical potential for real-world deployment.

Abstract

We present an unsupervised adaptation approach for visual scene understanding in unstructured traffic environments. Our method is designed for unstructured real-world scenarios with dense and heterogeneous traffic consisting of cars, trucks, two-and three-wheelers, and pedestrians. We describe a new semantic segmentation technique based on unsupervised domain adaptation (DA), that can identify the class or category of each region in RGB images or videos. We also present a novel self-training algorithm (Alt-Inc) for multi-source DA that improves the accuracy. Our overall approach is a deep learning-based technique and consists of an unsupervised neural network that achieves 87.18% accuracy on the challenging India Driving Dataset. Our method works well on roads that may not be well-marked or may include dirt, unidentifiable debris, potholes, etc. A key aspect of our approach is that it can also identify objects that are encountered by the model for the fist time during the testing phase. We compare our method against the state-of-the-art methods and show an improvement of 5.17% - 42.9%. Furthermore, we also conduct user studies that qualitatively validate the improvements in visual scene understanding of unstructured driving environments.

BoMuDANet: Unsupervised Adaptation for Visual Scene Understanding in Unstructured Driving Environments

TL;DR

BoMuDANet addresses semantic scene understanding in unstructured driving by formulating unsupervised multi-source boundless domain adaptation for segmentation. It introduces Alt-Inc, an alternating training scheme that distills knowledge across multiple sources while adapting from a chosen best source, plus a pseudo-labeling strategy to recognize open-set objects during testing. The method achieves state-of-the-art or competitive performance on the India Driving Dataset and CityScapes in multi-source settings, with substantial gains over single-source baselines and reasonable model efficiency. The work is validated through quantitative experiments and a user study, demonstrating improved perception in unconstrained traffic and practical potential for real-world deployment.

Abstract

We present an unsupervised adaptation approach for visual scene understanding in unstructured traffic environments. Our method is designed for unstructured real-world scenarios with dense and heterogeneous traffic consisting of cars, trucks, two-and three-wheelers, and pedestrians. We describe a new semantic segmentation technique based on unsupervised domain adaptation (DA), that can identify the class or category of each region in RGB images or videos. We also present a novel self-training algorithm (Alt-Inc) for multi-source DA that improves the accuracy. Our overall approach is a deep learning-based technique and consists of an unsupervised neural network that achieves 87.18% accuracy on the challenging India Driving Dataset. Our method works well on roads that may not be well-marked or may include dirt, unidentifiable debris, potholes, etc. A key aspect of our approach is that it can also identify objects that are encountered by the model for the fist time during the testing phase. We compare our method against the state-of-the-art methods and show an improvement of 5.17% - 42.9%. Furthermore, we also conduct user studies that qualitatively validate the improvements in visual scene understanding of unstructured driving environments.

Paper Structure

This paper contains 27 sections, 1 equation, 6 figures, 2 tables.

Figures (6)

  • Figure 1: We present a novel unsupervised deep learning-based approach called BoMuDANet for visual scene understanding in unconstrained and unstructured traffic environments. In this example, we demonstrate the benefits of BoMuDANet on images taken from the challenging IDD dataset. BoMuDANet accurately segments out dirt roads as terrains as well as a building, while preserving its shape. In contrast, the single source baselines (GTA/CityScapes) do not identify dirt and unstructured roads well, misclassify parts of sky as building, and fail to capture the shape of the unstructured building. BoMuDANet benefits from its ability to selectively distil information from various sources by iterative self-training, in addition to exploiting a chosen best source via domain adaptation.
  • Figure 2: Extension to SOTA in domain adaptive semantic segmentation. Our approach is the first method to simultaneously perform unsupervised multi-source boundless DA segmentation and can handle unstructured traffic environments.
  • Figure 3: Overview of BoMuDANet: The input consists of $N$ sources ($s_1, s_2, \ldots, s_N$), from which the best-source is selected by the Alternating-Incremental algorithm (Section \ref{['subsec: multi-source_DA']}). The Alt-Inc algorithm proceeds in an unsupervised fashion to generate the final set of pseudo-labels that are used to recognize out-of-distribution objects (Section \ref{['sec: boundless']}). The final output consists of the segmentation map of an image in the target domain.
  • Figure 4: In this example, we demonstrate the benefits of BoMuDANet on an image from the IDD dataset, depicting a mixture of challenging driving conditions. The top image in the second column shows the prediction of our model, and the bottom image shows the ground-truth. We observe that BoMuDANet accurately segments the autorickshaws (open-set object, a new type of vehicle - the third column zooms into the region containing the autorickshaw in the prediction and ground-truth), in addition to handling dense traffic and dirt roads.
  • Figure 5: Visual Results: BoMuDANet works well in various unconstrained environments including unmarked lanes, dirt roads, heavy traffic, and boundless category objects (auto-rickshaws) and results in higher accuracy. Each color represents a different object as shown in the color scheme.
  • ...and 1 more figures