
IDD-X: A Multi-View Dataset for Ego-relative Important Object Localization and Explanation in Dense and Unstructured Traffic

Chirag Parikh, Rohit Saluja, C. V. Jawahar, Ravi Kiran Sarvadevabhatla

TL;DR

IDD-X tackles the challenge of explainable driving in dense, unstructured traffic by providing a large dual-view dataset with ego-relative annotations for multiple important objects, including rearview observations. The authors introduce two tasks, Important Object Localization and Important Object Explanation, and propose deep networks tailored to each: a class-conditioned BiGRU for object importance and a TSN-based framework with TOI-Align for object-level explanations. Experiments show strong performance in object-importance detection and driving-behavior recognition, and context augmentation improves performance on rare (tail) explanation categories. Overall, IDD-X offers a rich, globally relevant resource and modeling framework for understanding how heterogeneous road actors influence ego-vehicle decisions in complex traffic.
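The TL;DR mentions a TSN-based framework for explanation prediction. Temporal Segment Networks (TSN) classify a video from a sparse set of snippets: the clip is split into K equal segments and one frame is drawn from each. The Python sketch below shows only that generic sampling step, assuming K = 8, a random offset within each segment during training, and the segment center at evaluation time (a common choice in TSN implementations). The function name `tsn_sample_indices` is hypothetical; the paper's actual sampling settings and its TOI-Align module are not reproduced here.

```python
import random
from typing import List, Optional


def tsn_sample_indices(num_frames: int, num_segments: int = 8,
                       training: bool = True,
                       rng: Optional[random.Random] = None) -> List[int]:
    """Pick one frame index from each of `num_segments` equal-length segments.

    Hypothetical helper illustrating TSN-style sparse sampling (K = 8 is an
    assumption, not a setting taken from the IDD-X paper).
    """
    if num_frames <= 0 or num_segments <= 0:
        raise ValueError("num_frames and num_segments must be positive")
    rng = rng or random.Random()
    seg_len = num_frames / num_segments  # may be fractional for short clips
    indices = []
    for k in range(num_segments):
        start = int(seg_len * k)
        # Guarantee at least one candidate frame per segment.
        end = max(start + 1, int(seg_len * (k + 1)))
        if training:
            idx = rng.randrange(start, end)   # random offset inside segment k
        else:
            idx = (start + end - 1) // 2      # deterministic segment center
        indices.append(min(idx, num_frames - 1))
    return indices


print(tsn_sample_indices(80, 8, training=False))  # [4, 14, 24, 34, 44, 54, 64, 74]
```

Random offsets during training expose the model to different snippets of the same clip across epochs, while the fixed center keeps evaluation deterministic.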

Abstract

Intelligent vehicle systems require a deep understanding of the interplay between road conditions, surrounding entities, and the ego vehicle's driving behavior for safe and efficient navigation. This is particularly critical in developing countries where traffic situations are often dense and unstructured with heterogeneous road occupants. Existing datasets, predominantly geared towards structured and sparse traffic scenarios, fall short of capturing the complexity of driving in such environments. To fill this gap, we present IDD-X, a large-scale dual-view driving video dataset. With 697K bounding boxes, 9K important object tracks, and 1-12 objects per video, IDD-X offers comprehensive ego-relative annotations for multiple important road objects covering 10 categories and 19 explanation label categories. The dataset also incorporates rearview information to provide a more complete representation of the driving environment. We also introduce custom-designed deep networks aimed at multiple important object localization and per-object explanation prediction. Overall, our dataset and introduced prediction models form the foundation for studying how road conditions and surrounding entities affect driving behavior in complex traffic situations.
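To make the annotation structure described in the abstract concrete, here is a minimal Python sketch of one possible in-memory representation of a scenario. All class, field, and label names are hypothetical and do not describe the released IDD-X file format. The only constraints it encodes are those stated in the abstract: 10 object categories, 19 explanation categories, 1 to 12 important objects per video, and boxes in the front view, the rear view, or both.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Limits taken from the abstract; everything else in this sketch is hypothetical.
NUM_OBJECT_CATEGORIES = 10
NUM_EXPLANATION_CATEGORIES = 19
MAX_OBJECTS_PER_VIDEO = 12

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class ImportantObjectTrack:
    track_id: int
    category_id: int     # index into the 10 road-object categories
    explanation_id: int  # index into the 19 explanation categories
    # frame index -> bounding box, stored separately for each camera view
    front_boxes: Dict[int, Box] = field(default_factory=dict)
    rear_boxes: Dict[int, Box] = field(default_factory=dict)

    def views(self) -> Set[str]:
        """Camera views in which this object has at least one box."""
        seen = set()
        if self.front_boxes:
            seen.add("front")
        if self.rear_boxes:
            seen.add("rear")
        return seen


@dataclass
class Scenario:
    video_id: str
    driving_behavior: str  # scenario-level label for the ego vehicle
    objects: List[ImportantObjectTrack]

    def total_boxes(self) -> int:
        return sum(len(o.front_boxes) + len(o.rear_boxes) for o in self.objects)

    def validate(self) -> None:
        """Raise ValueError if the scenario violates the dataset-level limits."""
        n = len(self.objects)
        if not 1 <= n <= MAX_OBJECTS_PER_VIDEO:
            raise ValueError(f"{self.video_id}: expected 1-12 important objects, got {n}")
        for o in self.objects:
            if not 0 <= o.category_id < NUM_OBJECT_CATEGORIES:
                raise ValueError(f"track {o.track_id}: bad category {o.category_id}")
            if not 0 <= o.explanation_id < NUM_EXPLANATION_CATEGORIES:
                raise ValueError(f"track {o.track_id}: bad explanation {o.explanation_id}")
            if not o.views():
                raise ValueError(f"track {o.track_id}: no boxes in either view")


# Usage with made-up values: one front-only object, one seen in both views.
car = ImportantObjectTrack(0, category_id=2, explanation_id=5,
                           front_boxes={0: (10, 20, 110, 90), 1: (12, 21, 112, 91)})
bike = ImportantObjectTrack(1, category_id=4, explanation_id=11,
                            front_boxes={1: (300, 40, 340, 120)},
                            rear_boxes={0: (50, 60, 90, 140)})
scene = Scenario("clip_0001", "hypothetical_behavior", [car, bike])
scene.validate()
print(scene.total_boxes(), sorted(bike.views()))  # 4 ['front', 'rear']
```

Keeping front and rear boxes in separate maps lets an object that is visible in both cameras remain a single track with one explanation label.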

Paper Structure (9 sections, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Bird's-Eye View illustration of Explanations for Important Objects in different traffic situations captured in the IDD-X dataset. Each (row, column) pair, with rows A to G and columns 1 to 4, is an explanation card. Inside each card, the yellow header gives the explanation category and the purple footer lists the ego vehicle's possible driving behaviors in the depicted traffic situation. Icon notations used in the figure are defined in the bottom-right corner. The important object(s) in each card may be visible in the front-camera view, the rear-camera view, or both; the corresponding view notation appears in the bottom-right corner of each card. The Driving Direction notation indicates that vehicles should ideally drive on the left-hand side of the road, per Indian traffic rules. The Novel Explanation Category icon marks categories that are unique to this dataset and do not appear in existing driving datasets.
  • Figure 2: Samples of the annotated driving scenarios in IDD-X. The ego vehicle's driving behavior annotation is displayed at the top of each scenario ($\text{S}_{1}$ and $\text{S}_{2}$). The important object location annotations in the front and rear views ($\text{S}_{1}^{\text{F}}$, $\text{S}_{2}^{\text{F}}$, $\text{S}_{1}^{\text{R}}$, $\text{S}_{2}^{\text{R}}$) at timeframes $\text{T}_{1}$ to $\text{T}_{4}$ are shown as colored bounding boxes, with each object colored by its explanation category. Each bounding box is labeled with the object's explanation annotation, given as the explanation card ID from Figure 1. The Bird's-Eye View (BEV) illustration shows the trajectories of the ego vehicle and the annotated important objects across the four timeframes. The icon notations used in the BEV are defined at the bottom of the figure. Sample explanation cards are shown alongside the BEV to compare the important objects' maneuvering styles and traffic situations.
  • Figure 3: Statistics of the number of Important Objects in a driving scenario for every driving action class.
  • Figure 4: Distribution of Explanations for Important Objects for every road object category and the average duration of the corresponding object tracks.
  • Figure 5: The proposed approaches for Important Object Localization and Explanation in Dense and Unstructured Traffic.
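Figures 3 and 4 summarize two kinds of statistics: how many important objects appear per scenario for each driving action, and, for each road-object category, how explanations are distributed along with the average track duration. A minimal sketch of how such tallies could be computed from a flat list of track records follows. The record layout, function names, and label strings are hypothetical placeholders rather than IDD-X's actual schema or class names, and duration is measured in frames because the frame rate is not stated here.

```python
from collections import Counter, defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical flat record, one per important-object track:
# (video_id, driving_action, object_category, explanation, first_frame, last_frame)
Track = Tuple[str, str, str, str, int, int]


def objects_per_scenario(tracks: List[Track]) -> Dict[str, List[int]]:
    """Driving action -> number of important objects in each scenario (Figure 3 style)."""
    counts = Counter(t[0] for t in tracks)    # video_id -> number of tracks
    action_of = {t[0]: t[1] for t in tracks}  # video_id -> driving action
    result: Dict[str, List[int]] = defaultdict(list)
    for video_id, n in sorted(counts.items()):
        result[action_of[video_id]].append(n)
    return dict(result)


def explanation_stats(tracks: List[Track]) -> Dict[str, Tuple[Dict[str, int], float]]:
    """Category -> (explanation counts, mean track length in frames) (Figure 4 style)."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    lengths: Dict[str, List[int]] = defaultdict(list)
    for _, _, category, explanation, first, last in tracks:
        counts[category][explanation] += 1
        lengths[category].append(last - first + 1)  # inclusive frame span
    return {c: (dict(counts[c]), mean(lengths[c])) for c in counts}


tracks: List[Track] = [
    ("v1", "action_1", "car", "expl_a", 0, 49),
    ("v1", "action_1", "bus", "expl_b", 10, 29),
    ("v2", "action_2", "car", "expl_b", 0, 99),
]
print(objects_per_scenario(tracks))  # {'action_1': [2], 'action_2': [1]}
print(explanation_stats(tracks))
```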