RoadSocial: A Diverse VideoQA Dataset and Benchmark for Road Event Understanding from Social Video Narratives

Chirag Parikh; Deepti Rawat; Rakshitha R. T.; Tathagata Ghosh; Ravi Kiran Sarvadevabhatla

TL;DR

RoadSocial addresses the limited geographic and viewpoint diversity of existing road event understanding datasets by introducing a large-scale VideoQA dataset built from social media videos. A scalable semi-automatic annotation pipeline combines Video LLMs and Text LLMs to generate QA pairs across 12 tasks, including adversarial and incompatible questions that probe robustness to hallucination. The dataset spans 14M frames from 13.2K videos, with 260K QA pairs and 674 tags. It is used to evaluate 18 Video LLMs, and the results show that fine-tuning general-purpose models on RoadSocial improves their road event understanding. The resulting benchmark covers road events across camera viewpoints and geographies, and supports evaluation of robustness and bias awareness for practical deployment in intelligent transportation systems.

Abstract

We introduce RoadSocial, a large-scale, diverse VideoQA dataset tailored for generic road event understanding from social media narratives. Unlike existing datasets limited by regional bias, viewpoint bias and expert-driven annotations, RoadSocial captures the global complexity of road events with varied geographies, camera viewpoints (CCTV, handheld, drones) and rich social discourse. Our scalable semi-automatic annotation framework leverages Text LLMs and Video LLMs to generate comprehensive question-answer pairs across 12 challenging QA tasks, pushing the boundaries of road event understanding. RoadSocial is derived from social media videos spanning 14M frames and 414K social comments, resulting in a dataset with 13.2K videos, 674 tags and 260K high-quality QA pairs. We evaluate 18 Video LLMs (open-source and proprietary, driving-specific and general-purpose) on our road event understanding benchmark. We also demonstrate RoadSocial's utility in improving road event understanding capabilities of general-purpose Video LLMs.
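To give a rough sense of scale, the headline totals above can be turned into per-video averages. This is only a back-of-the-envelope sketch: the constants are the rounded figures quoted in the abstract, the function and variable names are illustrative (not part of any RoadSocial release), and the true per-video distribution is likely far from uniform.

```python
# Rounded totals as quoted in the abstract; exact counts may differ.
NUM_VIDEOS = 13_200        # "13.2K videos"
NUM_QA_PAIRS = 260_000     # "260K high-quality QA pairs"
NUM_FRAMES = 14_000_000    # "14M frames"


def per_video_averages(num_videos: int, num_qa_pairs: int, num_frames: int) -> dict:
    """Return approximate mean QA pairs and frames per video.

    Hypothetical helper for illustration only; it assumes every
    frame and QA pair is attributed to exactly one video.
    """
    if num_videos <= 0:
        raise ValueError("num_videos must be positive")
    return {
        "qa_per_video": num_qa_pairs / num_videos,
        "frames_per_video": num_frames / num_videos,
    }


stats = per_video_averages(NUM_VIDEOS, NUM_QA_PAIRS, NUM_FRAMES)
print(f"~{stats['qa_per_video']:.1f} QA pairs per video")
print(f"~{stats['frames_per_video']:.0f} frames per video")
```

Under these rounded totals, each video carries on the order of twenty QA pairs and roughly a thousand frames on average.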
