First Place Solution to the ECCV 2024 ROAD++ Challenge @ ROAD++ Atomic Activity Recognition 2024

Ruyang Li, Tengfei Zhang, Heng Zhang, Tiejun Liu, Yanwei Wang, Xuelei Li

TL;DR

We propose a data augmentation method for atomic activity recognition that greatly expands the sample space by flipping video frames and road topology, effectively mitigating model overfitting in this task.

Abstract

This report presents our team's technical solution for Track 3 of the ECCV 2024 ROAD++ Challenge. Track 3 is atomic activity recognition, which aims to identify 64 types of atomic activities in road scenes from video content. Our approach addresses three main challenges in this task: small objects, distinguishing a single object from a group of objects, and model overfitting. First, we construct a multi-branch activity recognition framework that separates not only the different object categories but also the single-object and object-group recognition tasks, thereby improving recognition accuracy. Second, we develop several model ensembling strategies that combine multiple frame sampling sequences, different frame sampling sequence lengths, multiple training epochs, and different backbone networks. Third, we propose a data augmentation method for atomic activity recognition that greatly expands the sample space by flipping video frames and road topology, effectively mitigating model overfitting. Our method ranks first on the Track 3 test set of the ROAD++ Challenge 2024, achieving 69% mAP.
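As a rough illustration of the flipping idea, the sketch below mirrors a clip horizontally and remaps each activity label so that it still describes the mirrored scene. The zone names, the assumed layout (`Z2` on the left, `Z4` on the right), the label string format `"Z1-Z2:C"`, and the helpers `flip_clip` and `flip_label` are all hypothetical choices made for this example; they are not taken from the report, and the authors' actual augmentation may differ.

```python
import numpy as np

# Hypothetical zone layout (an assumption, not from the report):
# Z1 = ego approach, Z2 = left arm, Z3 = opposite arm, Z4 = right arm.
# A horizontal mirror swaps the left and right arms and keeps the others.
ZONE_FLIP = {"Z1": "Z1", "Z2": "Z4", "Z3": "Z3", "Z4": "Z2"}


def flip_clip(frames: np.ndarray) -> np.ndarray:
    """Mirror a clip of shape (T, H, W, C) along the width axis."""
    return frames[:, :, ::-1, :].copy()


def flip_label(label: str) -> str:
    """Remap a hypothetical label such as 'Z1-Z2:C' to its mirrored form."""
    path, actor = label.split(":")
    src, dst = path.split("-")
    return f"{ZONE_FLIP[src]}-{ZONE_FLIP[dst]}:{actor}"


def augment(frames: np.ndarray, labels: list[str]) -> tuple[np.ndarray, list[str]]:
    """Return the mirrored clip together with its remapped multi-label set."""
    return flip_clip(frames), sorted(flip_label(l) for l in labels)
```

The point of the sketch is that the pixels and the labels have to be transformed together: mirroring the frames alone would leave labels that describe movements in the wrong direction.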

Paper Structure

This paper contains 20 sections, 9 figures, 1 table.

Figures (9)

  • Figure 1: Illustration of multi-label atomic activity recognition [kung2023action].
  • Figure 2: Examples of small objects.
  • Figure 3: Examples of single objects and object groups.
  • Figure 4: The multi-branch atomic activity recognition framework.
  • Figure 5: Integration of prediction results from multiple frame sampling sequences.
  • ...and 4 more figures