Low-Resolution Action Recognition for Tiny Actions Challenge

Boyu Chen, Yu Qiao, Yali Wang

TL;DR

The paper addresses low-resolution action recognition in real-world surveillance with a long-tailed class distribution. It proposes a three-component solution: data-balanced video backbones, a dual-resolution distillation framework leveraging super-resolution, and model ensemble with post-processing. The approach yields strong improvements in F1 and achieves Top-1 on the Tiny Actions Challenge leaderboard. It demonstrates that combining data balance, cross-resolution knowledge transfer, and ensemble/post-processing can significantly boost recognition in challenging low-resolution, long-tailed settings.

Abstract

The Tiny Actions Challenge focuses on understanding human activities in real-world surveillance. Activity recognition in this scenario faces two main difficulties. First, human activities are often recorded at a distance and appear at low resolution, with few discriminative cues. Second, these activities naturally follow a long-tailed distribution, and such heavy category imbalance makes data bias hard to alleviate. To tackle these problems, we propose a comprehensive recognition solution. First, we train video backbones with data balance to alleviate overfitting on the challenge benchmark. Second, we design a dual-resolution distillation framework, which effectively guides low-resolution action recognition with knowledge from super-resolved videos. Finally, we apply model ensemble with post-processing, which further boosts performance on the long-tailed categories. Our solution ranks Top-1 on the leaderboard.
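The abstract does not specify how data balance is enforced or what the post-processing step does. As a hedged illustration only, the sketch below assumes two common techniques: inverse-frequency sample weights for class-balanced sampling, and score averaging across models followed by per-class thresholds (a hypothetical stand-in for post-processing, where tail classes can be given lower thresholds). It also assumes per-class (multi-label style) probabilities. The function names `balanced_sample_weights` and `ensemble_predict` and all threshold values are hypothetical, not the authors' code.

```python
from collections import Counter
from typing import Dict, List, Sequence


def balanced_sample_weights(labels: Sequence[int]) -> List[float]:
    """Per-sample weights inversely proportional to class frequency.

    Sampling with these weights (e.g. via random.choices) makes every class
    equally likely in expectation. This is an assumed realization of
    "training with data balance", not necessarily the authors' scheme.
    """
    counts = Counter(labels)
    num_classes = len(counts)
    # Each class receives total mass 1/num_classes, split evenly among its samples.
    return [1.0 / (num_classes * counts[y]) for y in labels]


def ensemble_predict(
    model_scores: List[List[List[float]]],
    thresholds: Dict[int, float],
    default_threshold: float = 0.5,
) -> List[List[int]]:
    """Average per-class scores over models, then threshold each class.

    model_scores[m][i][c] is model m's probability for class c on clip i.
    Per-class thresholds are a hypothetical stand-in for post-processing:
    lowering a tail class's threshold trades precision for recall.
    """
    num_models = len(model_scores)
    num_clips = len(model_scores[0])
    num_classes = len(model_scores[0][0])
    predictions = []
    for i in range(num_clips):
        # Unweighted mean of the models' scores for each class on clip i.
        avg = [
            sum(model_scores[m][i][c] for m in range(num_models)) / num_models
            for c in range(num_classes)
        ]
        # Keep every class whose averaged score clears its own threshold.
        predicted = [
            c for c in range(num_classes)
            if avg[c] >= thresholds.get(c, default_threshold)
        ]
        predictions.append(predicted)
    return predictions
```

For labels `[0, 0, 0, 1]`, the three head-class samples each get weight 1/6 and the single tail sample gets 1/2, so both classes carry equal total mass. Per-class thresholds are used here because a single global threshold tends to suppress rare classes whose averaged scores stay low.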

Paper Structure (7 sections, 1 equation, 1 figure, 3 tables)

Figures (1)

  • Figure 1: Super-Resolution Videos. We use RealBasicVSR [chan2021investigating] to transform the original low-resolution videos into their corresponding super-resolved versions. As expected, the super-resolved videos tend to emphasize action details and reduce sensor noise.
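Figure 1 motivates the dual-resolution distillation framework: a teacher that sees super-resolved clips guides a student that sees the original low-resolution clips. The exact distillation objective is not given in this summary, so the sketch below assumes a standard temperature-scaled knowledge-distillation loss in the style of Hinton et al., combined with cross-entropy on the ground-truth label. The function names, `alpha`, and `temperature` are hypothetical illustrations, not the authors' settings, and a softmax (single-label) formulation is assumed for simplicity.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    temperature: float = 4.0,
) -> float:
    """KL(teacher || student) on temperature-softened distributions.

    Assumed setup: the student receives the low-resolution clip and the
    teacher receives the super-resolved clip. Scaling by T^2 keeps gradient
    magnitudes comparable across temperatures (Hinton et al. convention).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(max(ps, 1e-12)))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    return kl * temperature ** 2


def combined_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    label: int,
    alpha: float = 0.5,
    temperature: float = 4.0,
) -> float:
    """(1 - alpha) * cross-entropy + alpha * distillation (hypothetical weighting)."""
    ce = -math.log(max(softmax(student_logits)[label], 1e-12))
    kd = distillation_loss(student_logits, teacher_logits, temperature)
    return (1.0 - alpha) * ce + alpha * kd
```

When the student already matches the teacher, the distillation term is zero and only the label loss remains. Under this assumed design, `alpha` controls how strongly the super-resolution teacher steers the low-resolution student.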