Resource-Efficient RGB-Only Action Recognition for Edge Deployment

Dongsik Yoon; Jongeun Kim; Dayeon Lee

TL;DR

This work proposes a compact RGB-only network for efficient on-device inference. It builds on an X3D-style backbone augmented with Temporal Shift and adds selective temporal adaptation and parameter-free attention.

Abstract

Action recognition on edge devices poses stringent constraints on latency, memory, storage, and power consumption. While auxiliary modalities such as skeleton and depth information can enhance recognition performance, they often require additional sensors or computationally expensive pose-estimation pipelines, limiting practicality for edge use. In this work, we propose a compact RGB-only network tailored for efficient on-device inference. Our approach builds upon an X3D-style backbone augmented with Temporal Shift, and further introduces selective temporal adaptation and parameter-free attention. Extensive experiments on the NTU RGB+D 60 and 120 benchmarks demonstrate a strong accuracy-efficiency balance. Moreover, deployment-level profiling on the Jetson Orin Nano verifies a smaller on-device footprint and practical resource utilization compared to existing RGB-based action recognition techniques.
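As an illustration of the Temporal Shift operation the abstract refers to, the sketch below shows the generic idea in NumPy: a fraction of the channels is moved one frame forward or backward in time, so neighbouring frames exchange information without any extra parameters. This is not the authors' implementation. The function name `temporal_shift`, the `fold_div` parameter, and the `(N, C, T, H, W)` layout are assumptions chosen for the example.

```python
import numpy as np


def temporal_shift(x, fold_div=8):
    """Generic temporal shift over a video feature tensor.

    x: array of shape (N, C, T, H, W) (layout assumed for this sketch).
    fold_div: hypothetical hyperparameter; C // fold_div channels are
        shifted in each temporal direction, the rest stay in place.
    """
    n, c, t, h, w = x.shape
    fold = c // fold_div
    out = np.zeros_like(x)
    # First group: each frame receives features from the next frame.
    out[:, :fold, :-1] = x[:, :fold, 1:]
    # Second group: each frame receives features from the previous frame.
    out[:, fold:2 * fold, 1:] = x[:, fold:2 * fold, :-1]
    # Remaining channels are copied unchanged.
    out[:, 2 * fold:] = x[:, 2 * fold:]
    return out


# Example: 2 clips, 8 channels, 4 frames, 1x1 spatial size.
features = np.arange(2 * 8 * 4, dtype=float).reshape(2, 8, 4, 1, 1)
shifted = temporal_shift(features, fold_div=4)
```

Frames at the clip boundary have no neighbour to borrow from, so the shifted channels are zero-padded there. The shift itself adds no multiply-accumulate operations, which is why it is attractive under edge constraints.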

Paper Structure (16 sections, 1 equation, 2 figures, 3 tables)

Introduction
Related work
RGB-only Action Recognition
Lightweight Action Recognition
Efficient Architectural Primitives
Method
Overall Architecture
Efficient Building Blocks
Selective Temporal Adaptation and Training
Experiments
Implementation Details
Comparisons to state-of-the-art
Ablation Study
Edge Deployability Analysis
Limitations
...and 1 more sections

Figures (2)

Figure 1: Overview of the proposed RGB-only network (X3D-UGT) with UIB in Stages 1-2 and T-UIB in Stages 3-4.
Figure 2: Stage 4 T-UIB block: UIB with selective temporal adaptation (TAda) and lightweight pointwise design.
