Decoupled Prompt-Adapter Tuning for Continual Activity Recognition

Di Fu; Thanh Vinh Vo; Haozhe Ma; Tze-Yun Leong

TL;DR

DPAT addresses continual action recognition by decoupling prompt and adapter tuning within a frozen Vision Transformer backbone. It combines temporal and spatial adapters with learnable prompts in a two-stage training regime: the first stage establishes generalization via prefix tuning, and the second specializes with adapters while preserving the prompts. A redesigned, softmax-normalized query-key matching loss improves task-specific key selection, raising accuracy and reducing forgetting without replayed data. Experiments on Kinetics-400, ActivityNet, and EPIC-Kitchens-100 demonstrate state-of-the-art performance and reduced forgetting, highlighting the approach's practical value for memory-efficient continual video understanding in real-world settings.
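To make the idea of a softmax-normalized query-key matching loss concrete, here is a minimal sketch in plain Python. It is not the paper's formulation, which this summary does not state. It assumes one learnable key per task, cosine similarity between a query feature and each key, and a cross-entropy over the softmax of those similarities. The names `matching_loss`, `select_key`, and the `temperature` parameter are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matching_loss(query, keys, task_id, temperature=1.0):
    """Assumed form: negative log of the softmax probability that the
    query matches the key of its own task (cross-entropy over keys)."""
    sims = [cosine(query, k) / temperature for k in keys]
    probs = softmax(sims)
    return -math.log(probs[task_id])

def select_key(query, keys):
    """At inference, pick the index of the most similar key."""
    sims = [cosine(query, k) for k in keys]
    return max(range(len(keys)), key=lambda i: sims[i])

# Toy example: two task keys and a query that lies close to the first key.
keys = [[1.0, 0.0], [0.0, 1.0]]
query = [0.9, 0.1]
loss_correct = matching_loss(query, keys, task_id=0)
loss_wrong = matching_loss(query, keys, task_id=1)
chosen = select_key(query, keys)
```

In this toy setup, the loss is lower when the query is paired with the nearer key. Because the softmax normalizes over all keys, lowering the loss for one task's key also pushes the query away from the other keys, which is one plausible reading of how normalization could sharpen task-specific key selection.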

Abstract

Action recognition technology plays a vital role in enhancing security through surveillance systems, enabling better patient monitoring in healthcare, providing in-depth performance analysis in sports, and facilitating seamless human-AI collaboration in domains such as manufacturing and assistive technologies. The dynamic nature of data in these areas underscores the need for models that can continuously adapt to new video data without losing previously acquired knowledge, highlighting the critical role of advanced continual action recognition. To address these challenges, we propose Decoupled Prompt-Adapter Tuning (DPAT), a novel framework that integrates adapters for capturing spatial-temporal information and learnable prompts for mitigating catastrophic forgetting through a decoupled training strategy. DPAT uniquely balances the generalization benefits of prompt tuning with the plasticity provided by adapters in pretrained vision models, effectively addressing the challenge of maintaining model performance amidst continuous data evolution without necessitating extensive finetuning. DPAT consistently achieves state-of-the-art performance across several challenging action recognition benchmarks, thus demonstrating the effectiveness of our model in the domain of continual action recognition.

Paper Structure (23 sections, 5 equations, 2 figures, 8 tables, 2 algorithms)

Introduction
Related Work
Preliminary
Continual Action Recognition
Continual Learning with Prefix-Tuning
Method
Position of Adapter and Prompt
Decoupled Prompt-Adapter Tuning
Redesigned Query-Key Matching loss
Training Objective
Experiments
Experiments Settings
Comparison with baseline
Ablation Studies
Conclusion
...and 8 more sections

Figures (2)

Figure 1: Overview of the proposed Decoupled Prompt-Adapter Tuning (DPAT) approach: (a) Model architecture integrating adapters and prefix prompts to facilitate adaptation to new tasks; (b) Decoupled training paradigm designed to bolster knowledge preservation through phase-separated optimization of the model components
Figure 2: Comparative Result of DPAT with Joint and Decoupled Training Strategies on Kinetics-400
