Continual Gesture Learning without Data via Synthetic Feature Sampling

Zhenyu Lu, Hao Tang

TL;DR

This work tackles data-free continual learning for skeleton-based gesture recognition, showing that skeleton encoders trained on base classes generalize well to unseen classes. It introduces Synthetic Feature Replay, which samples synthetic features from per-class Gaussian prototypes in the embedding space to replay old classes and augment new ones, avoiding the need to synthesize input data. The approach improves mean accuracy by up to 15% over the state of the art on skeleton gesture benchmarks and reduces the Instantaneous Forgetting Measure (IFM), while offering computational efficiency and privacy benefits. The results support the practicality of embedding-space replay for data-free continual learning in gesture-based interfaces, particularly on edge devices in AR/VR contexts.

Abstract

Data-Free Class Incremental Learning (DFCIL) aims to enable models to continuously learn new classes while retaining knowledge of old classes, even when the training data for old classes is unavailable. Although DFCIL has been explored primarily on image datasets, this study investigates it for skeleton-based gesture classification because of its significant real-world implications, particularly given the growing prevalence of VR/AR headsets, where gestures serve as the primary means of control and interaction. In this work, we made an intriguing observation: skeleton models trained on base classes (even very few) generalize strongly to unseen classes without requiring additional training. Building on this insight, we developed Synthetic Feature Replay (SFR), which samples synthetic features from class prototypes to replay old classes and to augment new classes (under a few-shot setting). Our proposed method advances significantly over the state of the art, improving mean accuracy across all steps by up to 15% and largely mitigating the accuracy imbalance between base classes and new classes.
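The idea of sampling synthetic features from class prototypes can be illustrated with a minimal NumPy sketch. This is not the paper's algorithm: it assumes each prototype is a full-covariance Gaussian (mean and covariance of a class's embeddings), and the function names `compute_prototypes` and `sample_synthetic_features`, the small ridge term, and the random stand-in embeddings are all hypothetical choices made for illustration.

```python
import numpy as np

def compute_prototypes(features, labels, ridge=1e-6):
    """Return {class_id: (mean, covariance)} estimated from embeddings.

    Assumption: a prototype is a Gaussian fitted to one class's features.
    """
    prototypes = {}
    dim = features.shape[1]
    for c in np.unique(labels):
        x = features[labels == c]
        mean = x.mean(axis=0)
        # rowvar=False: rows are samples, columns are feature dimensions.
        # The ridge keeps the covariance numerically positive definite.
        cov = np.cov(x, rowvar=False) + ridge * np.eye(dim)
        prototypes[int(c)] = (mean, cov)
    return prototypes

def sample_synthetic_features(prototypes, n_per_class, rng):
    """Draw n_per_class synthetic features from every stored prototype."""
    feats, labs = [], []
    for c, (mean, cov) in prototypes.items():
        feats.append(rng.multivariate_normal(mean, cov, size=n_per_class))
        labs.append(np.full(n_per_class, c))
    return np.concatenate(feats), np.concatenate(labs)

# Stand-in for encoder outputs: two well-separated classes in 8 dimensions.
rng = np.random.default_rng(0)
emb = np.concatenate([rng.normal(0.0, 1.0, (50, 8)),
                      rng.normal(5.0, 1.0, (50, 8))])
lab = np.repeat([0, 1], 50)

protos = compute_prototypes(emb, lab)          # stored instead of raw data
syn_x, syn_y = sample_synthetic_features(protos, 20, rng)
```

Only the per-class means and covariances need to be kept between sessions in this sketch, which is what makes replay possible without storing or regenerating the original skeleton sequences.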

Paper Structure (24 sections, 3 equations, 7 figures, 4 tables, 2 algorithms)

Figures (7)

  • Figure 1: Overall pipeline. (1) Compute and save class prototypes from the embedding space. (2) Sample synthetic old-class features from the saved class prototypes using the Replay algorithm. (3) In the few-shot setting only, sample synthetic new-class features using the Augmentation algorithm to augment the new-class data. (4) Combine new- and old-class features to train the classifier.
  • Figure 2: An overall comparison of DFCIL on image datasets (left) and skeleton datasets (right), showing that performance on skeleton datasets is generally superior to that on image datasets. (We use our proposed approach to generate the experimental results for skeleton datasets and TEEN [prototype_calibration] for image datasets, as it is the SOTA method for image datasets.) Session Accuracy Delta (SAD) measures the gap between the accuracy of the base session and the accuracy of the final session, after the model has processed all data. IFM (last session) is the Instantaneous Forgetting Measure of the last session. For both metrics, lower is better.
  • Figure 3: A toy experiment on Shrec-2017 in which three new hand gesture classes were added to a model pre-trained on eight base classes. A similar experiment was conducted on an image dataset for comparison; the details can be found in the appendix.
  • Figure 4: The impact of replay buffer size on performance when evaluating on Shrec-2017. Detailed results can be found in the appendix.
  • Figure 5: SOTA comparison of global accuracy and IFM for all incremental sessions.
  • ...and 2 more figures
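
The Session Accuracy Delta described in the Figure 2 caption can be written as a one-line computation. The sketch below assumes the gap is taken as base-session accuracy minus final-session accuracy (so lower is better); the function name and the accuracy values are made up for illustration and are not results from the paper.

```python
def session_accuracy_delta(session_accuracies):
    """Gap between base-session and final-session accuracy.

    Assumption: SAD = accuracy(base session) - accuracy(final session).
    """
    if not session_accuracies:
        raise ValueError("need at least one session accuracy")
    return session_accuracies[0] - session_accuracies[-1]

# Illustrative accuracies (percent) for a base session and two later sessions.
sad = session_accuracy_delta([90.0, 85.0, 80.0])
```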