Gated Adaptation for Continual Learning in Human Activity Recognition

Reza Rahimi Azghan, Gautham Krishna Gudur, Mohit Malu, Edison Thomaz, Giulia Pedrielli, Pavan Turaga, Hassan Ghasemzadeh

TL;DR

This work proposes a parameter-efficient continual learning framework based on channel-wise gated modulation of frozen pretrained representations that matches or exceeds standard continual learning baselines without replay buffers or task-specific regularization, confirming that structured diagonal operators are effective and efficient under distribution shift.

Abstract

Wearable sensors in Internet of Things (IoT) ecosystems increasingly support applications such as remote health monitoring, elderly care, and smart home automation, all of which rely on robust human activity recognition (HAR). Continual learning systems must balance plasticity (learning new tasks) with stability (retaining prior knowledge), yet AI models often exhibit catastrophic forgetting, where learning new tasks degrades performance on earlier ones. This challenge is especially acute in domain-incremental HAR, where on-device models must adapt to new subjects with distinct movement patterns while maintaining accuracy on prior subjects without transmitting sensitive data to the cloud. We propose a parameter-efficient continual learning framework based on channel-wise gated modulation of frozen pretrained representations. Our key insight is that adaptation should operate through feature selection rather than feature generation: by restricting learned transformations to diagonal scaling of existing features, we preserve the geometry of pretrained representations while enabling subject-specific modulation. We provide a theoretical analysis showing that gating implements a bounded diagonal operator that limits representational drift compared to unconstrained linear transformations. Empirically, freezing the backbone substantially reduces forgetting, and lightweight gates restore lost adaptation capacity, achieving stability and plasticity simultaneously. On PAMAP2 with 8 sequential subjects, our approach reduces forgetting from 39.7% to 16.2% and improves final accuracy from 56.7% to 77.7%, while training less than 2% of parameters. Our method matches or exceeds standard continual learning baselines without replay buffers or task-specific regularization, confirming that structured diagonal operators are effective and efficient under distribution shift.

Paper Structure

This paper contains 48 sections, 5 theorems, 34 equations, 5 figures, 6 tables, and 1 algorithm.

Key Result

Theorem 1

Let $U(\mathbf{x}) \in \mathbb{R}^{C \times d}$ denote the ungated feature map produced by a frozen backbone for an input $\mathbf{x}$. Let $g(\mathbf{x}), g'(\mathbf{x}) \in (0,1)^C$ be the channel-wise gate vectors before and after learning a new task, and define the corresponding gated representations [...], where $D(\cdot)$ denotes a diagonal matrix. Then the feature drift satisfies [...], where $\delta(\mathbf{x})$ ...
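
The theorem statement above is truncated, so the exact bound is not shown. As a hedged sketch of why diagonal gating limits drift, assume the gated representations take the form $F(\mathbf{x}) = D(g(\mathbf{x}))\,U(\mathbf{x})$ and $F'(\mathbf{x}) = D(g'(\mathbf{x}))\,U(\mathbf{x})$. The symbols $F$, $F'$, the row notation $U_c(\mathbf{x})$, and the choice of norms are assumptions made for illustration and may differ from the paper's. Under those assumptions, a standard argument gives:

```latex
\begin{aligned}
F'(\mathbf{x}) - F(\mathbf{x})
  &= D\big(g'(\mathbf{x}) - g(\mathbf{x})\big)\, U(\mathbf{x}), \\
\|F'(\mathbf{x}) - F(\mathbf{x})\|_F^2
  &= \sum_{c=1}^{C} \big(g'_c(\mathbf{x}) - g_c(\mathbf{x})\big)^2 \,\|U_c(\mathbf{x})\|_2^2 \\
  &\le \|g'(\mathbf{x}) - g(\mathbf{x})\|_\infty^2 \,\|U(\mathbf{x})\|_F^2 .
\end{aligned}
```

Because both gate vectors lie in $(0,1)^C$, every entry of $g'(\mathbf{x}) - g(\mathbf{x})$ has magnitude below 1, so under this assumed form the drift stays strictly below $\|U(\mathbf{x})\|_F$ regardless of how the gates are trained.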

Figures (5)

  • Figure 1: Catastrophic forgetting in subject-incremental HAR: accuracy on Subject 1 drops from 85% to 40% after training on just three additional subjects (PAMAP2 dataset).
  • Figure 2: Overview of the proposed continual learning system for HAR, comprising data collection, preprocessing, and a classification model with a frozen pretrained backbone modulated by lightweight trainable gates.
  • Figure 3: Channel-wise gating mechanism. Given an intermediate feature map $U_\ell$, global average pooling (squeeze) produces a channel descriptor $z_\ell$, which is transformed through bottleneck layers (excitation) to produce gate values $g_\ell \in (0,1)^{c_\ell}$. The gates then scale each channel independently, preserving feature directions while modulating magnitudes.
  • Figure 4: Cross-subject feature channel correlation matrix averaged over all subject pairs in PAMAP2. Entry $(c, c')$ shows the Pearson correlation between channel $c$ of subject $i$ and channel $c'$ of subject $j$, computed using activity-class centroids as paired observations. The strong diagonal (mean $\rho = 0.78$) indicates that channels preserve their identity across subjects, while the weak off-diagonal (mean $\rho \approx 0$) suggests minimal cross-channel mixing. This supports Assumption \ref{ass:diagonal}: cross-subject variation is predominantly channel-wise.
  • Figure 5: Per-subject accuracy evolution with our gating approach on PAMAP2. Unlike the baseline in Figure 1, which dropped to 40% on Subject 1 after four subjects, our method retains 69.5% accuracy while achieving 93.8% learning accuracy on new subjects.
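
The squeeze, excitation, and scaling steps described in the Figure 3 caption can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function name `channel_gate`, the bottleneck width `r`, the ReLU in the bottleneck, and the random weights are all assumptions. Only the overall shape of the computation (average pooling, bottleneck layers, gates in $(0,1)$, per-channel scaling) comes from the caption.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def channel_gate(U, W1, b1, W2, b2):
    """Apply channel-wise gating to a feature map U of shape (C, d).

    Returns the gated feature map (C, d) and the gate vector (C,).
    """
    z = U.mean(axis=1)                   # squeeze: one descriptor per channel
    h = np.maximum(0.0, W1 @ z + b1)     # bottleneck layer (ReLU is an assumption)
    g = sigmoid(W2 @ h + b2)             # excitation: gate values in (0, 1)
    return g[:, None] * U, g             # scale each channel independently

# Hypothetical sizes and randomly initialised gate weights, for illustration only.
rng = np.random.default_rng(0)
C, d, r = 8, 16, 2
U = rng.normal(size=(C, d))              # stands in for a frozen backbone output
W1, b1 = 0.1 * rng.normal(size=(r, C)), np.zeros(r)
W2, b2 = 0.1 * rng.normal(size=(C, r)), np.zeros(C)

gated, g = channel_gate(U, W1, b1, W2, b2)
```

Each row of `gated` is a positive multiple of the matching row of `U`, which is the sense in which gating modulates magnitudes while preserving feature directions. In this sketch only `W1`, `b1`, `W2`, and `b2` would be trained; `U` comes from the frozen backbone.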

Theorems & Definitions (12)

  • Theorem 1: Bounded Feature Drift under Diagonal Gating
  • proof
  • Remark 1: Contrast with Unconstrained Adaptation
  • Theorem 2: Bounded Logit Drift
  • proof
  • Theorem 3: Sufficient Condition for Prediction Stability
  • proof
  • Corollary 1: Margin-Based Forgetting Guarantee
  • Remark 2: Empirical Support for Assumption \ref{ass:diagonal}
  • Theorem 4: Expressiveness of Diagonal Gating
  • ...and 2 more
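
The contrast between diagonal gating (Theorem 1) and unconstrained adaptation (Remark 1) can be checked numerically. The sketch below assumes gated features of the form $D(g)\,U$ and compares their drift against the elementary bound $\|g' - g\|_\infty \|U\|_F$. That bound is an illustrative assumption and may not match the exact statement in the paper. The unconstrained case uses a hypothetical full-matrix update `delta_W`, whose drift grows without limit as its scale increases.

```python
import numpy as np

rng = np.random.default_rng(0)
C, d = 8, 16
U = rng.normal(size=(C, d))              # stands in for frozen backbone features

def fro(A):
    """Frobenius norm of a 2-D array."""
    return float(np.linalg.norm(A))

# Diagonal gating: gates before and after a new task, both inside (0, 1).
g_old = rng.uniform(0.01, 0.99, size=C)
g_new = rng.uniform(0.01, 0.99, size=C)
drift_gate = fro(g_new[:, None] * U - g_old[:, None] * U)
bound_gate = float(np.max(np.abs(g_new - g_old))) * fro(U)

# Unconstrained adaptation: a full C x C update with no cap on its size.
delta_W = rng.normal(size=(C, C))        # hypothetical weight change

def linear_drift(scale):
    """Drift caused by replacing W with W + scale * delta_W."""
    return fro((scale * delta_W) @ U)

print(f"gated drift {drift_gate:.3f} <= bound {bound_gate:.3f} < ||U||_F {fro(U):.3f}")
print(f"unconstrained drift at scale 1: {linear_drift(1.0):.3f}, at scale 10: {linear_drift(10.0):.3f}")
```

The gated drift is capped by the norm of the frozen features, whereas the unconstrained drift scales linearly with the size of the weight update.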