
Temporal Imbalance of Positive and Negative Supervision in Class-Incremental Learning

Jinge Ma, Fengqing Zhu

TL;DR

A temporal supervision model is established, temporal imbalance is formally defined, and Temporal-Adjusted Loss (TAL) is proposed; TAL uses a temporal decay kernel to construct a supervision strength vector and dynamically reweights the negative supervision in the cross-entropy loss.

Abstract

With the widespread adoption of deep learning in visual tasks, Class-Incremental Learning (CIL) has become an important paradigm for handling dynamically evolving data distributions. However, CIL faces the core challenge of catastrophic forgetting, often manifested as a prediction bias toward new classes. Existing methods mainly attribute this bias to intra-task class imbalance and focus on corrections at the classifier head. In this paper, we highlight an overlooked factor -- temporal imbalance -- as a key cause of this bias. Earlier classes receive stronger negative supervision toward the end of training, leading to asymmetric precision and recall. We establish a temporal supervision model, formally define temporal imbalance, and propose Temporal-Adjusted Loss (TAL), which uses a temporal decay kernel to construct a supervision strength vector and dynamically reweight the negative supervision in cross-entropy loss. Theoretical analysis shows that TAL degenerates to standard cross-entropy under balanced conditions and effectively mitigates prediction bias under imbalance. Extensive experiments demonstrate that TAL significantly reduces forgetting and improves performance on multiple CIL benchmarks, underscoring the importance of temporal modeling for stable long-term learning.
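The sketch below illustrates the general idea of reweighting negative supervision in cross-entropy. It is a hypothetical sketch, not the authors' TAL formula. The following are all assumptions not given in this summary: the exponential kernel f[m] = gamma**m, the names `update_strength`, `negative_weights` and `reweighted_ce`, and the rule "weight proportional to strength, normalized to mean 1". When every class has the same strength, all weights equal 1 and the loss reduces to standard cross-entropy, which mirrors the degeneracy property stated in the abstract.

```python
import math
from typing import List


def update_strength(q: List[float], batch_labels: List[int], gamma: float) -> List[float]:
    """Decay all strengths by gamma, then add this batch's positive counts.

    Unrolled over N batches: q_c = sum_k gamma**(N - k) * x_c[k], i.e. an
    exponential temporal decay kernel f[m] = gamma**m (assumed shape).
    """
    q = [gamma * v for v in q]
    for y in batch_labels:
        q[y] += 1.0
    return q


def negative_weights(q: List[float], eps: float = 1e-8) -> List[float]:
    """Hypothetical mapping: w_j roughly proportional to q_j, normalized to mean ~1.

    Equal strengths give w_j == 1 for every class (the balanced case).
    """
    mean_q = sum(q) / len(q)
    return [(v + eps) / (mean_q + eps) for v in q]


def logsumexp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def standard_ce(logits: List[float], target: int) -> float:
    """Plain softmax cross-entropy for one sample."""
    return logsumexp(logits) - logits[target]


def reweighted_ce(logits: List[float], target: int, w: List[float]) -> float:
    """-z_y + log(exp(z_y) + sum_{j != y} w_j * exp(z_j)).

    The positive term is untouched; each negative class j enters the
    partition sum with weight w_j, so a smaller w_j weakens the push-down
    on that class's logit.
    """
    terms = [z if j == target else z + math.log(w[j]) for j, z in enumerate(logits)]
    return logsumexp(terms) - logits[target]


# Toy two-task stream (hypothetical): classes 0, 1 in task 1; classes 2, 3 in task 2.
q = [0.0] * 4
q = update_strength(q, [0, 0, 1, 1], gamma=0.9)
q = update_strength(q, [2, 2, 3, 3], gamma=0.9)
w = negative_weights(q)  # old classes end up with w < 1, new classes with w > 1
print([round(v, 3) for v in w])
```

In this toy stream the older classes 0 and 1 have decayed strength, so they receive weights below 1 and are pushed down less when they appear as negatives. This matches the qualitative behavior the document attributes to TAL, although the exact mapping used in the paper may differ.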

Paper Structure (58 sections, 2 theorems, 50 equations, 8 figures, 6 tables)

Key Result

Theorem 1

Consider two classes $A$ and $B$ with the same total number of positive samples. Since $f[n]$ is monotonically decreasing, it follows that with strict inequality if $\exists\,n$ such that $S_A[n]>S_B[n]$ and $f[n]$ is strictly decreasing.
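The setting of Theorem 1 can be illustrated numerically. This summary does not define $Q$. The sketch therefore assumes that $Q$ is the decay-weighted count of positive samples, $Q = \sum_{n=1}^{N} f[N-n]\,(S[n]-S[n-1])$ with $S[0]=0$. It also assumes a strictly decreasing kernel $f[m] = 0.8^m$ and uses hypothetical toy counts. Summation by parts rewrites this as $Q = f[0]\,S[N] - \sum_{n=1}^{N-1} S[n]\,(f[N-n-1]-f[N-n])$. Every bracket is nonnegative when $f$ is decreasing, so under this assumption a class whose cumulative count $S[n]$ rises earlier ends up with a smaller $Q$.

```python
from typing import Callable, List


def cumulative(x: List[int]) -> List[int]:
    """S[n] = x[1] + ... + x[n]: positive samples seen up to batch n."""
    s, out = 0, []
    for v in x:
        s += v
        out.append(s)
    return out


def decayed_strength(x: List[int], f: Callable[[int], float]) -> float:
    """Assumed form Q = sum_n f[N - n] * x[n]: recent batches weigh more."""
    N = len(x)
    return sum(f(N - n) * x[n - 1] for n in range(1, N + 1))


def strength_by_parts(S: List[int], f: Callable[[int], float]) -> float:
    """Same Q via summation by parts:
    Q = f[0] * S[N] - sum_{n=1}^{N-1} S[n] * (f[N-n-1] - f[N-n])."""
    N = len(S)
    return f(0) * S[-1] - sum(S[n - 1] * (f(N - n - 1) - f(N - n)) for n in range(1, N))


def f(m: int) -> float:
    """Strictly decreasing temporal decay kernel (assumed)."""
    return 0.8 ** m


x_A = [5, 5, 0, 0, 0, 0]  # class A: positives only in the first two batches
x_B = [0, 0, 0, 0, 5, 5]  # class B: same total, seen in the last two batches

S_A, S_B = cumulative(x_A), cumulative(x_B)
Q_A, Q_B = decayed_strength(x_A, f), decayed_strength(x_B, f)
print(S_A, S_B, round(Q_A, 4), round(Q_B, 4))
```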

Figures (8)

  • Figure 1: Illustration of temporal imbalance, which differs from class imbalance. In Task 2, although old classes A and B have the same number of training samples, the earlier old class suffers from more severe forgetting than the later one.
  • Figure 2: (a) An example of Temporal Imbalance: in the CIL setting, how the cumulative number of positive samples $S[n]$ for two classes evolves as the number of training batches $n$ increases. Because the positive samples of class 11 appear earlier than those of class 45, $S_{11}[n] \ge S_{45}[n]$. (b)&(c) Effects of Temporal Imbalance: (b) at the end of training, the correlation between each class's precision/recall and its $Q$ value when ER is trained with the cross-entropy loss. For clearer visualization, the shifted value $\tilde{Q} = Q - Q_{\min}$ is used, where $Q_{\min}$ is the minimum $Q$ across all classes. (c) At the end of training, the precision–recall asymmetry induced by temporal imbalance is common across different CIL methods. All classes shown in the figure are old classes, so there is no class imbalance in the final task.
  • Figure 3: Per-class performance of different baselines at Task ID = 3 under the 10-task setting on ImageNet-100. Earlier classes exhibit lower average class accuracy (i.e., recall), similar to Fig. 2(c). TAL reduces the sensitivity of old classes to negative supervision while increasing that of new classes. As a result, TAL improves the recall of most old classes at the cost of slightly lowering the recall of new classes.
  • Figure 4: Forgetting curves with representative CIL baselines and datasets.
  • Figure 5: The impact of applying TAL on the feature space of iCaRL. The distribution of test images from several new and old classes is visualized in the ResNet18 feature space on ImageNet-100 at Task ID = 3.
  • ...and 3 more figures
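The per-class quantities plotted in Figures 2 and 3 use standard definitions of precision and recall, together with the shifted value $\tilde{Q} = Q - Q_{\min}$. The sketch below computes them. The label lists and $Q$ values are hypothetical toy inputs, not results from the paper. In the toy data, class 0 shows the high-precision, low-recall pattern described for earlier classes.

```python
from typing import Dict, List, Tuple


def per_class_precision_recall(
    y_true: List[int], y_pred: List[int], classes: List[int]
) -> Dict[int, Tuple[float, float]]:
    """Return {class: (precision, recall)}; 0.0 when a denominator is empty."""
    out = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        predicted_c = sum(1 for p in y_pred if p == c)
        actual_c = sum(1 for t in y_true if t == c)
        precision = tp / predicted_c if predicted_c else 0.0
        recall = tp / actual_c if actual_c else 0.0
        out[c] = (precision, recall)
    return out


def shifted_q(q: Dict[int, float]) -> Dict[int, float]:
    """Q~ = Q - Q_min, so the smallest value across classes becomes 0."""
    q_min = min(q.values())
    return {c: v - q_min for c, v in q.items()}


# Hypothetical toy predictions: class 0 (earlier) is often predicted as class 1.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 1, 1]
pr = per_class_precision_recall(y_true, y_pred, classes=[0, 1])
q_tilde = shifted_q({0: 2.5, 1: 4.0})
print(pr, q_tilde)
```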

Theorems & Definitions (2)

  • Theorem 1: Temporal imbalance under equal sample counts
  • Proposition 1: Invariance of $[0,Q_{\max})$ and $[0,1)$