Hierarchical Semantic Tree Anchoring for CLIP-Based Class-Incremental Learning
Tao Hu, Lan Li, Zhen-Hao Xie, Da-Wei Zhou
TL;DR
HASTEN tackles catastrophic forgetting in CLIP-based class-incremental learning by injecting explicit hierarchical structure into a hyperbolic feature space. It constructs a GPT-5–driven semantic tree, learns per-task hierarchy-aware projections, and maps features into a shared hyperbolic space with a global mapper, while protecting past mappings via null-space gradient projection. Hierarchy-aware entailment constraints and a hyperbolic contrastive objective stabilize cross-modal alignment, and virtual-class anchoring preserves past structure without exemplars. Empirical results across nine benchmarks show strong, consistent improvements over prior methods, with robustness to seeds, backbones, and different LLMs for tree generation. The approach offers a principled way to fuse hierarchical semantics with continual learning in vision-language models, enabling scalable, structure-preserving incremental updates.
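As a minimal sketch of what mapping features into a shared hyperbolic space involves, the Python below assumes the Poincaré ball model with curvature −c. It uses the exponential map at the origin to lift a Euclidean feature onto the ball and then measures geodesic distance with Möbius addition. The choice of model, the helper names (`expmap0`, `mobius_add`, `poincare_dist`), and the example vectors are illustrative assumptions rather than HASTEN's implementation; the paper's global mapper may rely on a different hyperbolic model.

```python
import math

def expmap0(v, c=1.0):
    """Exponential map at the origin: lift a Euclidean tangent vector v
    onto the Poincare ball of curvature -c (result has norm < 1/sqrt(c))."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]

def mobius_add(x, y, c=1.0):
    """Mobius addition x (+)_c y on the Poincare ball."""
    xy = sum(a * b for a, b in zip(x, y))
    x2 = sum(a * a for a in x)
    y2 = sum(b * b for b in y)
    denom = 1.0 + 2.0 * c * xy + c * c * x2 * y2
    return [((1.0 + 2.0 * c * xy + c * y2) * a + (1.0 - c * x2) * b) / denom
            for a, b in zip(x, y)]

def poincare_dist(x, y, c=1.0):
    """Geodesic distance d_c(x, y) = (2 / sqrt(c)) * artanh(sqrt(c) * ||-x (+)_c y||)."""
    diff = mobius_add([-a for a in x], y, c)
    n = math.sqrt(sum(a * a for a in diff))
    sqrt_c = math.sqrt(c)
    # Clamp just below 1 so points numerically on the boundary stay finite.
    return 2.0 / sqrt_c * math.atanh(min(sqrt_c * n, 1.0 - 1e-15))

# Hypothetical embeddings: a general concept near the origin,
# two fine-grained classes pushed toward the boundary.
dog = expmap0([0.1, 0.0])
lab = expmap0([2.0, 0.5])
gold = expmap0([2.0, -0.5])
print(poincare_dist(dog, lab), poincare_dist(lab, gold))
```

Because distances grow rapidly toward the boundary of the ball, general concepts such as "dog" can sit near the origin while fine-grained classes such as "Labrador" occupy the outer region, which is why hyperbolic space suits tree-like hierarchies.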
Abstract
Class-Incremental Learning (CIL) enables models to learn new classes continually while preserving past knowledge. Vision-language models such as CLIP provide transferable features through multi-modal pre-training, which makes them well suited for CIL. However, real-world visual and linguistic concepts are inherently hierarchical: a textual concept like "dog" subsumes fine-grained categories such as "Labrador" and "Golden Retriever," and each category entails its images. Existing CLIP-based CIL methods do not explicitly capture this hierarchy, so fine-grained class features drift during incremental updates, ultimately causing catastrophic forgetting. To address this challenge, we propose HASTEN (Hierarchical Semantic Tree Anchoring), which anchors hierarchical information into CIL to reduce catastrophic forgetting. First, we use an external knowledge graph as supervision to embed visual and textual features in hyperbolic space, preserving the hierarchical structure as data evolves. Second, to mitigate catastrophic forgetting, we project gradients onto the null space of the shared hyperbolic mapper, preventing interference with prior tasks. Together, these two components let the model resist forgetting by maintaining hierarchical relationships. Extensive experiments show that HASTEN consistently outperforms existing methods while providing a unified structured representation.
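The null-space step can be sketched with a common continual-learning recipe: collect the inputs the shared mapper received on earlier tasks, take the eigenvectors of their uncentered covariance whose eigenvalues are near zero, and project each new gradient onto that subspace so the update leaves outputs on old inputs (approximately) unchanged. This is a minimal sketch assuming the mapper is a single linear layer y = W x; the `rel_tol` threshold and the helper names `null_space_basis` and `project_gradient` are hypothetical, and HASTEN's exact construction may differ.

```python
import numpy as np

def null_space_basis(old_inputs, rel_tol=1e-3):
    """Orthonormal basis (d_in, k) of the approximate null space of the
    inputs a linear mapper y = W x received on previous tasks.

    old_inputs: array of shape (n_samples, d_in).
    rel_tol: hypothetical threshold; eigenvalues <= rel_tol * max count as zero.
    """
    cov = old_inputs.T @ old_inputs / old_inputs.shape[0]  # uncentered covariance
    eigvals, eigvecs = np.linalg.eigh(cov)                 # ascending eigenvalues
    keep = eigvals <= rel_tol * eigvals.max()
    return eigvecs[:, keep]

def project_gradient(grad, basis):
    """Project the gradient of W (shape d_out x d_in) onto the null space,
    so that the projected grad @ x is ~0 for every old input x."""
    return grad @ basis @ basis.T

# Synthetic example: old-task inputs occupy only the first two of four dimensions.
rng = np.random.default_rng(0)
old_inputs = rng.normal(size=(100, 4))
old_inputs[:, 2:] = 0.0
grad = rng.normal(size=(3, 4))

basis = null_space_basis(old_inputs)
safe_grad = project_gradient(grad, basis)
# Largest change a step along safe_grad would cause on any old output.
print(np.abs(safe_grad @ old_inputs.T).max())
```

The projector `basis @ basis.T` is idempotent, so projecting an already-projected gradient leaves it unchanged; new tasks can still update W along directions that old inputs never excited.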
