Not Just Object, But State: Compositional Incremental Learning without Forgetting

Yanyi Zhang, Binglin Qiu, Qi Jia, Yu Liu, Ran He

TL;DR

This work introduces Compositional Incremental Learning (composition-IL), a setting where models continually acquire fine-grained state-object knowledge without forgetting. It proposes CompILer, a rehearsal-free learner that uses multi-pool prompts to model states, objects, and their compositions, augmented by object-injected state prompting and generalized-mean prompt fusion. By reorganizing Clothing16K IVR and UT-Zappos50K into Split-Clothing and Split-UT-Zappos, the authors show that CompILer achieves state-of-the-art Avg Acc and favorable HM while maintaining robustness to noisy labels through symmetric cross-entropy. The approach advances fine-grained compositional reasoning in open-ended incremental learning with practical implications for visual understanding of object attributes across time and domains.
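The symmetric cross-entropy mentioned above is commonly written as a weighted sum of the usual cross-entropy and a "reverse" cross-entropy in which the prediction and the one-hot label swap roles, with log 0 replaced by a finite constant. The following is a minimal sketch of that general formulation, not the authors' implementation; the function name and the `alpha`, `beta`, and `log_zero` values are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def symmetric_cross_entropy(
    probs: Sequence[float],
    target: int,
    alpha: float = 1.0,      # weight of the standard CE term (assumed value)
    beta: float = 1.0,       # weight of the reverse CE term (assumed value)
    log_zero: float = -4.0,  # finite stand-in for log(0) (assumed value)
    eps: float = 1e-7,
) -> float:
    """Loss for one sample given predicted class probabilities and a label index."""
    # Standard cross-entropy: -log of the probability assigned to the label.
    ce = -math.log(max(probs[target], eps))
    # Reverse cross-entropy: -sum_k p_k * log(q_k), where q is the one-hot label.
    # log(q_k) is 0 for the labelled class and log_zero for every other class.
    rce = -sum(
        p * (0.0 if k == target else log_zero) for k, p in enumerate(probs)
    )
    return alpha * ce + beta * rce
```

Because the reverse term is bounded by `-log_zero`, a single mislabeled sample cannot dominate the loss the way it can under plain cross-entropy, which is the usual argument for its robustness to label noise.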

Abstract

Most incremental learners prioritize coarse object classes while neglecting the states (e.g., color and material) attached to those objects. As a result, they are limited in their ability to reason about the fine-grained compositionality of state-object pairs. To remedy this limitation, we propose a novel task called Compositional Incremental Learning (composition-IL), in which the model must recognize state-object compositions as a whole in an incremental learning fashion. Because suitable benchmarks are lacking, we reorganize two existing datasets and tailor them for composition-IL. We then propose a prompt-based Composition Incremental Learner (CompILer) to overcome the ambiguous composition boundary problem, a major challenge in composition-IL. Specifically, we exploit multi-pool prompt learning, which is regularized by inter-pool prompt discrepancy and intra-pool prompt diversity. In addition, we devise object-injected state prompting, which uses object prompts to guide the selection of state prompts. Furthermore, we fuse the selected prompts with a generalized-mean strategy to suppress irrelevant information learned in the prompts. Extensive experiments on two datasets show that CompILer achieves state-of-the-art performance.
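To make the generalized-mean fusion step concrete, the sketch below fuses several selected prompt vectors element-wise with a power mean. It illustrates the general generalized-mean formula only; the function name, the exponent `p`, and the clamping of entries to a small positive `eps` (needed so that the fractional power is defined) are assumptions, not details taken from the paper.

```python
from typing import List


def gem_fuse(prompts: List[List[float]], p: float = 3.0, eps: float = 1e-6) -> List[float]:
    """Fuse N prompt vectors into one by an element-wise generalized mean."""
    if not prompts:
        raise ValueError("need at least one prompt vector")
    n = len(prompts)
    dim = len(prompts[0])
    fused = []
    for d in range(dim):
        # Clamp to eps so that x ** p and the p-th root stay well defined
        # (an assumption of this sketch).
        total = sum(max(vec[d], eps) ** p for vec in prompts)
        fused.append((total / n) ** (1.0 / p))
    return fused
```

With `p = 1` this reduces to the arithmetic mean, and as `p` grows the result approaches the element-wise maximum, so a larger exponent lets strongly activated prompts outweigh weakly activated ones.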

Paper Structure

This paper contains 20 sections, 10 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Differences between Class Incremental Learning (class-IL), Blurry Incremental Learning (blur-IL), and Compositional Incremental Learning (composition-IL). The object classes are not allowed to recur in the class-IL scenario, whereas they may recur randomly in the blur-IL scenario. Different from them, the classes in composition-IL involve state-object compositions apart from the object classes. Besides, the compositions do not reoccur, but the primitives (states or objects) may randomly reappear across incremental sessions.
  • Figure 2: Data statistics of Split-Clothing and Split-UT-Zappos for the composition-IL task. Split-Clothing is divided into a 5-task scenario, while Split-UT-Zappos includes both 5-task and 10-task scenarios. In all settings, the number of images per task is balanced.
  • Figure 3: t-SNE feature distributions of seven compositions from the Split-Clothing benchmark. For the compositions with the same object but with different states, our CompILer achieves more distinguishable boundaries than the L2P baseline.
  • Figure 4: Overall architecture of our composition incremental learner (CompILer), which comprises multi-pool prompt learning, object-injected state prompting, and generalized-mean prompt fusion. The multi-pool prompt learning mechanism captures information related to states, objects, and their compositions, each through a dedicated pool. The object-injected state prompting utilizes the object prompt to promote the state representation learning. Moreover, the generalized-mean prompt fusion is used to prioritize the useful prompts and diminish the irrelevant ones.
  • Figure 5: Architecture of object-injected state prompting. Query feature serves as Q, while fused object prompt serves as both K and V.
  • ...and 1 more figure
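The Figure 5 caption describes object-injected state prompting as an attention step in which the query feature acts as Q and the fused object prompt acts as both K and V. A minimal single-head sketch of that arrangement follows. It assumes the fused object prompt is a short sequence of token vectors with the same dimension as the query, and it omits any learned projections or normalization the actual model may use; the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def object_injected_query(query: Vector, object_prompt: List[Vector]) -> Vector:
    """Scaled dot-product attention with Q = query, K = V = object prompt tokens."""
    d = len(query)
    # One attention score per object-prompt token.
    scores = [
        sum(q * k for q, k in zip(query, token)) / math.sqrt(d)
        for token in object_prompt
    ]
    weights = softmax(scores)
    # Weighted sum of the same tokens, now acting as values.
    return [
        sum(w * token[i] for w, token in zip(weights, object_prompt))
        for i in range(d)
    ]
```

The output is an object-aware version of the query; in a prompt-based learner it could then be matched against the keys of the state pool to choose state prompts, although the exact selection rule is not specified in the text above.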