FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
Thanh-Dat Truong, Utsav Prabhu, Bhiksha Raj, Jackson Cothren, Khoa Luu
TL;DR
FALCON tackles the fairness and unknown-class modeling challenges in continual semantic segmentation by introducing a Fairness Contrastive Clustering Loss and an Attention-based Visual Grammar for unknown classes. The method couples a contrastive clustering objective with a learnable fairness mechanism, while the visual grammar module models the distribution of unknown classes through self-attention, enabling discriminative representations across both known and unknown classes. The approach yields state-of-the-art results on ADE20K, Pascal VOC, and Cityscapes, with empirical evidence of improved fairness for minor classes and robust control of forgetting across sequential tasks. By connecting the contrastive clustering objective to an upper bound on knowledge distillation, FALCON provides a principled, scalable way to preserve previous knowledge while learning new classes in open-set environments.
Abstract
Continual learning in semantic scene segmentation aims to continually learn new, unseen classes in dynamic environments while maintaining previously learned knowledge. Prior studies have focused on modeling the catastrophic forgetting and background shift challenges in continual learning. However, fairness, another major challenge that causes unfair predictions and degrades performance across major and minor classes, has yet to be well addressed. In addition, prior methods have yet to model unknown classes well, resulting in non-discriminative features among them. This work presents a novel Fairness Learning via Contrastive Attention Approach to continual learning in semantic scene understanding. In particular, we first introduce a new Fairness Contrastive Clustering loss to address the problems of catastrophic forgetting and fairness. Then, we propose an attention-based visual grammar approach to effectively model the background shift problem and unknown classes, producing better feature representations for different unknown classes. In our experiments, the proposed approach achieves State-of-the-Art (SoTA) performance on several continual learning benchmarks, i.e., ADE20K, Cityscapes, and Pascal VOC, and promotes the fairness of the continual semantic segmentation model.
