Improving Multi-label Recognition using Class Co-Occurrence Probabilities
Samyak Rawlekar, Shubhang Bhatnagar, Vishnuvardhan Pogunulu Srinivasulu, Narendra Ahuja
TL;DR
This work tackles multi-label recognition under limited labeled data by exploiting object co-occurrence statistics. It introduces a two-stage approach: first, VLM-driven, prompt-based logits provide initial evidence; second, a Graph Convolutional Network refines these logits using a conditional probability prior $A$ derived from training co-occurrences, where $a_{mn} = c_{mn}/c_{mm}$. Training employs Reweighted Asymmetric Loss (RASL) to address long-tailed class distributions. Empirical results on four benchmarks in the low-data regime show consistent, substantial improvements over state-of-the-art methods, particularly for difficult-to-recognize classes, validating the value of inter-class dependencies for MLR.
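As a concrete illustration of the prior $a_{mn} = c_{mn}/c_{mm}$, the following minimal sketch builds the matrix $A$ from multi-hot training labels. It assumes that $c_{mn}$ counts the training images containing both class $m$ and class $n$, so that $c_{mm}$ is the number of images containing class $m$ and $a_{mn}$ estimates the probability of class $n$ given class $m$. The function name `cooccurrence_prior`, the toy labels, and the zero-row handling of unseen classes are hypothetical choices for illustration, not taken from the paper.

```python
def cooccurrence_prior(labels):
    """Build A with A[m][n] = c_mn / c_mm from multi-hot labels.

    labels: list of equal-length 0/1 lists, one per training image.
    Assumption: c_mn is the number of images containing both class m and
    class n, so the diagonal c_mm counts the images containing class m.
    """
    num_classes = len(labels[0])
    counts = [[0] * num_classes for _ in range(num_classes)]
    for row in labels:
        present = [k for k, flag in enumerate(row) if flag]
        for m in present:
            for n in present:
                counts[m][n] += 1

    prior = []
    for m in range(num_classes):
        c_mm = counts[m][m]
        # Assumption: a class never seen in training gets an all-zero row.
        prior.append(
            [counts[m][n] / c_mm if c_mm else 0.0 for n in range(num_classes)]
        )
    return prior


# Hypothetical toy data: 4 images, 3 classes.
toy_labels = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 1, 1],
]
A = cooccurrence_prior(toy_labels)
```

On this toy data the matrix is not symmetric: class 0 appears in three images and class 2 in two, but they co-occur only once, so `A[0][2]` is 1/3 while `A[2][0]` is 1/2. That asymmetry is what makes the prior a conditional probability rather than a plain co-occurrence frequency.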
Abstract
Multi-label Recognition (MLR) involves identifying multiple objects within an image. To address the additional complexity of this problem, recent works have leveraged information from vision-language models (VLMs) trained on large image-text datasets. These methods learn an independent classifier for each object (class), overlooking correlations in their occurrences. Such co-occurrences can be captured from the training data as conditional probabilities between pairs of classes. We propose a framework that extends the independent classifiers by incorporating this co-occurrence information for object pairs, improving their performance. We use a Graph Convolutional Network (GCN) to enforce the conditional probabilities between classes by refining the initial estimates derived from image and text sources using VLMs. We validate our method on four MLR datasets, where our approach outperforms all state-of-the-art methods.
