Multi-label Image Classification using Adaptive Graph Convolutional Networks: from a Single Domain to Multiple Domains

Indel Pal Singh, Enjie Ghorbel, Oyebade Oyedotun, Djamila Aouada

TL;DR

This work introduces Multi-Label Adaptive Graph Convolutional Network (ML-AGCN), which replaces heuristically predefined label graphs with end-to-end learned adjacency matrices that capture both edge importance (attention-based) and preserved feature similarity (cosine-based). The approach yields improved multi-label image classification (single-domain) and extends to unsupervised domain adaptation (DA-AGCN) via adversarial training to align source and target domains. Empirical results on MS-COCO, VG-500, and VOC demonstrate competitive mAP with substantially smaller model sizes, and the DA extension shows strong gains across aerial and cross-domain benchmarks, often outperforming state-of-the-art baselines. The combination of adaptive graph topology learning and domain-alignment mechanisms provides a compact yet effective framework for MLIC across domains, with the potential for open-set extensions in future work.
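
The idea of combining a fixed label graph with two adaptive ones can be sketched in a few lines. The snippet below is only an illustration under stated assumptions: the summary does not give the exact formulas, so the attention matrix is assumed to be a row-wise softmax over scaled dot products, the similarity matrix is assumed to be plain pairwise cosine similarity, and the three matrices are assumed to be summed. All function names are hypothetical, and learned weights and non-linearities are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_adjacency(H):
    """Assumed form of B: row-wise softmax of scaled dot products."""
    d = len(H[0])
    scores = [[sum(a * b for a, b in zip(hi, hj)) / math.sqrt(d) for hj in H]
              for hi in H]
    return [softmax(r) for r in scores]


def cosine_adjacency(H, eps=1e-12):
    """Assumed form of C: pairwise cosine similarity of node features."""
    norms = [math.sqrt(sum(v * v for v in h)) + eps for h in H]
    n = len(H)
    return [[sum(a * b for a, b in zip(H[i], H[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def combined_adjacency(A, H):
    """Assumed combination: element-wise sum of fixed A and adaptive B, C."""
    B = attention_adjacency(H)
    C = cosine_adjacency(H)
    n = len(A)
    return [[A[i][j] + B[i][j] + C[i][j] for j in range(n)] for i in range(n)]


def propagate(adj, H):
    """One aggregation step adj @ H (no learned weights, no activation)."""
    return [[sum(adj[i][k] * H[k][j] for k in range(len(H)))
             for j in range(len(H[0]))] for i in range(len(adj))]


# Toy example: three label nodes with 2-D features and an identity fixed graph.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
adj = combined_adjacency(A, H)
H_next = propagate(adj, H)
```

In a trained model the node features and any projection weights would be learned end to end; here they are fixed so that the three matrices can be inspected directly.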

Abstract

This paper proposes an adaptive graph-based approach for multi-label image classification. Graph-based methods have been widely used in multi-label classification, given their ability to model label correlations. Their effectiveness has been demonstrated not only in a single domain but also across multiple domains. However, the graph topology they rely on is suboptimal because it is predefined heuristically. In addition, consecutive Graph Convolutional Network (GCN) aggregations tend to destroy feature similarity. To overcome these issues, an architecture for learning the graph connectivity in an end-to-end fashion is introduced. This is done by integrating an attention-based mechanism and a similarity-preserving strategy. The proposed framework is then extended to multiple domains using an adversarial training scheme. Numerous experiments are reported on well-known single-domain and multi-domain benchmarks. The results demonstrate that our approach is competitive with the state of the art in terms of mean Average Precision (mAP) and model size. The code will be made publicly available.
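
The remark that consecutive GCN aggregations tend to destroy feature similarity can be illustrated with a toy example. The sketch below assumes a plain row-normalized aggregation without learned weights or non-linearities, which is a simplification rather than the architecture studied in the paper; the graph, features, and function names are made up for illustration. Two nodes that start out orthogonal become more and more alike after each aggregation step, so the original similarity structure is lost.

```python
def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v)


def row_normalize(A):
    """Divide each row by its sum so that aggregation is a weighted mean."""
    return [[v / sum(row) for v in row] for row in A]


def aggregate(A_hat, H):
    """One simplified aggregation step: A_hat @ H."""
    return [[sum(A_hat[i][k] * H[k][j] for k in range(len(H)))
             for j in range(len(H[0]))] for i in range(len(A_hat))]


# Toy path graph 0 - 1 - 2 with self-loops (hypothetical, not from the paper).
A_hat = row_normalize([[1, 1, 0], [1, 1, 1], [0, 1, 1]])

# Nodes 0 and 2 start with orthogonal features.
H = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

# Track the similarity between nodes 0 and 2 before and after each step.
sims = [cosine(H[0], H[2])]
for _ in range(2):
    H = aggregate(A_hat, H)
    sims.append(cosine(H[0], H[2]))
```

Printing `sims` shows the similarity between the two end nodes growing with every step, which is the kind of drift a similarity-preserving term is meant to counteract.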

Paper Structure (43 sections, 16 equations, 8 figures, 11 tables)

Figures (8)

  • Figure 1: Comparison of our approach (ML-AGCN) without UDA (top) and with UDA (bottom) to recent state-of-the-art methods in terms of number of parameters (in millions) and mean Average Precision (mAP) on MS-COCO and Clipart $\rightarrow$ VOC. The considered state-of-the-art methods are: MlTr-m [mltr], TResNet-L [tresnet], ML-Decoder [ml-decoder], ML-GCN [ml-gcn], ResNet101 [resnet], DA-MAIC [da-maic], and DANN [dann].
  • Figure 2: Architecture of ML-AGCN [ml-agcn]: On the one hand, the CNN subnet learns relevant image features from an input image. On the other hand, the GCN subnet estimates interdependent label classifiers by taking into account one fixed adjacency matrix $\mathbf A$ and two adaptive adjacency matrices $\mathbf B^{(l)}$ and $\mathbf C^{(l)}$. Finally, the classifiers are applied to the CNN features to predict the labels.
  • Figure 3: (a) An example of a fixed label graph with a threshold set to $\tau=0.1$ [ml-agcn]. Dashed (red) edges indicate the ignored edges; (b) The proposed parameterized graph topology considering all the edges.
  • Figure 4: Comparison of node feature similarity: The top row presents a t-SNE visualization, while the bottom row shows the cosine-similarity map between the graph nodes for the VOC dataset: (a) using the original image-based embeddings (before GCN), (b) after applying two standard GCN layers using the architecture proposed in ML-GCN [ml-gcn], and (c) after applying two AGCN layers using our approach (i.e., ML-AGCN).
  • Figure 5: Architecture of the proposed DA-AGCN for multi-label image classification (best viewed in color). Images from both source and target datasets are given as input to the CNN subnet, which generates image features. The AGCN subnet, similar to ML-AGCN [ml-agcn], learns in an end-to-end manner the attention-based and similarity-based adjacency matrices $\mathbf B^{(l)}$ and $\mathbf C^{(l)}$, respectively, and accordingly generates interdependent label classifiers using only labeled source images. In addition, a domain classifier is considered.
  • ...and 3 more figures
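
The fixed, thresholded label graph mentioned in the caption of Figure 3(a) can be sketched as follows. This is a minimal illustration that assumes the common construction in which an edge from label i to label j is kept when the conditional probability of j given i, estimated from co-occurrence counts, reaches the threshold $\tau$. The captions above do not spell out this construction, so treat it as an assumption. The function name, the label names, and all counts are invented for the example and are not dataset statistics.

```python
def thresholded_label_graph(cooccurrence, label_counts, tau=0.1):
    """Binary adjacency: keep edge i -> j when P(j | i) >= tau.

    cooccurrence[i][j] is the number of images containing both labels,
    label_counts[i] the number of images containing label i.
    Self-loops are left out here and can be added separately.
    """
    n = len(label_counts)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        if label_counts[i] == 0:
            continue  # label never observed: no outgoing edges
        for j in range(n):
            if i == j:
                continue
            p_j_given_i = cooccurrence[i][j] / label_counts[i]
            adj[i][j] = 1 if p_j_given_i >= tau else 0
    return adj


# Hypothetical toy statistics for three labels: person, dog, boat.
label_counts = [100, 40, 10]
cooccurrence = [
    [0, 30, 5],   # person with dog: 30 images, person with boat: 5
    [30, 0, 0],   # dog with boat: never
    [5, 0, 0],
]
adj = thresholded_label_graph(cooccurrence, label_counts, tau=0.1)
```

With these toy numbers the edge person -> boat is dropped (5/100 is below the threshold) while boat -> person is kept (5/10), which shows how a hard threshold discards weak but possibly informative edges and yields an asymmetric graph. A parameterized topology that keeps all edges, as in Figure 3(b), avoids this hard cut.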