Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs

Rong Ma, Jie Chen, Xiangyang Xue, Jian Pu

TL;DR

The paper tackles label-space conflicts in multi-dataset semantic segmentation by learning a unified label space through graph neural networks. It introduces a three-module framework: text-based label encoding, a GraphSAGE-based GNN that learns a unified embedding $\mathbf{X}_u$ and a learnable adjacency $\mathbf{M}_a$, and a UniSegHead that performs segmentation in the unified space; dataset-specific mappings $M_i$ translate predictions back to each dataset. Training alternates between refining the label space with the GNNs and optimizing the segmentation network, using an unbalanced optimal transport step to obtain discrete mappings and a cross-entropy loss in the dataset spaces. The approach yields state-of-the-art results on WildDash 2 and demonstrates strong generalization to unseen domains, while reducing the need for manual taxonomy curation and reannotation. Overall, the work enables robust, scalable, cross-dataset semantic segmentation with improved cross-domain knowledge transfer.
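The GraphSAGE-based update mentioned above can be illustrated with a generic mean-aggregation layer over label nodes. This is a minimal sketch, not the paper's architecture: the function name `graphsage_mean_layer`, the weight matrices `w_self` and `w_neigh`, and the use of a dense non-negative adjacency as neighbor weights are all assumptions made for illustration. How the learnable adjacency $\mathbf{M}_a$ actually enters the authors' GNN is not specified in this summary.

```python
import numpy as np

def graphsage_mean_layer(x, adj, w_self, w_neigh):
    """One generic GraphSAGE layer with a (weighted) mean aggregator.

    x:       (N, D_in) node features, e.g. text embeddings of labels
    adj:     (N, N) non-negative adjacency weights (assumed dense here)
    w_self:  (D_in, D_out) weights applied to each node's own features
    w_neigh: (D_in, D_out) weights applied to the aggregated neighbors
    """
    # Weighted mean of neighbor features; isolated nodes get a zero vector.
    deg = adj.sum(axis=1, keepdims=True)
    neigh_mean = (adj @ x) / np.maximum(deg, 1e-8)
    # Separate self/neighbor weights are equivalent to one weight matrix
    # applied to the concatenation [x, neigh_mean].
    h = np.maximum(x @ w_self + neigh_mean @ w_neigh, 0.0)  # ReLU
    # L2-normalize each node embedding, as in the original GraphSAGE.
    norm = np.linalg.norm(h, axis=1, keepdims=True)
    return h / np.maximum(norm, 1e-8)

# Hypothetical toy graph: 5 label nodes with 4-dimensional features.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 4))
adj = rng.uniform(size=(5, 5))
out = graphsage_mean_layer(x, adj,
                           rng.normal(size=(4, 3)),
                           rng.normal(size=(4, 3)))
```

Stacking such layers lets each label node mix its own text feature with those of related labels, which is the general mechanism a GraphSAGE-style model would use to produce unified label embeddings.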

Abstract

Deep supervised models possess significant capability to assimilate extensive training data, thereby presenting an opportunity to enhance model performance through training on multiple datasets. However, conflicts arising from different label spaces among datasets may adversely affect model performance. In this paper, we propose a novel approach to automatically construct a unified label space across multiple datasets using graph neural networks. This enables semantic segmentation models to be trained simultaneously on multiple datasets, resulting in performance improvements. Unlike existing methods, our approach facilitates seamless training without the need for additional manual reannotation or taxonomy reconciliation. This significantly enhances the efficiency and effectiveness of multi-dataset segmentation model training. The results demonstrate that our method significantly outperforms other multi-dataset training methods when trained on seven datasets simultaneously, and achieves state-of-the-art performance on the WildDash 2 benchmark.

Paper Structure

This paper contains 20 sections, 10 equations, 9 figures, 14 tables, 2 algorithms.

Figures (9)

  • Figure 1: Our method consists of three modules. The label encoding provides the semantic text features of the dataset labels. The GNNs learn the unified label embedding space and dataset label mappings based on the textual features and input images. The segmentation network leverages the unified label embedding space to produce segmentation results in the unified label space.
  • Figure 2: Illustration of our method, which trains with dataset-specific annotations through label mappings constructed by the GNNs. We leverage a unified segmentation head (UniSegHead) to enable simultaneous training on multiple datasets. In the UniSegHead, we compute the matrix product between the pixel embeddings and the augmented unified node features output by the GNNs, resulting in predictions for the unified label space. Finally, we use the label mappings constructed by the GNNs to map the unified predictions to dataset-specific predictions for training.
  • Figure 3: Visual comparisons with single-dataset models on different training datasets.
  • Figure 4: The composition of the training datasets.
  • Figure 5: Comparison of the unified label space learned by the GNNs with the one constructed from text features.
  • ...and 4 more figures
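
The UniSegHead computation described in the Figure 2 caption can be sketched in NumPy as follows. The matrix product between pixel embeddings and unified node features comes from the caption; everything else is an assumption for illustration: the function names, the binary mapping matrix, and in particular the choice to apply the mapping by summing softmax probabilities of the unified classes assigned to each dataset class. The authors' exact formulation may differ.

```python
import numpy as np

def uniseg_head(pixel_emb, unified_nodes):
    """Unified-space logits: pixel_emb (H, W, D) times unified_nodes (U, D)^T."""
    return pixel_emb @ unified_nodes.T  # (H, W, U)

def to_dataset_space(unified_logits, mapping):
    """Map unified predictions to one dataset's label space.

    mapping: (U, C) binary matrix, mapping[u, c] = 1 if unified class u
    belongs to dataset class c (assumed form of a dataset-specific mapping).
    """
    z = unified_logits - unified_logits.max(axis=-1, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=-1, keepdims=True)  # softmax over unified classes
    return probs @ mapping  # (H, W, C): summed probability per dataset class

def cross_entropy(dataset_probs, labels, eps=1e-8):
    """Mean pixel-wise cross-entropy against dataset-specific labels (H, W)."""
    h, w = labels.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    picked = dataset_probs[rows, cols, labels]
    return float(-np.log(picked + eps).mean())

# Hypothetical toy sizes: 4x5 image, 8-dim embeddings, 6 unified classes,
# and a dataset with 3 classes, each covering two unified classes.
rng = np.random.default_rng(0)
pixel_emb = rng.normal(size=(4, 5, 8))
unified_nodes = rng.normal(size=(6, 8))
mapping = np.zeros((6, 3))
mapping[np.arange(6), np.arange(6) % 3] = 1.0
labels = rng.integers(0, 3, size=(4, 5))

logits = uniseg_head(pixel_emb, unified_nodes)
dataset_probs = to_dataset_space(logits, mapping)
loss = cross_entropy(dataset_probs, labels)
```

Because the loss is computed after the mapping, each dataset supervises the shared unified classes using only its own annotations, which is what allows several datasets with different label sets to be trained together.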