Automated Label Unification for Multi-Dataset Semantic Segmentation with GNNs
Rong Ma, Jie Chen, Xiangyang Xue, Jian Pu
TL;DR
The paper tackles label-space conflicts in multi-dataset semantic segmentation by learning a unified label space through graph neural networks. It introduces a three-module framework: text-based label encoding, a GraphSAGE-based GNN that learns a unified embedding $\mathbf{X}_u$ and a learnable adjacency $\mathbf{M}_a$, and a UniSegHead that performs segmentation in the unified space; dataset-specific mappings $M_i$ translate predictions back to each dataset. Training alternates between refining the label space with the GNNs and optimizing the segmentation network, using an unbalanced optimal transport step to obtain discrete mappings and a cross-entropy loss in the dataset spaces. The approach yields state-of-the-art results on WildDash 2 and demonstrates strong generalization to unseen domains, while reducing the need for manual taxonomy curation and reannotation. Overall, the work enables robust, scalable, cross-dataset semantic segmentation with improved cross-domain knowledge transfer.
Abstract
Deep supervised models possess significant capability to assimilate extensive training data, thereby presenting an opportunity to enhance model performance through training on multiple datasets. However, conflicts arising from different label spaces among datasets may adversely affect model performance. In this paper, we propose a novel approach to automatically construct a unified label space across multiple datasets using graph neural networks. This enables semantic segmentation models to be trained simultaneously on multiple datasets, resulting in performance improvements. Unlike existing methods, our approach facilitates seamless training without the need for additional manual reannotation or taxonomy reconciliation. This significantly enhances the efficiency and effectiveness of multi-dataset segmentation model training. The results demonstrate that our method significantly outperforms other multi-dataset training methods when trained on seven datasets simultaneously, and achieves state-of-the-art performance on the WildDash 2 benchmark.
