Optimizing against Infeasible Inclusions from Data for Semantic Segmentation through Morphology
Shamik Basu, Luc Van Gool, Christos Sakaridis
TL;DR
InSeIn addresses infeasible high-level spatial relations in semantic segmentation by extracting feasible and infeasible class inclusions from the training data and enforcing feasibility with a differentiable morphological loss. The method adds a novel inclusion loss to the standard cross-entropy loss, computed via a differentiable area-opening procedure on softmax score maps, and serves as a plug-in for various state-of-the-art networks. Empirically, InSeIn yields consistent mIoU improvements on Cityscapes, ADE20K, and ACDC, while substantially reducing infeasible inclusions, as measured by the mINF metric, and lowering false response errors. The approach is lightweight: it is applied only during training and operates without learned parameters beyond the loss weight, promising practical impact for robust semantic segmentation under domain shift and complex scene layouts.
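As a compact restatement of the objective summarized above, and assuming the inclusion loss enters as a weighted additive term, the training loss could be written as follows. The symbols $\mathcal{L}_{\text{CE}}$, $\mathcal{L}_{\text{incl}}$, and $\lambda$ are notation chosen here for illustration and are not taken from the paper.

```latex
% Sketch only: assumes a simple weighted sum of the two terms.
% \mathcal{L}_{\text{CE}}   : standard per-pixel cross-entropy loss
% \mathcal{L}_{\text{incl}} : inclusion loss computed on softmax score maps
% \lambda                   : loss weight (hyperparameter)
\mathcal{L} = \mathcal{L}_{\text{CE}} + \lambda \, \mathcal{L}_{\text{incl}}
```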
Abstract
State-of-the-art semantic segmentation models are typically optimized in a data-driven fashion, minimizing solely per-pixel or per-segment classification objectives on their training data. This purely data-driven paradigm often leads to absurd segmentations, especially when the domain of the input images is shifted from the one encountered during training. For instance, state-of-the-art models may assign the label "road" to a segment that is included in another segment labeled "sky". However, the ground truth of the dataset at hand dictates that such an inclusion is not feasible. Our method, Infeasible Semantic Inclusions (InSeIn), first extracts explicit inclusion constraints that govern spatial class relations from the semantic segmentation training set at hand in an offline, data-driven fashion, and then enforces a morphological yet differentiable loss that penalizes violations of these constraints during training to promote prediction feasibility. InSeIn is a lightweight plug-and-play method, constitutes a novel step towards minimizing infeasible semantic inclusions in the predictions of learned segmentation models, and yields consistent and significant performance improvements over diverse state-of-the-art networks across the ADE20K, Cityscapes, and ACDC datasets. https://github.com/SHAMIK-97/InSeIn
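To make the offline constraint-extraction step more concrete, the following is a minimal sketch of how feasible and infeasible inclusions might be collected from ground-truth label maps. It is not the authors' implementation. It assumes a simplified definition: a 4-connected segment counts as "included" in class b when it does not touch the image border and every pixel adjacent to it carries the single class b. The function names `find_inclusions`, `feasible_inclusions`, and `infeasible_inclusions`, as well as the toy class ids, are hypothetical.

```python
from collections import deque


def find_inclusions(label_map):
    """Return the set of (inner_class, outer_class) pairs found in one label map.

    Simplifying assumption: a 4-connected segment is "included" when it does not
    touch the image border and all of its adjacent pixels share one other class.
    """
    h, w = len(label_map), len(label_map[0])
    seen = [[False] * w for _ in range(h)]
    pairs = set()
    for r0 in range(h):
        for c0 in range(w):
            if seen[r0][c0]:
                continue
            cls = label_map[r0][c0]
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            touches_border = False
            neighbor_classes = set()
            # Breadth-first flood fill over the segment containing (r0, c0).
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if not (0 <= nr < h and 0 <= nc < w):
                        touches_border = True
                        continue
                    ncls = label_map[nr][nc]
                    if ncls == cls:
                        if not seen[nr][nc]:
                            seen[nr][nc] = True
                            queue.append((nr, nc))
                    else:
                        neighbor_classes.add(ncls)
            if not touches_border and len(neighbor_classes) == 1:
                pairs.add((cls, next(iter(neighbor_classes))))
    return pairs


def feasible_inclusions(label_maps):
    """Union of all inclusion pairs observed anywhere in the training labels."""
    feasible = set()
    for label_map in label_maps:
        feasible |= find_inclusions(label_map)
    return feasible


def infeasible_inclusions(label_maps, num_classes):
    """Every ordered class pair that never occurs as an inclusion in the data."""
    feasible = feasible_inclusions(label_maps)
    return {
        (a, b)
        for a in range(num_classes)
        for b in range(num_classes)
        if a != b and (a, b) not in feasible
    }


# Toy example with hypothetical class ids: 0 = sky, 1 = road, 2 = car.
toy = [
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 2, 2, 1, 1],
    [1, 1, 1, 1, 1],
]
print(find_inclusions(toy))                      # {(2, 1)}: car inside road
print((1, 0) in infeasible_inclusions([toy], 3))  # True: road inside sky never occurs
```

In this toy setting, "car inside road" is recorded as feasible because it appears in the labels, whereas "road inside sky" never appears and therefore lands in the infeasible set, mirroring the example given in the abstract. A real pipeline would run over the whole training set and would likely need a more careful notion of inclusion than this single-neighbor-class test.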
