Leveraging Adaptive Implicit Representation Mapping for Ultra High-Resolution Image Segmentation
Ziyu Zhao, Xiaoguang Li, Pingping Cai, Canyu Zhang, Song Wang
TL;DR
The paper tackles ultra-high-resolution image segmentation by identifying two core shortcomings of existing IRM-based refinement: CNN encoders capture limited global semantics, and a mapping function shared across all samples generalizes poorly. It proposes Adaptive Implicit Representation Mapping (AIRM), which consists of an Affinity Empowered Encoder and an Adaptive Implicit Representation Mapping Function. The former is a transformer-based encoder, and the latter uses a hypernetwork to produce mapping parameters guided by global context. In experiments on BIG and relabeled PASCAL VOC 2012, AIRM consistently outperforms competitive IRM-based refinement methods in IoU and boundary accuracy, demonstrating the value of large receptive fields and adaptive, affinity-informed feature translation. This work offers a practical pathway to more accurate and scalable ultra-high-resolution segmentation, with potential impact on high-resolution imagery analytics and downstream tasks that rely on precise mask delineation.
Abstract
Implicit representation mapping (IRM) can translate image features to any continuous resolution, which makes it well suited to refining ultra-high-resolution image segmentation. Current IRM-based refinement methods typically rely on CNN-based encoders to extract image features and apply a Shared Implicit Representation Mapping Function (SIRMF) to convert pixel-wise features into segmentation results. These methods therefore exhibit two crucial limitations. First, the CNN-based encoder may not effectively capture long-distance information, so the pixel-wise features lack global semantic information. Second, SIRMF is shared across all samples, which limits its ability to generalize and handle diverse inputs. To address these limitations, we propose a novel approach that leverages the newly proposed Adaptive Implicit Representation Mapping (AIRM) for ultra-high-resolution image segmentation. Specifically, the proposed method comprises two components: (1) the Affinity Empowered Encoder (AEE), a robust feature extractor that leverages the benefits of the transformer architecture and semantic affinity to model long-distance features effectively, and (2) the Adaptive Implicit Representation Mapping Function (AIRMF), which adaptively translates pixel-wise features without neglecting the global semantic information, allowing for flexible and precise feature translation. We evaluated our method on the commonly used ultra-high-resolution segmentation refinement datasets, i.e., BIG and PASCAL VOC 2012. Extensive experiments demonstrate that our method outperforms competitors by a large margin. The code is provided in the supplementary material.
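
The contrast between a shared mapping function and an adaptive one can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the two-layer MLP, the linear hypernetwork, the mean-pooled global context, and all sizes and names (`FEAT_DIM`, `HIDDEN`, `adaptive_params`, `predict_shared`, `predict_adaptive`) are assumptions chosen only to show the idea. In both cases a mapping function takes a pixel-wise feature together with a real-valued query coordinate and returns a mask probability, so it can be queried at any resolution. In the adaptive case, the parameters of that function are generated per image from a global context vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are hypothetical, picked only to keep the sketch small.
FEAT_DIM = 8            # size of each pixel-wise feature
HIDDEN = 16             # hidden width of the mapping MLP
IN_DIM = FEAT_DIM + 2   # feature plus a 2-D continuous query coordinate
N_PARAMS = IN_DIM * HIDDEN + HIDDEN + HIDDEN + 1


def mapping_mlp(features, coords, w1, b1, w2, b2):
    """Map (feature, coordinate) queries to mask probabilities in (0, 1)."""
    queries = np.concatenate([features, coords], axis=-1)
    hidden = np.maximum(queries @ w1 + b1, 0.0)   # ReLU
    logits = (hidden @ w2 + b2).squeeze(-1)
    return 1.0 / (1.0 + np.exp(-logits))          # sigmoid


# Shared mapping: one fixed parameter set reused for every image.
shared_params = (
    rng.normal(size=(IN_DIM, HIDDEN)) * 0.1,
    np.zeros(HIDDEN),
    rng.normal(size=(HIDDEN, 1)) * 0.1,
    np.zeros(1),
)

# Assumed hypernetwork: a single linear layer from the global context
# vector to the flattened parameters of the mapping MLP.
hyper_w = rng.normal(size=(FEAT_DIM, N_PARAMS)) * 0.1


def adaptive_params(global_context):
    """Generate per-image MLP parameters from a global context vector."""
    flat = global_context @ hyper_w
    cut1 = IN_DIM * HIDDEN
    cut2 = cut1 + HIDDEN
    cut3 = cut2 + HIDDEN
    w1, b1, w2, b2 = np.split(flat, [cut1, cut2, cut3])
    return w1.reshape(IN_DIM, HIDDEN), b1, w2.reshape(HIDDEN, 1), b2


def predict_shared(features, coords):
    return mapping_mlp(features, coords, *shared_params)


def predict_adaptive(features, coords):
    # Mean pooling is a stand-in for the global semantics that a
    # transformer-based encoder would provide.
    global_context = features.mean(axis=0)
    return mapping_mlp(features, coords, *adaptive_params(global_context))


if __name__ == "__main__":
    features = rng.normal(size=(5, FEAT_DIM))        # 5 query points
    coords = rng.uniform(-1.0, 1.0, size=(5, 2))     # continuous coordinates
    print(predict_shared(features, coords))
    print(predict_adaptive(features, coords))
```

In this sketch, two images with different global context receive different mapping parameters from `adaptive_params`, whereas `predict_shared` applies the same weights to every input.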
