Open Panoramic Segmentation
Junwei Zheng, Ruiping Liu, Yufan Chen, Kunyu Peng, Chengzhi Wu, Kailun Yang, Jiaming Zhang, Rainer Stiefelhagen
TL;DR
This work defines Open Panoramic Segmentation (OPS), a task in which models are trained on FoV-restricted pinhole images and evaluated zero-shot, in an open-vocabulary setting, on 360° panoramas. It introduces OOOPS, a model that couples a frozen CLIP backbone with a Deformable Adapter Network (DAN) built on the Deformable Adapter Operator (DAO) to handle panoramic distortions. Training is further augmented with Random Equirectangular Projection (RERP), which simulates those distortions on pinhole images. Experiments on WildPASS, Stanford2D3D, and Matterport3D show that OOOPS with RERP surpasses other open-vocabulary segmentation methods by up to about 2.4 percentage points in mIoU, while remaining competitive with some closed-vocabulary panoramic segmentation methods. The approach advances practical, distortion-aware, zero-shot scene understanding in panoramic imagery, and the code is publicly available for replication and extension.
Abstract
Panoramic images, capturing a 360° field of view (FoV), encompass omnidirectional spatial information crucial for scene understanding. However, obtaining enough densely annotated panoramas for training is costly, and models trained in a closed-vocabulary setting are restricted in their applications. To tackle this problem, we define a new task termed Open Panoramic Segmentation (OPS), in which models are trained in an open-vocabulary setting on FoV-restricted pinhole images in the source domain and evaluated on FoV-open panoramic images in the target domain, which requires zero-shot open panoramic semantic segmentation ability. Moreover, we propose a model named OOOPS with a Deformable Adapter Network (DAN), which significantly improves zero-shot panoramic semantic segmentation performance. To further enhance distortion-aware modeling from the pinhole source domain, we propose a novel data augmentation method called Random Equirectangular Projection (RERP), which is specifically designed to address object deformations in advance. OOOPS with RERP surpasses other state-of-the-art open-vocabulary semantic segmentation approaches on three panoramic datasets, WildPASS, Stanford2D3D, and Matterport3D, demonstrating its effectiveness on the OPS task, with gains of +2.2% mIoU on outdoor WildPASS and +2.4% mIoU on indoor Stanford2D3D. The source code is publicly available at https://junweizheng93.github.io/publications/OPS/OPS.html.
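
The gains above are reported in mIoU (mean intersection over union), the standard semantic segmentation metric. As a reminder of what that number measures, here is a minimal sketch in plain Python. It is not the authors' evaluation code: the function name `mean_iou`, the flattened label lists, and the `ignore_index=255` convention are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int],
             num_classes: int, ignore_index: int = 255) -> float:
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flattened per-pixel class ids of equal length.
    Pixels whose target equals ignore_index (an assumed convention) are skipped.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    intersection = [0] * num_classes
    union = [0] * num_classes

    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixel, not scored
        if p == t:
            # Correct pixel: counts once in both intersection and union.
            intersection[t] += 1
            union[t] += 1
        else:
            # Wrong pixel: enlarges the union of the true class and,
            # if it is a valid class id, of the predicted class.
            union[t] += 1
            if 0 <= p < num_classes:
                union[p] += 1

    # Average only over classes that actually appear.
    ious = [intersection[c] / union[c] for c in range(num_classes) if union[c] > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with two classes and four pixels.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
print(f"mIoU = {score:.4f}")
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is 7/12. A reported improvement such as +2.4% mIoU is the difference between two such scores expressed in percentage points.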
