SGAT4PASS: Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation

Xuewei Li; Tao Wu; Zhongang Qi; Gaoang Wang; Ying Shan; Xi Li

TL;DR

Experimental results on Stanford2D3D Panoramic datasets show that SGAT4PASS significantly improves performance and robustness, with approximately a 2% increase in mIoU, and when small 3D disturbances occur in the data, the stability of the performance is improved by an order of magnitude.

Abstract

As an important and challenging problem in computer vision, PAnoramic Semantic Segmentation (PASS) provides complete scene perception based on an ultra-wide angle of view. Prevalent PASS methods that take 2D panoramic images as input usually focus on resolving image distortions but give little consideration to the 3D properties of the original $360^{\circ}$ data. As a result, their performance drops substantially when the input panoramic images contain 3D disturbance. To be more robust to 3D disturbance, we propose the Spherical Geometry-Aware Transformer for PAnoramic Semantic Segmentation (SGAT4PASS), which incorporates 3D spherical geometry knowledge. Specifically, a spherical geometry-aware framework is proposed for PASS. It includes three modules: spherical geometry-aware image projection, which takes input images with 3D disturbance into account; spherical deformable patch embedding, which adds a spherical geometry-aware constraint to the existing deformable patch embedding; and a panorama-aware loss, which reflects the pixel density of the original $360^{\circ}$ data. Experimental results on Stanford2D3D Panoramic datasets show that SGAT4PASS significantly improves performance and robustness, with approximately a 2% increase in mIoU; when small 3D disturbances occur in the data, the stability of performance improves by an order of magnitude. Our code and supplementary material are available at https://github.com/TencentARC/SGAT4PASS.
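To make the "pixel density" idea concrete, the sketch below weights a per-pixel loss by how much of the sphere each row of an equirectangular image covers (proportional to the cosine of its latitude). This is an illustration under stated assumptions, not the paper's panorama-aware loss: the function names `latitude_weights` and `density_weighted_nll`, the cosine weighting, and the mean-one normalization are all hypothetical choices made here.

```python
import numpy as np

def latitude_weights(height: int) -> np.ndarray:
    """Per-row weights for an equirectangular image (assumed weighting).

    Rows near the poles are stretched by the projection, so each of their
    pixels covers less of the sphere; the weight is cos(latitude),
    normalized so that the weights average to 1.
    """
    # Latitude of each row center, from near +pi/2 (top) to near -pi/2 (bottom).
    lat = (0.5 - (np.arange(height) + 0.5) / height) * np.pi
    w = np.cos(lat)
    return w / w.mean()

def density_weighted_nll(log_probs: np.ndarray, labels: np.ndarray) -> float:
    """Negative log-likelihood averaged with latitude weights.

    log_probs: (C, H, W) per-class log-probabilities.
    labels:    (H, W) integer class indices.
    """
    _, h, _ = log_probs.shape
    # Pick the log-probability of the ground-truth class at every pixel.
    picked = np.take_along_axis(log_probs, labels[None, :, :], axis=0)[0]
    weights = latitude_weights(h)[:, None]  # broadcast over columns
    return float(-(weights * picked).mean())
```

With uniform predictions over `C` classes the weighted loss equals `log(C)`, the same as the unweighted loss, because the weights average to 1; the weighting only changes how errors at different latitudes are traded off.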

Paper Structure (25 sections, 5 equations, 4 figures, 6 tables)

Introduction
Related Work
Panoramic Semantic Segmentation
Dynamic and Deformable Vision Transformers
Method
Background
Spherical Geometry-Aware (SGA) Framework
Spherical Geometry-Aware (SGA) Image Projection
SDPE: Spherical Deformable Patch Embedding
Intra-offset constraint.
Inter-offset constraint.
Panorama-Aware Loss
Experiments
Datasets and Protocols
Spherical Geometry-Aware (SGA) Validation.
...and 10 more sections

Figures (4)

Figure 1: Results with 3D disturbance input. (a) is the original image, and (b) / (c) are the images rotated $5^{\circ}$ about the pitch / roll axis. Our baseline is Trans4PASS+. In contrast to the minor change in the images, the variance / performance change in SGA validation is large, as shown in (d) / (e) and (f). "Mean" and "Variance" are defined in detail in the Datasets and Protocols section.
Figure 2: Overview of SGAT4PASS. We borrow the network from Trans4PASS+ and add three main modules: spherical geometry-aware (SGA) image projection, SDPE, and panorama-aware loss. (Lower left) SGA image projection rotates the input panoramic images to mimic 3D disturbance. (Lower middle) SDPE adds several SGA constraints to the deformable patch embedding and lets it consider both image distortions and spherical geometry. (Lower right) Panorama-aware loss (PA loss) takes into account the pixel density of a sphere.
Figure 3: Visual comparison of SGAT4PASS and Trans4PASS+. The rotation about the pitch / roll / yaw axis is $5^{\circ}$ / $5^{\circ}$ / $180^{\circ}$. SGAT4PASS achieves better results on the semantic classes "door" and "sofa" (highlighted by red dotted boxes).
Figure 4: Influence of $\lambda_s$ and $\lambda_w$ in SGAT4PASS. Results are reported on official fold 1 of the Stanford2D3D Panoramic datasets.
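The pitch / roll / yaw rotations mentioned in the captions above can be mimicked on an equirectangular image by mapping every output pixel to a direction on the unit sphere, rotating that direction, and sampling the source image there. The following is a minimal sketch, not the authors' implementation: the axis convention (x right, y up, z forward), the composition order of the three rotations, the nearest-neighbor sampling, and the names `rotation_matrix` and `rotate_equirect` are assumptions made for illustration.

```python
import numpy as np

def rotation_matrix(pitch_deg: float, roll_deg: float, yaw_deg: float) -> np.ndarray:
    """3x3 rotation; assumed convention: pitch about x, yaw about y, roll about z."""
    p, r, y = np.deg2rad([pitch_deg, roll_deg, yaw_deg])
    rx = np.array([[1, 0, 0],
                   [0, np.cos(p), -np.sin(p)],
                   [0, np.sin(p), np.cos(p)]])
    ry = np.array([[np.cos(y), 0, np.sin(y)],
                   [0, 1, 0],
                   [-np.sin(y), 0, np.cos(y)]])
    rz = np.array([[np.cos(r), -np.sin(r), 0],
                   [np.sin(r), np.cos(r), 0],
                   [0, 0, 1]])
    return ry @ rx @ rz

def rotate_equirect(img: np.ndarray, pitch_deg=0.0, roll_deg=0.0, yaw_deg=0.0) -> np.ndarray:
    """Rotate an equirectangular image (H, W) or (H, W, C) with nearest-neighbor sampling."""
    h, w = img.shape[:2]
    # Longitude / latitude of every output pixel center.
    lon = ((np.arange(w) + 0.5) / w - 0.5) * 2.0 * np.pi
    lat = (0.5 - (np.arange(h) + 0.5) / h) * np.pi
    lon, lat = np.meshgrid(lon, lat)
    # Unit direction for each output pixel.
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)
    # Row-vector product applies the inverse rotation: output direction -> source direction.
    src = dirs @ rotation_matrix(pitch_deg, roll_deg, yaw_deg)
    src_lon = np.arctan2(src[..., 0], src[..., 2])
    src_lat = np.arcsin(np.clip(src[..., 1], -1.0, 1.0))
    # Back to pixel indices; columns wrap around, rows are clamped.
    col = np.floor((src_lon / (2.0 * np.pi) + 0.5) * w).astype(int) % w
    row = np.clip(np.floor((0.5 - src_lat / np.pi) * h).astype(int), 0, h - 1)
    return img[row, col]
```

Nearest-neighbor sampling is chosen here so that the same function can be applied to an integer label map without inventing new class indices; an RGB input would more commonly use bilinear interpolation. Under this convention a pure yaw rotation reduces to a horizontal circular shift of the image.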
