SAP: Segment Any 4K Panorama

Lutao Jiang, Zidong Cao, Weikai Chen, Xu Zheng, Yuanhuiyi Lyu, Zhenyang Li, Zeyu HU, Yingda Yin, Keyang Luo, Runze Zhang, Kai Yan, Shengju Qian, Haidi Fan, Yifan Peng, Xin Wang, Hui Xiong, Ying-Cong Chen

Abstract

Promptable instance segmentation is widely adopted in embodied and AR systems, yet the performance of foundation models trained on perspective imagery often degrades on 360° panoramas. In this paper, we introduce Segment Any 4K Panorama (SAP), a foundation model for 4K high-resolution panoramic instance-level segmentation. We reformulate panoramic segmentation as fixed-trajectory perspective video segmentation, decomposing a panorama into overlapping perspective patches sampled along a continuous spherical traversal. This memory-aligned reformulation preserves native 4K resolution while restoring the smooth viewpoint transitions required for stable cross-view propagation. To enable large-scale supervision, we synthesize 183,440 4K-resolution panoramic images with instance segmentation labels using the InfiniGen engine. Trained under this trajectory-aligned paradigm, SAP generalizes effectively to real-world 360° images, achieving a +17.2 zero-shot mIoU gain over vanilla SAM2 models of different sizes on a real-world 4K panorama benchmark.

Paper Structure

This paper contains 24 sections, 8 equations, 7 figures, 8 tables, and 1 algorithm.

Figures (7)

  • Figure 1: The main challenges of equirectangular panoramic segmentation include (i) severe distortions compared to standard perspective images, (ii) discontinuities at the left-right seam and near the poles, and (iii) ultra-high resolution that is rarely addressed by conventional segmentation methods. Our key idea is to convert a panorama into a fixed-trajectory perspective scanning video to avoid these issues and to provide high-quality synthetic data for fine-tuning segmentation models.
  • Figure 2: Overview of our pipeline. We reformulate panoramic segmentation as perspective scanning video segmentation. A panorama with the prompt points is first projected into multiple continuous perspective frames with a fixed trajectory. Next, the scanning video is processed with the fine-tuned SAM2 to generate video segmentation results, which are merged back to the ERP plane to obtain the final segmentation.
  • Figure 3: The visualization of different trajectories.
  • Figure 4: Qualitative comparison on the PAV-SOD [zhang2023pav] dataset.
  • Figure 5: Qualitative comparison on the HunyuanWorld-1.0 [team2025hunyuanworld] dataset. The red circles indicate segmentation errors.
  • ...and 2 more figures
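
The first stage of the pipeline in Figure 2, projecting an equirectangular (ERP) panorama into a sequence of overlapping perspective frames along a fixed trajectory, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names `perspective_from_erp` and `serpentine_trajectory`, the 90° field of view, the 45° yaw step, the three pitch rows, and the nearest-neighbour sampling are all assumptions chosen for illustration; the paper's actual trajectory and resolution settings are not given in this summary.

```python
import numpy as np

def perspective_from_erp(erp, yaw_deg, pitch_deg, fov_deg=90.0, size=256):
    """Sample one square pinhole view from an ERP image (nearest neighbour).

    Hypothetical helper: parameter names and defaults are illustrative only.
    """
    h, w = erp.shape[:2]
    f = 0.5 * size / np.tan(np.radians(fov_deg) / 2)  # focal length in pixels
    # Pixel grid -> camera rays (x right, y down, z forward).
    u, v = np.meshgrid(np.arange(size) + 0.5 - size / 2,
                       np.arange(size) + 0.5 - size / 2)
    rays = np.stack([u, v, np.full_like(u, f)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    # Rotate the rays: pitch about the x axis first, then yaw about the y axis.
    p, y = np.radians(pitch_deg), np.radians(yaw_deg)
    rx = np.array([[1, 0, 0],
                   [0, np.cos(p), -np.sin(p)],
                   [0, np.sin(p), np.cos(p)]])
    ry = np.array([[np.cos(y), 0, np.sin(y)],
                   [0, 1, 0],
                   [-np.sin(y), 0, np.cos(y)]])
    d = rays @ (ry @ rx).T
    # Ray direction -> longitude/latitude -> ERP pixel coordinates.
    lon = np.arctan2(d[..., 0], d[..., 2])          # in [-pi, pi]
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))  # positive = below horizon
    col = ((lon / (2 * np.pi) + 0.5) * w).astype(int) % w  # wraps across the seam
    row = np.clip(((lat / np.pi + 0.5) * h).astype(int), 0, h - 1)
    return erp[row, col]

def serpentine_trajectory(pitches=(-45.0, 0.0, 45.0), yaw_step=45.0):
    """Assumed scan order: sweep yaw at each pitch row, reversing direction on
    alternate rows so that consecutive frames always overlap."""
    views = []
    for i, pitch in enumerate(pitches):
        yaws = np.arange(-180.0, 180.0, yaw_step)
        if i % 2 == 1:
            yaws = yaws[::-1]
        views += [(float(yaw), float(pitch)) for yaw in yaws]
    return views

# Build the "scanning video" from a placeholder panorama.
erp = np.zeros((512, 1024, 3), dtype=np.uint8)
frames = [perspective_from_erp(erp, yaw, pitch)
          for yaw, pitch in serpentine_trajectory()]
```

With a 90° field of view and a 45° yaw step, neighbouring frames share roughly half of their horizontal extent, which is the kind of smooth viewpoint transition that a video segmentation model with memory relies on. The modulo on the column index lets a view straddle the left-right seam of the panorama without any special handling.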