SphereDrag: Spherical Geometry-Aware Panoramic Image Editing

Zhiao Feng, Xuewei Li, Junjie Yang, Jingchao Li, Yuxin Peng, Xi Li

TL;DR

This work tackles panoramic image editing by integrating spherical geometry into a diffusion-based editing framework. The authors introduce SphereDrag, featuring adaptive reprojection (AR) to mitigate boundary discontinuities, great-circle trajectory adjustment (GCTA) to align edits with spherical paths, and spherical search region tracking (SSRT) to compensate for latitude-based pixel density distortions. They also present PanoBench, a dedicated panoramic editing benchmark, enabling standardized evaluation across scenes and styles. Empirical results show SphereDrag achieves state-of-the-art geometric consistency and image quality, with notable improvements in the Image Fidelity metric and reductions in FID/sFID, demonstrating practical potential for accurate, controllable editing of 360-degree content.

Abstract

Image editing has made great progress on planar images, but panoramic image editing remains underexplored. Due to their spherical geometry and projection distortions, panoramic images present three key challenges: boundary discontinuity, trajectory deformation, and uneven pixel density. To tackle these issues, we propose SphereDrag, a novel panoramic editing framework that uses spherical geometry knowledge for accurate and controllable editing. Specifically, adaptive reprojection (AR) uses adaptive spherical rotation to deal with discontinuity; great-circle trajectory adjustment (GCTA) tracks the movement trajectory more accurately; and spherical search region tracking (SSRT) adaptively scales the search range based on spherical location to address uneven pixel density. We also construct PanoBench, a panoramic editing benchmark that includes complex editing tasks involving multiple objects and diverse styles and provides a standardized evaluation framework. Experiments show that SphereDrag improves considerably over existing methods in geometric consistency and image quality, achieving up to 10.5% relative improvement.
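The boundary-discontinuity challenge can be made concrete with a small sketch. The Python below is not the authors' AR implementation. It assumes an equirectangular panorama with longitude in [-π, π), and the helper names (`wrap_lon`, `seam_safe_yaw`, `rotate_lon`) are hypothetical. It only illustrates how a yaw rotation of the sphere can move a handle/target pair away from the left/right image seam.

```python
import math

def wrap_lon(lon):
    """Wrap a longitude in radians to the interval [-pi, pi)."""
    return (lon + math.pi) % (2 * math.pi) - math.pi

def seam_safe_yaw(lon_handle, lon_target):
    """Yaw (radians) that places the midpoint of the shorter arc
    between the two longitudes at the image center (lon = 0).
    Illustrative assumption, not the rule used in the paper."""
    delta = wrap_lon(lon_target - lon_handle)  # signed shortest difference
    mid = wrap_lon(lon_handle + delta / 2)     # midpoint along the shorter arc
    return -mid

def rotate_lon(lon, yaw):
    """Apply a yaw rotation to a longitude and wrap the result."""
    return wrap_lon(lon + yaw)

# A drag that crosses the seam: handle at 170 deg, target at -170 deg.
handle = math.radians(170)
target = math.radians(-170)
yaw = seam_safe_yaw(handle, target)
new_handle = rotate_lon(handle, yaw)
new_target = rotate_lon(target, yaw)
```

After the rotation, both points sit 10 degrees on either side of the image center, so the 20-degree drag no longer straddles the boundary. Only longitudes are rotated here; a full reprojection would also resample the image pixels.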

Paper Structure

This paper contains 22 sections, 23 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Illustration of challenges in panoramic image editing. (Upper right) The panoramic projection may split a movement trajectory into two parts, located near the left and right boundaries, respectively. (Middle right) Straight lines in the panoramic image do not correspond to great-circle paths on the sphere, leading to trajectory deviations. (Lower right) The same region in the panoramic image corresponds to unequal solid angles at different latitudes, causing non-uniform tracking across the sphere.
  • Figure 2: Classic point-interactive image editing pipeline
  • Figure 3: Overview of SphereDrag. Using DragDiffusion as our baseline, we introduce our three parts: adaptive reprojection (AR), great-circle trajectory adjustment (GCTA), and spherical search region tracking (SSRT). AR: It applies spherical rotation to transform input panoramic images into a suitable representation. GCTA: It handles the points $P_{\text{tar}}$, $P_k$, and $P_{\text{han}}$ using the great-circle distance $d_{\text{gc}}$ in a spherical manner. The underlying planar feature maps visualize the corresponding trajectory. SSRT: It highlights the current (blue) and future (red) search regions, which have equal sizes on the sphere. However, their projected areas differ on the feature maps due to spherical distortion.
  • Figure 4: 90$^{\circ}$ FOV drag visualization: In the first row, our method effectively handles the seams at the sandy beach with strong winds, while other methods exhibit visible boundary artifacts. In the second row, our method successfully follows the intended dragging path, whereas other methods either fail to complete the drag or produce distorted results. In the third row, our approach accurately interprets the dragging intent. When dragging the lamp, the surrounding elements remain properly arranged, while other methods either fail to move the lamp correctly or result in chaotic outputs.
  • Figure S-1: In a panoramic image, $\cos \phi$ acts as the stretching factor, ensuring that the degree of stretching in the latitude direction from the sphere to the plane is consistent with the spherical geometry. The variation of $\cos \phi$ with latitude $\phi$ precisely reflects the projection requirements of the parallel lengths on the sphere, thus maintaining the accuracy and consistency of geographic information during the projection process.
  • ...and 4 more figures
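Two quantities mentioned in the captions above, the great-circle distance $d_{\text{gc}}$ and the $\cos \phi$ stretching factor, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' GCTA or SSRT code. It assumes an equirectangular panorama with angles in radians, uses the standard haversine formula for the central angle, and introduces a hypothetical function `search_half_width_px` whose `max_scale` cap near the poles is an assumption of this sketch.

```python
import math

def great_circle_distance(lon1, lat1, lon2, lat2):
    """Central angle (radians) between two points on the unit sphere,
    computed with the haversine formula."""
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * math.asin(min(1.0, math.sqrt(a)))

def search_half_width_px(base_radius_px, lat, max_scale=8.0):
    """Horizontal half-width in pixels of a search region that keeps a
    fixed angular size on the sphere. Equirectangular rows are stretched
    horizontally by 1 / cos(lat); max_scale is a hypothetical cap that
    avoids unbounded regions near the poles."""
    scale = min(1.0 / max(math.cos(lat), 1e-6), max_scale)
    return base_radius_px * scale

# Two points on the 60-degree parallel, 180 degrees of longitude apart.
lat = math.radians(60)
d_gc = great_circle_distance(math.radians(-90), lat, math.radians(90), lat)

# Length of the path that stays on the parallel, which is the straight
# horizontal line between the two points in the panoramic image.
d_parallel = math.pi * math.cos(lat)

w_equator = search_half_width_px(3, 0.0)
w_lat60 = search_half_width_px(3, lat)
```

In this example the great-circle path passes over the pole and spans 60 degrees, while the straight image-space path along the parallel spans 90 degrees on the sphere, which is the trajectory deviation described in Figure 1. The search region that covers 3 pixels of half-width at the equator needs about 6 pixels at 60 degrees latitude to cover the same angular extent.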