Geometry Fidelity for Spherical Images

Anders Christensen, Nooshin Mojab, Khushman Patel, Karan Ahuja, Zeynep Akata, Ole Winther, Mar Gonzalez-Franco, Andrea Colaco

TL;DR

The paper identifies a fundamental gap in evaluating spherical image generation using standard Fréchet Inception Distance (FID), which overlooks geometry-specific distortions. It introduces OmniFID, a cubemap-based extension of FID that aggregates per-view distributions across three view groups $\{U,D,\mathcal{F}\}$ to capture field-of-view fidelity via $OmniFID(X_1,X_2) = \frac{1}{3} \sum_{V\in\{U,D,\mathcal{F}\}} \overline{FID}(\mathcal{C}^{X_1}_V, \mathcal{C}^{X_2}_V)$, while preserving FID’s sensitivity to noise. It also defines Discontinuity Score (DS), a kernel-based measure of seam alignment across borders in equirectangular representations, with $DS(I) = \frac{L}{H_E} \sum_i DS(a_i)$ and $DS(a) = \frac{1}{2L} \sum_{y=0}^{L-1} \left( \frac{|\hat{a}(2,y)|}{|\hat{a}(1,y)|+c} + \frac{|\hat{a}(3,y)|}{|\hat{a}(4,y)|+c} \right)$. Through experiments on datasets like 360-Indoor, the authors show OmniFID detects vertical FOV reductions that FID misses, while DS correlates with seam severity across resolutions. These metrics collectively advance geometry-aware evaluation for spherical image generation, enabling more reliable benchmarking and guiding future metric development and dataset design for panoramic imagery.
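As a rough illustration of the OmniFID aggregation quoted above, the following sketch averages FID over the three view groups. It rests on several assumptions that are not confirmed by this summary: that $\mathcal{F}$ groups the four side faces (front/right/back/left), that $\overline{FID}$ is the mean of the per-face FIDs within a group, and that Inception features for each cubemap face have already been extracted into arrays of shape `(N, d)`. The names `VIEW_GROUPS`, `fid`, and `omnifid` are hypothetical.

```python
import numpy as np

# Assumed grouping of the six cubemap faces into the view groups {U, D, F}.
VIEW_GROUPS = {
    "U": ["up"],
    "D": ["down"],
    "F": ["front", "right", "back", "left"],
}


def fid(features_1, features_2):
    """Frechet distance between Gaussians fitted to two (N, d) feature arrays."""
    mu1, mu2 = features_1.mean(axis=0), features_2.mean(axis=0)
    s1 = np.cov(features_1, rowvar=False)
    s2 = np.cov(features_2, rowvar=False)
    # Tr((S1 S2)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of S1 S2, which are real and non-negative for PSD inputs.
    eigvals = np.linalg.eigvals(s1 @ s2).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1) + np.trace(s2) - 2.0 * tr_sqrt)


def omnifid(faces_1, faces_2):
    """Average the per-group mean FID over the three view groups.

    faces_1 and faces_2 map a face name to an (N, d) array of precomputed
    features for that cubemap face (feature extraction is not shown here).
    """
    group_scores = []
    for faces in VIEW_GROUPS.values():
        per_face = [fid(faces_1[f], faces_2[f]) for f in faces]
        group_scores.append(np.mean(per_face))
    return float(np.mean(group_scores))
```

Because the up and down faces each form their own group, a discrepancy confined to one of them contributes a full third of the score rather than being diluted across all six faces, which is consistent with the claim that OmniFID picks up vertical field-of-view reductions.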

Abstract

Spherical or omni-directional images offer an immersive visual format appealing to a wide range of computer vision applications. However, geometric properties of spherical images pose a major challenge for models and metrics designed for ordinary 2D images. Here, we show that direct application of Fréchet Inception Distance (FID) is insufficient for quantifying geometric fidelity in spherical images. We introduce two quantitative metrics accounting for geometric constraints, namely Omnidirectional FID (OmniFID) and Discontinuity Score (DS). OmniFID is an extension of FID tailored to additionally capture field-of-view requirements of the spherical format by leveraging cubemap projections. DS is a kernel-based seam alignment score of continuity across borders of 2D representations of spherical images. In experiments, OmniFID and DS quantify geometry fidelity issues that are undetected by FID.
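The Discontinuity Score formulas quoted in the TL;DR can be transcribed fairly directly. The sketch below assumes the kernel responses $\hat{a}$ are already computed for each seam patch (the kernel itself is not specified in this summary) and that the first index of $\hat{a}$ runs over four columns, stored here as rows 0 to 3 of a `(4, L)` array. The function names `ds_patch` and `ds_image`, the argument layout, and the choice to pass $c$ and $H_E$ explicitly are all illustrative assumptions.

```python
import numpy as np


def ds_patch(a_hat, c):
    """DS(a) for one patch, given kernel responses a_hat of shape (4, L).

    Row k of a_hat holds a_hat(k + 1, y) from the formula (assumed layout).
    """
    a_hat = np.abs(np.asarray(a_hat, dtype=float))
    if a_hat.ndim != 2 or a_hat.shape[0] != 4:
        raise ValueError("expected kernel responses of shape (4, L)")
    length = a_hat.shape[1]
    # Inner responses (indices 2 and 3) relative to the outer ones (1 and 4).
    ratios = a_hat[1] / (a_hat[0] + c) + a_hat[2] / (a_hat[3] + c)
    return float(ratios.sum() / (2 * length))


def ds_image(patch_responses, height, c):
    """DS(I) = (L / H_E) * sum_i DS(a_i), with H_E passed as `height`."""
    length = np.asarray(patch_responses[0]).shape[1]
    total = sum(ds_patch(p, c) for p in patch_responses)
    return float(length / height * total)
```

Under this reading, a patch whose inner responses match the outer ones scores about 1, and the score grows as the responses at indices 2 and 3 dominate their neighbours.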

Paper Structure

This paper contains 10 sections, 5 equations, and 12 figures.

Figures (12)

  • Figure 1: Visually, it is difficult to recognize field-of-view issues in the equirectangular format, but the problem is evident when rendered as a sphere and looking up/down. Top left: original spherical image with 180° vertical FOV represented as an equirectangular image. Top right: Resulting noisy equirectangular image, with a reduced vertical FOV of 140°. Bottom row: comparison of resulting views when looking upwards and downwards, respectively, in the two spherical images.
  • Figure 1: Four example equirectangular generations of a text-to-image model fine-tuned on 360-Indoor after 5000 steps. Under each generation we show the cubemap images to illustrate the geometry of the rendered views (top left to bottom right: front/right/back/left/up/down). FID is 33.96, OmniFID is 63.39.
  • Figure 2: Although recent spherical image generation models (Text-2-Sphere and Image-2-Sphere) have begun achieving low FID scores, they still struggle to produce images with a full 180° vertical field-of-view and no seams. Above, we show equirectangular images from the models Text2Light (Chen et al., 2022) and AOG-Net (Lu et al., 2023) (top row in each block), along with their reported FID scores. These images are taken from the respective papers. Below each image we display a perspective view when looking backwards, showing the resulting stitching across image borders (and at the poles). We find that FID does not sufficiently capture geometry fidelity issues in the generated images, such as benches converging to a point at the poles, or inconsistencies across image borders.
  • Figure 2: Four example equirectangular generations of a text-to-image model fine-tuned on 360-Indoor after 10000 steps. Under each generation we show the cubemap images to illustrate the geometry of the rendered views (top left to bottom right: front/right/back/left/up/down). FID is 35.42, OmniFID is 60.38.
  • Figure 3: FID results compared to our modification, OmniFID, for detecting field-of-view reductions on the 360-Indoor spherical image dataset (Chou et al., 2019). FID increases negligibly when the vertical field-of-view is reduced from 180° to 140°, while our proposed OmniFID captures the difference.
  • ...and 7 more figures