Multiscale Sliced Wasserstein Distances as Perceptual Color Difference Measures

Jiaqi He, Zhihua Wang, Leon Wang, Tsein-I Liu, Yuming Fang, Qilin Sun, Kede Ma

TL;DR

The paper tackles perceptual color difference (CD) assessment under image misalignment, where traditional CD measures that compare co-located pixels or patches fail to predict human judgments. It introduces MS-SWD, a training-free CD measure that compares non-local patch distributions across multiple scales: it builds Gaussian pyramids in the CIELAB space, computes the sliced Wasserstein distance (SWD) at each scale, and averages the results over $K$ scales. On the SPCD dataset, MS-SWD outperforms competing methods on non-perfectly aligned image pairs, and it is empirically verified to behave as a metric in the mathematical sense; it is further validated as a loss function for image and video color transfer. The method is computationally efficient because it relies on random linear projections and a sorting-based correspondence mechanism. The paper also discusses extending the approach to alternative pyramids and other perceptual tasks. The code is publicly available.
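As a rough illustration of the pipeline summarized above, the following NumPy sketch computes a sliced Wasserstein distance between the patch distributions of two equally sized images at several scales and averages the results. It is not the authors' implementation (the repository linked in the abstract is the reference for that). The function names, the 2x2 average pooling used in place of a true Gaussian pyramid, the patch size, the number of projections, and the absolute-difference cost are all assumptions made for brevity, and the inputs are assumed to be already converted to CIELAB.

```python
import numpy as np


def downsample(img):
    """Halve the resolution with 2x2 average pooling.

    Assumption: a simple stand-in for one Gaussian pyramid level.
    """
    h, w, c = img.shape
    h2, w2 = h - h % 2, w - w % 2          # crop to even size
    img = img[:h2, :w2]
    return img.reshape(h2 // 2, 2, w2 // 2, 2, c).mean(axis=(1, 3))


def extract_patches(img, size):
    """Return all overlapping size x size patches as flattened rows."""
    windows = np.lib.stride_tricks.sliding_window_view(
        img, (size, size), axis=(0, 1)
    )  # shape: (H - size + 1, W - size + 1, C, size, size)
    return windows.reshape(-1, img.shape[2] * size * size)


def sliced_wasserstein(px, py, n_proj, rng):
    """Approximate the 1-Wasserstein distance via random 1-D projections.

    px and py must contain the same number of patches (rows).
    """
    dirs = rng.standard_normal((px.shape[1], n_proj))
    dirs /= np.linalg.norm(dirs, axis=0, keepdims=True)  # unit directions
    # Sorting each projection pairs patches by rank, not by position.
    sx = np.sort(px @ dirs, axis=0)
    sy = np.sort(py @ dirs, axis=0)
    return float(np.mean(np.abs(sx - sy)))


def ms_swd(x, y, n_scales=3, patch=3, n_proj=64, seed=0):
    """Average the sliced Wasserstein distance over n_scales scales.

    x, y: float arrays of shape (H, W, 3), assumed to be in CIELAB.
    All default values are hypothetical, chosen only for illustration.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for k in range(n_scales):
        if k > 0:
            x, y = downsample(x), downsample(y)
        total += sliced_wasserstein(
            extract_patches(x, patch), extract_patches(y, patch), n_proj, rng
        )
    return total / n_scales
```

Because sorting discards spatial positions, this sketch is insensitive to rearrangements that preserve the set of patches. For example, with `patch=1` a horizontally flipped copy such as `np.ascontiguousarray(x[:, ::-1])` yields a distance of essentially zero, whereas a co-located pixel difference would not.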

Abstract

Contemporary color difference (CD) measures for photographic images typically operate by comparing co-located pixels, patches in a "perceptually uniform" color space, or features in a learned latent space. Consequently, these measures inadequately capture the human color perception of misaligned image pairs, which are prevalent in digital photography (e.g., the same scene captured by different smartphones). In this paper, we describe a perceptual CD measure based on the multiscale sliced Wasserstein distance, which facilitates efficient comparisons between non-local patches of similar color and structure. This aligns with the modern understanding of color perception, where color and structure are inextricably interdependent as a unitary process of perceptual organization. Meanwhile, our method is easy to implement and training-free. Experimental results indicate that our CD measure performs favorably in assessing CDs in photographic images, and consistently surpasses competing models in the presence of image misalignment. Additionally, we empirically verify that our measure functions as a metric in the mathematical sense, and show its promise as a loss function for image and video color transfer tasks. The code is available at https://github.com/real-hjq/MS-SWD.

Paper Structure

This paper contains 12 sections, 6 equations, 7 figures, 4 tables, 1 algorithm.

Figures (7)

  • Figure 1: Which image is closer to the reference in terms of color appearance? Contemporary CD measures that seek co-located comparisons often fail to explain human judgments. The proposed MS-SWD measure based on the multiscale sliced Wasserstein distance aligns with human color perception in these four challenging cases of image misalignment: global motion due to camera movement (first row), local motion due to object displacement (second row), horizontal flipping (third row), and similar natural scenes from different viewpoints (last row).
  • Figure 2: System diagram of the proposed MS-SWD for perceptual CD assessment.
  • Figure 3: The sort() operator in MS-SWD enables efficient comparisons of non-local patches with similar color appearance and structural information. Each curve represents a different random projection; the patches at the two ends of a curve share the same rank (i.e., are in correspondence) after sorting and are thus subject to CD calculation.
  • Figure 4: Illustration of multiscale analysis in ensuring pixel-level image fidelity. Images (c)-(h) are generated by minimizing $\Delta E (\bm X, \bm Y)$ with respect to $\bm Y$ to match (a) the reference image $\bm X$, starting from (b) the initial Gaussian noise image $\bm{Y}_\mathrm{init}$ and for different values of $K$.
  • Figure 5: Comparison of CD maps for a non-perfectly aligned image pair, where a warmer color indicates a larger pixel-wise (or patch-wise) CD.
  • ...and 2 more figures
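
The rank-based correspondence described in the Figure 3 caption can be made concrete with a toy example. The sketch below is an assumption-laden illustration rather than the paper's code: it treats single pixels as 1x1 "patches", uses one random projection, and uses hypothetical variable names. It shows that sorting pairs each pixel of an image with its mirrored counterpart in a horizontally flipped copy, even though the two are not co-located.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random((4, 6, 3))                 # toy image: 4 x 6 pixels, 3 channels
y = np.ascontiguousarray(x[:, ::-1])      # horizontally flipped copy

# One random unit direction in color space (assumption: 1x1 patches).
direction = rng.standard_normal(3)
direction /= np.linalg.norm(direction)

# Project every pixel onto the direction, giving one scalar per pixel.
proj_x = x.reshape(-1, 3) @ direction
proj_y = y.reshape(-1, 3) @ direction

# argsort gives, for each rank, the index of the pixel holding that rank.
order_x = np.argsort(proj_x)
order_y = np.argsort(proj_y)

# Pixels that share a rank are in correspondence; recover their coordinates.
rows_x, cols_x = np.unravel_index(order_x, x.shape[:2])
rows_y, cols_y = np.unravel_index(order_y, y.shape[:2])

for k in range(3):                        # show the first few matched pairs
    print((rows_x[k], cols_x[k]), "<->", (rows_y[k], cols_y[k]))
```

Each printed pair has the same row and mirrored columns, so the matched pixels carry identical colors and contribute no difference along this projection.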