Lightweight Optimal-Transport Harmonization on Edge Devices

Maria Larchenko, Dmitry Guskov, Alexander Lobashev, Georgy Derevyanko

TL;DR

The paper tackles real-time color harmonization for augmented reality by casting the problem as an optimal transport task and predicting MKL transport map parameters with a lightweight encoder suitable for edge devices. It grounds the approach in classical OT theory, derives an explicit error bound for the MKL approximation, and demonstrates practical viability with an EfficientNet-B0-based encoder that outputs a 12-parameter MKL filter. The authors also contribute an AR-specific dataset (ARCore) and show competitive quantitative metrics as well as strong perceptual performance, including on-device inference at mobile frame rates. This work enables fast, on-device color harmonization suitable for immersive AR pipelines and provides a theoretical and empirical basis for relying on linear OT-based filters in this domain.
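To make the "12-parameter MKL filter" concrete, here is a minimal NumPy sketch of the classical closed-form linear Monge-Kantorovich map between Gaussian fits of two color samples. It assumes the 12 parameters are a 3x3 matrix `A` plus a 3-vector offset `b`, so that the filter is $T(x) = Ax + b$; that split, and the helper names `sqrtm_psd`, `mkl_params`, and `apply_filter`, are illustrative assumptions rather than the authors' code. In the paper an encoder predicts the parameters; this sketch instead computes them directly from pixel statistics to show what such a filter does.

```python
import numpy as np

def sqrtm_psd(m):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)          # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

def mkl_params(src, tgt, eps=1e-8):
    """Closed-form linear MK map between Gaussian fits of two (N, 3) samples.

    Returns (A, b) such that T(x) = A @ x + b sends the source mean and
    covariance to the target mean and covariance (up to the eps regularizer).
    """
    mu_s, mu_t = src.mean(axis=0), tgt.mean(axis=0)
    cov_s = np.cov(src, rowvar=False) + eps * np.eye(3)
    cov_t = np.cov(tgt, rowvar=False) + eps * np.eye(3)
    s_half = sqrtm_psd(cov_s)
    s_inv_half = np.linalg.inv(s_half)
    # A = S^{-1/2} (S^{1/2} T S^{1/2})^{1/2} S^{-1/2}
    A = s_inv_half @ sqrtm_psd(s_half @ cov_t @ s_half) @ s_inv_half
    b = mu_t - A @ mu_s
    return A, b

def apply_filter(pixels, A, b):
    """Apply the affine color filter to (N, 3) pixels and clip to [0, 1]."""
    return np.clip(pixels @ A.T + b, 0.0, 1.0)

# Usage with synthetic "foreground" and "background" colors in [0, 1].
rng = np.random.default_rng(0)
foreground = rng.random((500, 3)) * 0.5
background = rng.random((500, 3)) * 0.3 + 0.4
A, b = mkl_params(foreground, background)
harmonized = apply_filter(foreground, A, b)
```

Because the filter is a single affine transform per image, applying it costs one 3x3 matrix multiply per pixel, which is consistent with the on-device frame rates described above.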

Abstract

Color harmonization adjusts the colors of an inserted object so that it perceptually matches the surrounding image, resulting in a seamless composite. The harmonization problem naturally arises in augmented reality (AR), yet harmonization algorithms are not currently integrated into AR pipelines because real-time solutions are scarce. In this work, we address color harmonization for AR by proposing a lightweight approach that supports on-device inference. For this, we leverage classical optimal transport theory by training a compact encoder to predict the Monge-Kantorovich transport map. We benchmark our MKL-Harmonizer algorithm against state-of-the-art methods and demonstrate that for real composite AR images our method achieves the best aggregated score. We release our dedicated AR dataset of composite images with pixel-accurate masks and a data-gathering toolkit to support further data acquisition by researchers.

Paper Structure

This paper contains 10 sections, 6 theorems, 32 equations, 13 figures, 2 tables.

Key Result

Lemma 1

Let the clipping operator $\operatorname{clip}:\mathbb{R}^{d}\!\to\![0,1]^{d}$ be defined component-wise by $\operatorname{clip}(z)_i=\min\{1,\max\{0,z_i\}\}$ for $i=1,\dots,d$. Then for every $z\in\mathbb{R}^{d}$ the vector $y=\operatorname{clip}(z)$ is the unique Euclidean projection of $z$ onto the cube $\mathcal{X}=[0,1]^d$; that is, $\operatorname{clip}(z)=\Pi_{\mathcal{X}}(z)$.
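The lemma can be checked numerically. The sketch below clamps each coordinate of a point to $[0,1]$ and confirms that no randomly sampled point of the cube lies closer to the original point than the clamped one. The function names `clip_unit_cube` and `is_closest_among_samples` are hypothetical, and a random-sampling check is only a sanity test, not a proof.

```python
import numpy as np

def clip_unit_cube(z):
    """Component-wise clamp of z to the unit cube [0, 1]^d."""
    return np.minimum(1.0, np.maximum(0.0, z))

def is_closest_among_samples(z, n_samples=10_000, seed=0):
    """Check that clip(z) is at least as close to z as random cube points."""
    rng = np.random.default_rng(seed)
    y = clip_unit_cube(z)
    candidates = rng.random((n_samples, z.shape[0]))   # uniform points in the cube
    dist_clip = np.linalg.norm(z - y)
    dist_candidates = np.linalg.norm(z - candidates, axis=1)
    return bool(np.all(dist_clip <= dist_candidates + 1e-12))

# Usage: a color vector with out-of-range channels.
z = np.array([1.3, -0.2, 0.6])
y = clip_unit_cube(z)          # channels clamped to 1.0, 0.0, and 0.6
ok = is_closest_among_samples(z)
```

The squared Euclidean distance separates across coordinates, so minimizing it over the cube reduces to clamping each coordinate independently; the numerical check reflects exactly that.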

Figures (13)

  • Figure 1: Harmonization running on edge device.
  • Figure 2: Exposure bias in image harmonization. Augmented image from the iHarmony4 training partition. (A) Small regions near the boundary of the mask contain information about the unharmonized background and can inadvertently teach the harmonization network to over-rely on this specific information. (B) For an object inserted from a 3D engine, the mask is "pixel-perfect".
  • Figure 3: Mean opinion score versus inference speed, calculated for images with 1080x2204 resolution from the ARCore data. Data were processed on an NVIDIA RTX 4060 Ti GPU.
  • Figure 4: The 3D objects used in our experiments feature different sizes and textures.
  • Figure 5: General scheme of the main rendering loop, which uses four OpenGL framebuffers. The default one performs on-screen rendering; the others are used for off-screen rendering and auxiliary tasks, such as data collection. Limitations of LiteRT Next Kotlin introduce computational overhead because the model input is copied through the CPU. Masks are rendered concurrently with foreground objects, obtained directly from the rendering engine, and saved on the user's demand.
  • ...and 8 more figures

Theorems & Definitions (10)

  • Lemma 1: Clipping equals Euclidean projection
  • Theorem 1: Error Bound for L-Lipschitz Color Maps
  • proof: Proof Sketch
  • Lemma 2: Tail–probability bound for the clipping error
  • Lemma 3: Clipping equals Euclidean projection
  • proof
  • Theorem 2: Error Bound for L-Lipschitz Color Maps
  • proof
  • Lemma 4: Tail–probability bound for the clipping error
  • proof