Efficient Neural Network Encoding for 3D Color Lookup Tables

Vahid Zehtab, David B. Lindell, Marcus A. Brubaker, Michael S. Brown

TL;DR

The paper tackles the challenge of storing and deploying hundreds of 3D LUTs by introducing a compact neural representation that encodes up to 512 LUTs in under 0.25 MB and reconstructs them on the fly with a perceptual distortion of $\bar{\Delta E}_M \leq 2.0$ across the color gamut. A residual-flow-inspired architecture with Lipschitz-bounded residual blocks is conditioned on the LUT index to reconstruct each LUT's output colors. Optional color weighting improves quality on natural image colors, and a bijective variant of the network produces invertible LUTs for reverse color processing. Empirical results show memory savings of more than $99\%$ with minimal perceptual loss, millisecond-scale reconstruction time per LUT, and improved performance when training on natural color distributions or with a $\Delta E$ loss. The authors also provide a public implementation, which makes the approach practical for on-device color processing and real-time workflows.

Abstract

3D color lookup tables (LUTs) enable precise color manipulation by mapping input RGB values to specific output RGB values. 3D LUTs are instrumental in various applications, including video editing, in-camera processing, photographic filters, computer graphics, and color processing for displays. While an individual LUT does not incur a high memory overhead, software and devices may need to store dozens to hundreds of LUTs that can take over 100 MB. This work aims to develop a neural network architecture that can encode hundreds of LUTs in a single compact representation. To this end, we propose a model with a memory footprint of less than 0.25 MB that can reconstruct 512 LUTs with only minor color distortion ($\bar{\Delta E}_M \leq 2.0$) over the entire color gamut. We also show that our network can weight colors to provide further quality gains on natural image colors ($\bar{\Delta E}_M \leq 1.0$). Finally, we show that minor modifications to the network architecture enable a bijective encoding that produces LUTs that are invertible, allowing for reverse color processing. Our code is available at https://github.com/vahidzee/ennelut.
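The storage figures in the abstract can be checked with some quick arithmetic. The sketch below assumes a 33-point grid per axis and 32-bit floats per channel, which are common choices for LUT files but are not stated in the text above; the helper names `lut_bytes` and `savings` are hypothetical.

```python
def lut_bytes(grid_size, channels=3, bytes_per_value=4):
    """Raw size of one dense 3D LUT: grid_size^3 entries, one value per channel."""
    return grid_size ** 3 * channels * bytes_per_value


def savings(num_luts, grid_size, model_bytes):
    """Return (raw bytes for all LUTs, fraction of memory saved by the model)."""
    raw = num_luts * lut_bytes(grid_size)
    return raw, 1.0 - model_bytes / raw


# Assumed setup: 512 LUTs on a 33x33x33 grid, float32 values,
# compared against a model of 0.25 MB (taken here as 0.25 * 2**20 bytes).
raw, saved = savings(num_luts=512, grid_size=33, model_bytes=0.25 * 2**20)
print(f"one LUT:  {lut_bytes(33) / 2**20:.2f} MiB")
print(f"512 LUTs: {raw / 2**20:.1f} MiB")
print(f"saved:    {saved:.2%}")
```

Under these assumptions the raw collection is well over 100 MB and the saving exceeds 99%, which is consistent with the numbers quoted above; other grid sizes or bit depths would shift the exact ratio.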

Paper Structure

This paper contains 26 sections, 5 equations, 13 figures, 6 tables, 1 algorithm.

Figures (13)

  • Figure 1: We propose a neural network architecture that encodes hundreds of LUTs into a single representation at a fraction of the memory requirements. Encoded LUTs can be reconstructed with minimal color distortion (i.e., $\Delta E\leq2$).
  • Figure 2: Our network consists of $D$ transformations $T_i(\cdot, \mathbf{o})$, conditioned on a specific LUT by a one-hot encoded index $\mathbf{o}$ indicating the desired LUT to use. The $T_i$ contribute to the reconstructed output color through consecutive residual additions. Each $T_i$ is modeled with a multilayer perceptron (MLP) with $\alpha$ non-linearities, where the biases of the first layer are selected based on $\mathbf{o}$. Activation normalization (Kingma and Dhariwal, 2018) is used after each transformation to control the magnitude of the residual functions, ensuring stability in deeper architectures. $\tanh^{-1}$ and $\tanh$ respectively transform the inputs and outputs of the network, bringing it closer to the local identity.
  • Figure 3: This plot shows how varying the number of embedded LUTs affects the performance of our models when training uniformly on the color space using an L2 loss function in RGB and evaluating against $256^3$ Hald images. The specified number of parameters represents the parameters in the $T_i$ blocks without counting the $E_i$s. Results are averaged over ten runs per #LUTs, each using a different set of LUTs.
  • Figure 4: This figure shows a Hald image representing all input colors (left) and the distribution of these colors in the 100 Adobe-MIT5K images used for training (right).
  • Figure 5: This figure shows the performance of different model sizes when embedding a varying number of LUTs trained and tested on natural images with an $L_2$ training objective. The number of parameters in the model size represents the parameters in the $T_i$ blocks, not counting the $E_i$s.
  • ...and 8 more figures
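
The forward pass described in the Figure 2 caption can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the hidden non-linearity is a `tanh` stand-in, inputs are rescaled from $[0, 1]$ to $(-1, 1)$ before $\tanh^{-1}$ with a small clamp `EPS`, activation normalization is reduced to a fixed per-channel scale and bias, and all names (`make_block`, `block_residual`, `forward`) are hypothetical.

```python
import math
import random

EPS = 1e-6  # assumed clamp so atanh stays finite at the gamut boundary


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def make_block(num_luts, hidden, rng, scale=0.1):
    """One transformation T_i: a 2-layer MLP plus a simplified ActNorm."""
    return {
        "W1": [[rng.gauss(0, scale) for _ in range(3)] for _ in range(hidden)],
        # One first-layer bias vector per LUT; the one-hot index o selects a row.
        "B1": [[rng.gauss(0, scale) for _ in range(hidden)] for _ in range(num_luts)],
        "W2": [[rng.gauss(0, scale) for _ in range(hidden)] for _ in range(3)],
        "an_scale": [1.0, 1.0, 1.0],  # ActNorm scale (learned in practice)
        "an_bias": [0.0, 0.0, 0.0],   # ActNorm bias (learned in practice)
    }


def block_residual(block, h, lut_index):
    bias = block["B1"][lut_index]  # same as multiplying B1 by the one-hot o
    z = [math.tanh(a + b) for a, b in zip(matvec(block["W1"], h), bias)]
    r = matvec(block["W2"], z)
    return [s * x + b for s, x, b in zip(block["an_scale"], r, block["an_bias"])]


def forward(blocks, rgb, lut_index):
    """Map an input RGB triple in [0, 1] to the output color of one LUT."""
    h = [math.atanh(min(max(2 * c - 1, -1 + EPS), 1 - EPS)) for c in rgb]
    for block in blocks:  # D consecutive residual additions
        r = block_residual(block, h, lut_index)
        h = [a + b for a, b in zip(h, r)]
    return [(math.tanh(a) + 1) / 2 for a in h]


rng = random.Random(0)
blocks = [make_block(num_luts=4, hidden=8, rng=rng) for _ in range(3)]
print(forward(blocks, [0.2, 0.5, 0.8], lut_index=0))
print(forward(blocks, [0.2, 0.5, 0.8], lut_index=1))
```

With the residual branches set to zero, the $\tanh^{-1}$ and $\tanh$ wrappers cancel and the sketch reduces to the identity map, which is one way to read the caption's remark about staying close to the local identity. The Lipschitz bound needed for the bijective variant is not enforced here.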