Taming Lookup Tables for Efficient Image Retouching

Sidi Yang; Binxiao Huang; Mingdeng Cao; Yatai Ji; Hanzhong Guo; Ngai Wong; Yujiu Yang

TL;DR

This work tackles the demand for fast, power-efficient image retouching on edge devices by introducing ICELUT, a purely LUT-based enhancer that replaces CNN inference with table lookups. The method uses two parallel MSB/LSB processing branches fed by a fully pointwise CNN backbone and a novel split FC layer to generate weights that fuse a small set of basis 3D LUTs into a final LUT, enabling hardware-agnostic, low-latency inference after training. ICELUT demonstrates near-state-of-the-art image quality on public retouching datasets while achieving exceptional speed (about $0.4\mathrm{ms}$ on GPU and $7\mathrm{ms}$ on CPU) and memory efficiency (sub-1 MB storage), greatly reducing FLOPs compared to CNN-based approaches. The key contributions include the channel-aware LUT design, the split FC mechanism for memory control, and the robust performance under extreme input downsampling, enabling real-time color enhancement on resource-constrained devices.
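The pipeline summarized above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the 4-bit/4-bit MSB/LSB split, the function names, the 17-point LUT grid, and the nearest-neighbour lookup (standing in for interpolation) are all assumptions made for the example.

```python
import numpy as np

def split_msb_lsb(img_u8, msb_bits=4):
    """Split an 8-bit image into MSB and LSB maps (4/4 split is an assumption)."""
    lsb_bits = 8 - msb_bits
    msb = img_u8 >> lsb_bits                 # high bits: coarse color
    lsb = img_u8 & ((1 << lsb_bits) - 1)     # low bits: fine detail
    return msb, lsb

def identity_lut(d):
    """A d x d x d x 3 LUT that maps every color to itself (values in [0, 1])."""
    grid = np.linspace(0.0, 1.0, d)
    r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
    return np.stack([r, g, b], axis=-1)

def fuse_basis_luts(basis_luts, weights):
    """Weighted sum of K basis LUTs: (K, D, D, D, 3) and (K,) -> (D, D, D, 3)."""
    return np.tensordot(weights, basis_luts, axes=1)

def apply_lut_nearest(img_u8, lut):
    """Look up each pixel at its nearest LUT grid point (no interpolation)."""
    d = lut.shape[0]
    idx = np.rint(img_u8.astype(np.float64) / 255.0 * (d - 1)).astype(np.int64)
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]

# Hypothetical usage: two basis LUTs fused with image-dependent weights.
image = np.random.default_rng(0).integers(0, 256, size=(8, 8, 3), dtype=np.uint8)
msb_map, lsb_map = split_msb_lsb(image)
basis = np.stack([identity_lut(17), 0.5 * identity_lut(17)])
final_lut = fuse_basis_luts(basis, np.array([0.7, 0.3]))
retouched = apply_lut_nearest(image, final_lut)   # shape (8, 8, 3), float
```

In the method described above, the fusion weights would come from the two-branch network and the split FC layer rather than being fixed constants as in this sketch.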

Abstract

The widespread use of high-definition screens in edge devices, such as end-user cameras, smartphones, and televisions, is spurring a significant demand for image enhancement. Existing enhancement models often optimize for high performance while falling short of reducing hardware inference time and power consumption, especially on edge devices with constrained computing and storage resources. To this end, we propose the Image Color Enhancement Lookup Table (ICELUT), which adopts LUTs for extremely efficient edge inference, without any convolutional neural network (CNN). During training, we leverage pointwise (1x1) convolution to extract color information, alongside a split fully connected layer to incorporate global information. Both components are then seamlessly converted into LUTs for hardware-agnostic deployment. ICELUT achieves near-state-of-the-art performance and remarkably low power consumption. We observe that the pointwise network structure exhibits robust scalability, maintaining performance even with a heavily downsampled 32x32 input image. These properties enable ICELUT, the first-ever purely LUT-based image enhancer, to reach an unprecedented speed of 0.4ms on GPU and 7ms on CPU, at least one order of magnitude faster than any CNN solution. Code is available at https://github.com/Stephen0808/ICELUT.
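Why a pointwise network can be converted into a LUT: a 1x1 convolution sees only one pixel at a time, so its output depends on a small, enumerable set of input values. The sketch below tabulates a toy pointwise network over every possible 3-channel, 4-bit input and checks that indexing the table reproduces the network. The network size, the random weights, and the 4-bit input range are assumptions chosen for illustration, not the configuration used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical tiny pointwise network: 3 -> 8 -> 2 channels with a ReLU.
W1, b1 = rng.normal(size=(3, 8)), rng.normal(size=8)
W2, b2 = rng.normal(size=(8, 2)), rng.normal(size=2)

def pointwise_net(x):
    """x: (..., 3) integers in 0..15. A 1x1 conv is a per-pixel linear map."""
    h = np.maximum((x / 15.0) @ W1 + b1, 0.0)
    return h @ W2 + b2

def build_lut(net, levels=16):
    """Evaluate the network once on every possible input: 16**3 = 4096 entries."""
    v = np.arange(levels)
    r, g, b = np.meshgrid(v, v, v, indexing="ij")
    grid = np.stack([r, g, b], axis=-1)      # (16, 16, 16, 3)
    return net(grid)                         # (16, 16, 16, out_channels)

def lut_forward(lut, x):
    """Replace the network by a table lookup indexed with the pixel values."""
    return lut[x[..., 0], x[..., 1], x[..., 2]]

lut = build_lut(pointwise_net)
pixels = rng.integers(0, 16, size=(32, 32, 3))   # e.g. a 32x32 4-bit map
same = np.allclose(lut_forward(lut, pixels), pointwise_net(pixels))
```

The table size grows exponentially with the number of input bits per entry, which is why restricting the receptive field and the bit depth matters for keeping storage small.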

Paper Structure (29 sections, 3 equations, 7 figures, 11 tables)

Introduction
Related works
Learning-based image enhancement
3D LUT-based image enhancement
Replacing CNN with LUT
Method
3D LUT preliminaries
Training network
CNN backbone
Split fully connected layer
Transferring to LUT
Speeding up inference
Experiments
Datasets
Implementation details
...and 14 more sections

Figures (7)

Figure 1: Three image enhancement pipelines. FLOPs and latency are measured on the CPU for the (orange) backbones. CSRNet (He et al., 2020) and CLUT (Zhang et al., 2022) are chosen as representatives for the (a) end-to-end and (b) 3D LUT methods, and our approach, developed post-training, is (c) purely LUT-based.
Figure 1: Histogram of images at different resolutions.
Figure 2: Overall architecture of the proposed ICELUT. Our model first employs a two-branch structure to process the MSB and LSB maps in parallel and then uses a fully pointwise network with a restricted receptive field to extract features. A split FC layer then fuses the global information to predict the weights that combine the basis 3D LUTs for table lookup and interpolation. Note that the feature extractor and split FC, once trained, are transferred to lookup tables for purely LUT inference.
Figure 2: Failure case.
Figure 3: Visualization of results at different inference scales. The bottom right shows the error map with the target image. Brighter areas indicate larger absolute errors.
...and 2 more figures
