Prompt-Guided Image-Adaptive Neural Implicit Lookup Tables for Interpretable Image Enhancement
Satoshi Kosugi
TL;DR
This work tackles interpretable yet high-quality image enhancement by moving beyond handcrafted or linear LUT-based edits. It introduces IA-NILUT, an Image-Adaptive Neural Implicit Lookup Table in which an MLP encodes a color transform and takes image-adaptive parameters ${\bf w} \in \mathbb{R}^J$ as additional inputs, with LUT bypassing enabling real-time application. A prompt guidance loss using CLIP pairs guides each filter toward an intuitive, human-readable label while enforcing that each $w_j$ controls only its designated impression. The method is trained in three stages and evaluated on FiveK and PPR10K, where it outperforms predefined-filter baselines and is competitive with uninterpretable methods, while remaining interpretable thanks to sorted RGB inputs and the prompting framework. Overall, IA-NILUT provides a scalable, interpretable, and efficient mechanism for content-aware image enhancement, with practical implications for on-device editing and user-facing photo tools.
Abstract
In this paper, we study interpretable image enhancement, a technique that enhances image quality by adjusting filter parameters with easily understandable names such as "Exposure" and "Contrast". Rather than relying on predefined image editing filters, our framework uses learnable filters that acquire interpretable names through training. Our contribution is two-fold. First, we introduce a novel filter architecture called an image-adaptive neural implicit lookup table, which uses a multilayer perceptron to implicitly define the transformation from input feature space to output color space. By incorporating image-adaptive parameters directly into the input features, we achieve highly expressive filters. Second, we introduce a prompt guidance loss to assign an interpretable name to each filter. We evaluate visual impressions of enhancement results, such as exposure and contrast, using a vision and language model along with guiding prompts. We define a constraint to ensure that each filter affects only the targeted visual impression without influencing other attributes, which allows us to obtain the desired filter effects. Experimental results show that our method outperforms existing predefined filter-based methods because its filters are optimized to predict the target results. Our source code is available at https://github.com/satoshi-kosugi/PG-IA-NILUT.
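To make the filter architecture concrete, the following is a minimal NumPy sketch of the idea described above: an MLP receives a color together with the image-adaptive parameters ${\bf w}$ and returns an output color, and the MLP can be evaluated once on a color grid to obtain an explicit 3D LUT for fast application. It is not the released implementation. The layer sizes, the tanh activation, the residual output, the grid size, the nearest-neighbor lookup, and all function names (`init_mlp`, `nilut_forward`, `bake_lut`, `apply_lut_nearest`) are assumptions made for illustration, the weights are random rather than trained, and the reading of "LUT bypassing" as baking the MLP into a grid is likewise an assumption.

```python
import numpy as np

def init_mlp(in_dim, hidden, out_dim, rng):
    """Random weights for a small MLP (untrained; for illustration only)."""
    dims = [in_dim, hidden, hidden, out_dim]
    return [(rng.normal(0.0, 0.1, (a, b)), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def nilut_forward(params, rgb, w):
    """Map colors rgb (N, 3) and parameters w (J,) to output colors (N, 3)."""
    # The image-adaptive parameters are appended to every input color.
    w_tiled = np.broadcast_to(w, (rgb.shape[0], w.shape[0]))
    x = np.concatenate([rgb, w_tiled], axis=1)
    for i, (weight, bias) in enumerate(params):
        x = x @ weight + bias
        if i < len(params) - 1:
            x = np.tanh(x)
    # Assumption: the MLP predicts a color offset added to the input color.
    return np.clip(rgb + x, 0.0, 1.0)

def bake_lut(params, w, size=17):
    """Evaluate the MLP once on a size^3 color grid to get an explicit 3D LUT."""
    axis = np.linspace(0.0, 1.0, size)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return nilut_forward(params, grid.reshape(-1, 3), w).reshape(size, size, size, 3)

def apply_lut_nearest(lut, image):
    """Apply a baked LUT to an image in [0, 1] using nearest-neighbor lookup."""
    size = lut.shape[0]
    idx = np.rint(image * (size - 1)).astype(int)
    return lut[idx[..., 0], idx[..., 1], idx[..., 2]]

rng = np.random.default_rng(0)
J = 4                                  # number of filters (hypothetical value)
params = init_mlp(3 + J, 32, 3, rng)   # input = RGB + J image-adaptive parameters
w = rng.uniform(-1.0, 1.0, J)          # stand-in for parameters predicted per image
image = rng.uniform(0.0, 1.0, (8, 8, 3))

lut = bake_lut(params, w)              # one MLP pass over the grid
out = apply_lut_nearest(lut, image)    # per-pixel work is only a table lookup
```

In this sketch the MLP runs only on the $17^3$ grid points, so the per-pixel cost is independent of the network size. A practical version would typically interpolate between grid entries (for example, trilinearly) instead of taking the nearest one.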
