Hundred-Kilobyte Lookup Tables for Efficient Single-Image Super-Resolution
Binxiao Huang, Jason Chun Lok Li, Jie Ran, Boyu Li, Jiajun Zhou, Dahai Yu, Ngai Wong
TL;DR
This work targets single-image super-resolution on edge devices by designing hundred-kilobyte lookup-table models (HKLUT) that fit in on-chip memory and avoid interpolation. It introduces an asymmetric two-branch architecture, with separate branches specialized for the most significant bits (MSB) and least significant bits (LSB), and rotation-ensemble kernels that drastically reduce LUT size. It then extends the design to multistage progressive upsampling, which lets the branches exchange information at stage transitions. The HKLUT family keeps total storage below 1 MB (≈100–112.5 KB per model) while delivering competitive PSNR/SSIM on standard benchmarks and outperforming prior LUT-based schemes in storage, energy, and runtime. The results indicate strong potential for efficient SR on resource-constrained devices, with practical implications for edge AI deployments.
Abstract
Conventional super-resolution (SR) schemes rely heavily on convolutional neural networks (CNNs), which involve intensive multiply-accumulate (MAC) operations and require specialized hardware such as graphics processing units. This conflicts with the edge AI setting, where devices are often constrained in power, computing, and storage resources. This challenge has motivated a series of lookup table (LUT)-based SR schemes that rely on simple LUT readout and largely avoid CNN computation. Nonetheless, the multi-megabyte LUTs in existing methods still prohibit on-chip storage and necessitate off-chip memory transport. This work tackles this storage hurdle and introduces hundred-kilobyte LUT (HKLUT) models that fit in on-chip cache. Using an asymmetric two-branch multistage network coupled with a suite of specialized kernel patterns, HKLUT delivers uncompromised performance and superior hardware efficiency compared with existing LUT schemes. Our implementation is publicly available at: https://github.com/jasonli0707/hklut.
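To make the storage argument concrete, the following is a minimal sketch of why splitting pixels into MSB and LSB fields shrinks a lookup table. It is not the authors' implementation: the 4-bit/4-bit split, the kernel sizes, the one-byte output values, and the helper names `split_msb_lsb` and `lut_bytes` are all assumptions chosen for illustration.

```python
from typing import Tuple


def split_msb_lsb(pixel: int, lsb_bits: int = 4) -> Tuple[int, int]:
    """Split an 8-bit pixel into its high (MSB) and low (LSB) bit fields.

    The 4/4 split is an illustrative assumption, not a value from the paper.
    """
    if not 0 <= pixel <= 255:
        raise ValueError("pixel must be an 8-bit value")
    msb = pixel >> lsb_bits               # upper bits
    lsb = pixel & ((1 << lsb_bits) - 1)   # lower bits
    return msb, lsb


def lut_bytes(input_bits: int, num_pixels: int, scale: int,
              bytes_per_value: int = 1) -> int:
    """Size in bytes of a fully enumerated LUT.

    One entry per combination of `num_pixels` inputs of `input_bits` bits each;
    every entry stores scale * scale output values.
    """
    entries = (2 ** input_bits) ** num_pixels
    return entries * scale * scale * bytes_per_value


msb, lsb = split_msb_lsb(0b10110110)   # 182 -> (0b1011, 0b0110)

# Hypothetical configurations for a x4 upscaler:
full_8bit = lut_bytes(input_bits=8, num_pixels=4, scale=4)   # 4 pixels, full 8-bit inputs
half_4bit = lut_bytes(input_bits=4, num_pixels=3, scale=4)   # 3 pixels, 4-bit inputs

print(msb, lsb)
print(full_8bit, half_4bit)
```

Under these assumptions, table size grows exponentially with both the bits per input and the number of indexed pixels, so halving the bit width and using fewer pixels per kernel reduces the table from tens of gigabytes to tens of kilobytes.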
