Hundred-Kilobyte Lookup Tables for Efficient Single-Image Super-Resolution

Binxiao Huang, Jason Chun Lok Li, Jie Ran, Boyu Li, Jiajun Zhou, Dahai Yu, Ngai Wong

TL;DR

This work targets single-image super-resolution (SR) on edge devices by designing hundred-kilobyte lookup-table (HKLUT) models that fit in on-chip cache and avoid interpolation. It introduces an asymmetric two-branch architecture, with separate branches for the most significant bits (MSB) and least significant bits (LSB), and rotation-ensemble kernels that drastically reduce LUT size. It then extends the design to multistage progressive upsampling, which enables inter-branch communication at stage transitions. The HKLUT family stays below 1 MB of total storage (≈100–112.5 KB per model) while delivering competitive PSNR/SSIM on standard benchmarks, and it outperforms prior LUT-based schemes in storage, energy, and runtime. The results indicate strong potential for efficient SR on resource-constrained devices.

Abstract

Conventional super-resolution (SR) schemes make heavy use of convolutional neural networks (CNNs), which involve intensive multiply-accumulate (MAC) operations, and require specialized hardware such as graphics processing units. This contradicts the regime of edge AI that often runs on devices strained by power, computing, and storage resources. Such a challenge has motivated a series of lookup table (LUT)-based SR schemes that employ simple LUT readout and largely elude CNN computation. Nonetheless, the multi-megabyte LUTs in existing methods still prohibit on-chip storage and necessitate off-chip memory transport. This work tackles this storage hurdle and innovates hundred-kilobyte LUT (HKLUT) models amenable to on-chip cache. Utilizing an asymmetric two-branch multistage network coupled with a suite of specialized kernel patterns, HKLUT demonstrates an uncompromising performance and superior hardware efficiency over existing LUT schemes. Our implementation is publicly available at: https://github.com/jasonli0707/hklut.
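
The storage argument above can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the released code: the helper names `split_msb_lsb` and `lut_bytes` are hypothetical, and it assumes a fully populated LUT indexed by quantized input pixels, with one byte stored per output pixel. It shows how an 8-bit pixel splits into 4-bit MSB and LSB parts, and how the index bit width and the number of input pixels drive table size.

```python
import numpy as np


def split_msb_lsb(img_u8, msb_bits=4):
    """Split an 8-bit image into its MSB and LSB parts (hypothetical helper)."""
    lsb_bits = 8 - msb_bits
    msb = img_u8 >> lsb_bits                # upper bits: coarse intensity
    lsb = img_u8 & ((1 << lsb_bits) - 1)    # lower bits: fine detail
    return msb, lsb


def lut_bytes(n_pixels, bits_per_pixel, scale, bytes_per_entry=1):
    """Size of a fully populated LUT (assumed layout, not the paper's exact one).

    One entry per combination of input pixel values; each entry stores
    scale * scale output pixels.
    """
    entries = (2 ** bits_per_pixel) ** n_pixels
    return entries * scale * scale * bytes_per_entry


img = np.array([[0, 37, 200, 255]], dtype=np.uint8)
msb, lsb = split_msb_lsb(img)

# The split is lossless: the two halves recombine into the original pixel.
assert np.array_equal((msb << 4) | lsb, img)

# Four 8-bit input pixels at x4 upscaling vs. three 4-bit input pixels.
full_8bit = lut_bytes(n_pixels=4, bits_per_pixel=8, scale=4)
small_4bit = lut_bytes(n_pixels=3, bits_per_pixel=4, scale=4)
print(f"4 pixels x 8 bits: {full_8bit / 2**30:.0f} GiB")
print(f"3 pixels x 4 bits: {small_4bit / 2**10:.0f} KiB")
```

Under these assumptions, halving the index bit width and dropping one input pixel moves a single table from the gigabyte range to tens of kilobytes, which is the kind of reduction that makes on-chip storage plausible. The exact table sizes of the HKLUT models depend on details not given in this summary.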

Paper Structure

This paper contains 22 sections, 1 equation, 12 figures, 8 tables.

Figures (12)

  • Figure 1: PSNR (dataset: Set5) vs. model size: HKLUTs are the first hundred-KB series of LUT-based SISR models targeting the sub-1 MB regime (note that the $x$-axis is in log scale).
  • Figure 2: Three types of kernels covering a $3\times3$ RF with rotation ensemble via different numbers of input pixels. (a) SRLUT (b) HDLUT (c) LLUT.
  • Figure 3: With rotation ensemble, HDBLUT can cover a $5\times5$ area with 3 three-pixel kernels. On the right, each square represents a single pixel, with a color indicating which kernel covers it.
  • Figure 4: Four randomly selected effective receptive fields (ERFs). Top: most significant 4 bits. Bottom: least significant 4 bits. The brightness denotes the model's sensitivity to that pixel, justifying the asymmetric $5\times5$ and $3\times3$ kernels for the respective branches.
  • Figure 5: Overall architecture of the proposed method (HKLUT-S).
  • ...and 7 more figures
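
The rotation-ensemble idea mentioned in the captions of Figures 2 and 3 can be checked with a short sketch. It assumes a 2×2 four-pixel kernel anchored at its top-left pixel, the shape commonly associated with SR-LUT; the exact HDLUT, LLUT, and HDBLUT patterns are not given in this summary and are not reproduced here. The function names are hypothetical. Rotating the kernel offsets by 90° is equivalent to rotating the input image the opposite way, so the union of the four rotated footprints is the receptive field of the ensemble.

```python
def rotate90(offset):
    """Rotate a (dy, dx) offset by 90 degrees about the anchor pixel (0, 0)."""
    dy, dx = offset
    return (dx, -dy)


def ensemble_coverage(kernel_offsets, n_rotations=4):
    """Union of pixel offsets touched by a kernel over successive 90-degree rotations."""
    covered = set()
    current = list(kernel_offsets)
    for _ in range(n_rotations):
        covered.update(current)
        current = [rotate90(o) for o in current]
    return covered


# Assumed 2x2 kernel: anchor pixel plus its right, lower, and lower-right neighbors.
square_2x2 = [(0, 0), (0, 1), (1, 0), (1, 1)]
coverage = ensemble_coverage(square_2x2)

# Every pixel of the 3x3 neighborhood around the anchor is reached.
full_3x3 = {(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)}
assert coverage == full_3x3

# Print the covered footprint as a small grid ('#' = covered).
for dy in (-1, 0, 1):
    print("".join("#" if (dy, dx) in coverage else "." for dx in (-1, 0, 1)))
```

The same helper can be fed other offset lists to see how a kernel with fewer input pixels trades coverage for table size; each pixel removed from the kernel shrinks the LUT index space by a multiplicative factor.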