Aberration Correcting Vision Transformers for High-Fidelity Metalens Imaging

Byeonghyeon Lee, Youbin Kim, Yongjae Jo, Hyunsu Kim, Hyemi Park, Yangkyu Kim, Debabrata Mandal, Praneeth Chakravarthula, Inki Kim, Eunbyung Park

TL;DR

This work addresses spatially varying aberrations in metalens imaging with a Vision Transformer-based restoration framework. It couples a Multiple Adaptive Filters Guidance (MAFG) module, which generates several Wiener-filtered representations of the input, with a Spatial and Transposed self-Attention Fusion (STAF) module that combines spatial and channel-wise attention in the encoder and decoder. The authors report that the method outperforms prior approaches on image restoration, video restoration, and 3D reconstruction, and they validate it by fabricating a metalens and restoring images captured with the device.

Abstract

The metalens is an emerging optical system whose key merit is that it can be manufactured in ultra-thin, compact form, which shows great promise for various applications. Despite this advantage in miniaturization, its practicality is constrained by spatially varying aberrations and distortions, which significantly degrade image quality. Several prior works have attempted to address different types of aberrations, yet most of them are designed for traditional bulky lenses and are ineffective at remedying the harsh aberrations of a metalens. Although aberration correction methods designed specifically for metalenses exist, their restoration quality still falls short. In this work, we propose a novel aberration correction framework for metalens-captured images, harnessing Vision Transformers (ViT), which have the potential to restore metalens images with non-uniform aberrations. Specifically, we devise a Multiple Adaptive Filters Guidance (MAFG) module, in which multiple Wiener filters enrich the degraded input images with various noise-detail balances and a cross-attention module reweights the features according to the different degrees of aberration. In addition, we introduce a Spatial and Transposed self-Attention Fusion (STAF) module, which aggregates features from spatial self-attention and transposed self-attention modules to further improve aberration correction. We conduct extensive experiments, including correcting aberrated images and videos, and clean 3D reconstruction. The proposed method outperforms prior methods by a significant margin. We further fabricate a metalens and verify the practicality of our method by restoring images captured with the manufactured metalens. Code and pre-trained models are available at https://benhenryl.github.io/Metalens-Transformer.
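
The idea of producing several Wiener-filtered versions of one degraded image, each with a different noise-detail balance, can be sketched with the classical frequency-domain Wiener filter $\overline{H} / (|H|^2 + K)$, where $H$ is the Fourier transform of the point spread function (PSF). This is only an illustration under simplifying assumptions: a single spatially invariant PSF, circular boundary handling, and fixed scalar values of $K$. The function names (`psf_to_otf`, `wiener_deconvolve`, `multi_wiener`) are hypothetical and are not taken from the authors' code, and the filters used in the paper may be parameterized differently.

```python
import numpy as np


def psf_to_otf(psf, shape):
    """Zero-pad a small PSF to `shape`, center it at the origin, and return its 2D FFT."""
    padded = np.zeros(shape, dtype=np.float64)
    ph, pw = psf.shape
    padded[:ph, :pw] = psf
    # Move the PSF center to index (0, 0) so the filtered image is not translated.
    padded = np.roll(padded, shift=(-(ph // 2), -(pw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)


def wiener_deconvolve(image, psf, k):
    """Classical Wiener deconvolution with a scalar noise-penalization term k (assumed fixed)."""
    otf = psf_to_otf(psf, image.shape)
    # Wiener filter in the frequency domain: conj(H) / (|H|^2 + k).
    wiener = np.conj(otf) / (np.abs(otf) ** 2 + k)
    return np.real(np.fft.ifft2(wiener * np.fft.fft2(image)))


def multi_wiener(image, psf, ks):
    """Stack one deconvolved image per value of k; the result has shape (M, H, W)."""
    return np.stack([wiener_deconvolve(image, psf, k) for k in ks], axis=0)
```

Calling `multi_wiener(blurred, psf, ks=[1e-3, 1e-1])` on a 2D array would return two representations of the same input: the one with the larger value of `k` suppresses more high-frequency content and is therefore smoother, while the one with the smaller value keeps more texture and more amplified noise.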

Paper Structure

This paper contains 28 sections, 7 equations, 22 figures, 9 tables.

Figures (22)

  • Figure 1: Left: The fabricated metalens. Right: An image captured with the manufactured metalens (top) and the image restored with the proposed method (bottom).
  • Figure 2: The overview of our method. It comprises a Multiple Adaptive Filters Guidance (MAFG) module, which produces different representations with various noise-detail balances, and a Spatial and Transposed self-Attention Fusion (STAF) module, which aggregates features differently in the encoder and decoder.
  • Figure 3: A comparison of Wiener-deconvolved images with different noise-penalization terms $K$. A high $K$ produces a smooth representation (b), while a smaller $K$ results in a more textured but noisier representation (c).
  • Figure 4: The process of Multiple Adaptive Filters Guidance (MAFG). It produces $M$ different representations from the input images using $M$ different Wiener filters and applies cross-attention to reweight them.
  • Figure 5: Comparison of ways to apply spatial self-attention (SA) and transposed self-attention (TA). (a) Previous works that apply SA and TA alternately. (b) The proposed Spatial and Transposed self-Attention Fusion (STAF), which runs SA and TA in parallel and fuses their features with different weights in the encoder and decoder.
  • ...and 17 more figures
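
The difference between spatial and transposed self-attention, and the parallel fusion described in the Figure 5 caption, can be illustrated with a minimal NumPy sketch. It assumes a single head, identity query/key/value projections, a simple scaling factor, and one scalar fusion weight `alpha`; all of these are simplifications chosen for illustration, and the function names are hypothetical. It is not the authors' STAF implementation, whose projections and fusion weights are learned.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_attention(x):
    """x has shape (N, C). The attention map is (N, N): positions attend to positions."""
    n, c = x.shape
    attn = softmax(x @ x.T / np.sqrt(c), axis=-1)
    return attn @ x


def transposed_attention(x):
    """The attention map is (C, C): channels attend to channels."""
    n, c = x.shape
    attn = softmax(x.T @ x / np.sqrt(n), axis=-1)
    # Equivalent to (attn @ x.T).T, returned in the (N, C) token layout.
    return x @ attn.T


def fuse_parallel(x, alpha):
    """Run both branches on the same input and blend them with a scalar weight (assumed)."""
    return alpha * spatial_attention(x) + (1.0 - alpha) * transposed_attention(x)
```

The practical point of the two branches is cost and scope: the spatial map grows with the square of the number of positions $N$, while the transposed map grows with the square of the channel count $C$ and mixes information across channels. A different `alpha` in the encoder and in the decoder would mimic, in a very reduced form, fusing the two branches with different weights at each stage.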