Aberration Correcting Vision Transformers for High-Fidelity Metalens Imaging
Byeonghyeon Lee, Youbin Kim, Yongjae Jo, Hyunsu Kim, Hyemi Park, Yangkyu Kim, Debabrata Mandal, Praneeth Chakravarthula, Inki Kim, Eunbyung Park
TL;DR
This work tackles the challenge of spatially varying aberrations in metalens imaging by introducing a Vision Transformer-based restoration framework. It couples a Multiple Adaptive Filters Guidance (MAFG) module, which generates diverse Wiener-filtered representations, with a Spatial and Transposed self-Attention Fusion (STAF) module to jointly exploit spatial and channel-wise attention for encoder–decoder restoration. The approach achieves state-of-the-art restoration across image, video, and 3D reconstruction tasks, and its practicality is corroborated by fabricating a metalens and restoring images captured with the device. The proposed method significantly advances high-fidelity metalens imaging and offers practical pathways for robust, real-world applications.
Abstract
A metalens is an emerging optical system whose key advantage is that it can be manufactured in ultra-thin, compact form, which makes it promising for a wide range of applications. Despite this advantage in miniaturization, its practicality is limited by spatially varying aberrations and distortions that significantly degrade image quality. Several prior methods address different types of aberrations, but most are designed for traditional bulky lenses and are ineffective against the severe aberrations of a metalens. Aberration correction methods designed specifically for metalenses do exist, but their restoration quality still falls short. In this work, we propose a novel aberration correction framework for metalens-captured images that harnesses Vision Transformers (ViT), which have the potential to restore metalens images with non-uniform aberrations. Specifically, we devise a Multiple Adaptive Filters Guidance (MAFG) module, in which multiple Wiener filters enrich the degraded input images with different noise-detail balances and a cross-attention module reweights the features according to the varying degrees of aberration. In addition, we introduce a Spatial and Transposed self-Attention Fusion (STAF) module, which aggregates features from spatial self-attention and transposed self-attention modules to further improve aberration correction. We conduct extensive experiments, including correcting aberrated images and videos and producing clean 3D reconstructions. The proposed method outperforms prior methods by a significant margin. We further fabricate a metalens and verify the practicality of our method by restoring images captured with the manufactured device. Code and pre-trained models are available at https://benhenryl.github.io/Metalens-Transformer.
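The idea of producing several Wiener-filtered versions of one degraded image, each with a different noise-detail balance, can be illustrated with a minimal NumPy sketch. This is not the authors' MAFG implementation: it assumes a single, spatially invariant point spread function (PSF) and a scalar noise-to-signal ratio per filter, whereas the paper targets spatially varying aberrations. The function names `wiener_deconvolve` and `wiener_bank` and the parameter `nsr_values` are hypothetical and chosen only for illustration.

```python
import numpy as np


def wiener_deconvolve(image, psf, nsr):
    """Frequency-domain Wiener deconvolution of a 2D image.

    Assumptions (illustrative only): one known, spatially invariant PSF,
    circular boundary conditions, and a scalar noise-to-signal ratio `nsr`.
    """
    # Zero-pad the PSF to the image size and shift its centre to the origin
    # so that the restored image is not spatially translated.
    kh, kw = psf.shape
    psf_padded = np.zeros_like(image, dtype=float)
    psf_padded[:kh, :kw] = psf
    psf_padded = np.roll(psf_padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))

    H = np.fft.fft2(psf_padded)   # transfer function of the blur
    G = np.fft.fft2(image)        # spectrum of the degraded image

    # Wiener filter: conj(H) / (|H|^2 + nsr).
    # Small nsr recovers more detail but amplifies noise; large nsr does the opposite.
    W = np.conj(H) / (np.abs(H) ** 2 + nsr)
    return np.real(np.fft.ifft2(W * G))


def wiener_bank(image, psf, nsr_values):
    """Stack one Wiener-filtered image per noise-to-signal ratio."""
    return np.stack([wiener_deconvolve(image, psf, k) for k in nsr_values], axis=0)
```

Calling `wiener_bank(blurred, psf, [1e-4, 1e-2, 1e-1])` returns an array of shape `(3, H, W)`, one restored candidate per regularization strength. In a learned pipeline, such a stack could serve as extra input channels whose relative weighting is left to the network, which is the role the abstract assigns to the cross-attention module.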
