Seeing Clearly and Deeply: An RGBD Imaging Approach with a Bio-inspired Monocentric Design

Zongxi Yu, Xiaolong Qian, Shaohua Gao, Qi Jiang, Yao Gao, Kailun Yang, Kaiwei Wang

TL;DR

This work tackles compact RGBD imaging by introducing a bio-inspired all-spherical monocentric lens that inherently encodes depth into depth-varying PSFs. It couples a physically-based forward model with a dual-head reconstruction network to jointly recover all-in-focus imagery and depth from a single coded capture, achieving state-of-the-art depth (Abs Rel $0.026$, RMSE $0.130$) and image quality (SSIM $0.960$, LPIPS $0.082$). The BMI framework demonstrates robust performance across indoor benchmarks and shows zero-shot generalization to underwater scenes, highlighting practical applicability for robotics and AR/VR. The work argues that tightly integrated optics and computation can outperform purely algorithmic approaches and sets a path for field-of-view expansion, deeper optical priors, and physics-informed reconstruction in future research.

Abstract

Achieving high-fidelity, compact RGBD imaging presents a dual challenge: conventional compact optics struggle with RGB sharpness across the entire depth-of-field, while software-only Monocular Depth Estimation (MDE) is an ill-posed problem reliant on unreliable semantic priors. While deep optics with elements like DOEs can encode depth, they introduce trade-offs in fabrication complexity and chromatic aberrations, compromising simplicity. To address this, we first introduce a novel bio-inspired all-spherical monocentric lens, around which we build the Bionic Monocentric Imaging (BMI) framework, a holistic co-design. This optical design naturally encodes depth into its depth-varying Point Spread Functions (PSFs) without requiring complex diffractive or freeform elements. We establish a rigorous physically-based forward model to generate a synthetic dataset by precisely simulating the optical degradation process. This simulation pipeline is co-designed with a dual-head, multi-scale reconstruction network that employs a shared encoder to jointly recover a high-fidelity All-in-Focus (AiF) image and a precise depth map from a single coded capture. Extensive experiments validate the state-of-the-art performance of the proposed framework. In depth estimation, the method attains an Abs Rel of 0.026 and an RMSE of 0.130, markedly outperforming leading software-only approaches and other deep optics systems. For image restoration, the system achieves an SSIM of 0.960 and a perceptual LPIPS score of 0.082, thereby confirming a superior balance between image fidelity and depth accuracy. This study illustrates that the integration of bio-inspired, fully spherical optics with a joint reconstruction algorithm constitutes an effective strategy for addressing the intrinsic challenges in high-performance compact RGBD imaging. Source code will be publicly available at https://github.com/ZongxiYu-ZJU/BMI.
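The depth figures quoted above (Abs Rel and RMSE) are not defined in this excerpt. The sketch below assumes the definitions conventional in monocular depth estimation: mean absolute relative error and root-mean-square error over pixels with valid ground truth. The function names and the `gt > 0` validity rule are illustrative assumptions, not taken from the authors' code.

```python
import math


def _valid_pairs(pred, gt):
    # Assumption: pixels with non-positive ground-truth depth are invalid and skipped.
    return [(p, g) for p, g in zip(pred, gt) if g > 0]


def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt)."""
    pairs = _valid_pairs(pred, gt)
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def rmse(pred, gt):
    """Root-mean-square error: sqrt(mean((pred - gt)^2))."""
    pairs = _valid_pairs(pred, gt)
    return math.sqrt(sum((p - g) ** 2 for p, g in pairs) / len(pairs))


# Flattened toy depth maps in metres; the last pixel has no valid ground truth.
pred = [1.0, 2.2, 4.0, 3.0]
gt = [1.0, 2.0, 5.0, 0.0]
print(abs_rel(pred, gt), rmse(pred, gt))
```

Abs Rel is dimensionless, while RMSE carries the unit of the depth maps (metres here), so the two numbers are not directly comparable to each other.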

Paper Structure

This paper contains 19 sections, 7 equations, 9 figures, 6 tables.

Figures (9)

  • Figure 1: Overview of the proposed Bionic Monocentric Imaging (BMI) framework. Our method consists of three main stages. (a) Bionic Optical Design: Inspired by the cichlid eye, we design a bio-inspired monocentric fisheye lens and characterize its Modulation Transfer Function (MTF) and depth-dependent Point Spread Functions (PSFs). (b) Depth-aware Image Simulation: We build a physically-based forward model that uses the characterized PSFs to transform a ground truth (GT) image and its corresponding depth map into a coded image, simulating the degradation introduced by our lens. (c) Joint Image Restoration and Depth Estimation: A two-head reconstruction network takes the coded image as input and is trained to jointly recover a clear, restored image and its corresponding depth map.
  • Figure 2: The architecture of our reconstruction network for joint image restoration and depth estimation. The network uses a shared encoder to extract unified features from multi-scale inputs. These features are then fed into two separate decoder heads (one for multi-scale image restoration, the other for depth estimation), enabling joint recovery of the restored image and the depth map.
  • Figure 3: Simulated PSFs of the bio-inspired lens. The PSFs are shown for three field angles (rows: $0^\circ$, $3^\circ$, $6^\circ$) and ten object depths (columns: $0.8\,\mathrm{m}$ to $10.0\,\mathrm{m}$). Each PSF is visualized from a $128{\times}128$ data array. For better visualization, the intensity of each PSF has been normalized.
  • Figure 4: Qualitative comparison of our method against other approaches, such as CF-DOE [zhuge2024calibration] and Metric3D Small [hu2024metric3d], on the NYU Depth V2 dataset. The RMSE of each depth map and the PSNR of each image, both computed against the GT, are noted in the upper right corner. Our method produces depth maps with fewer artifacts and restored images with higher clarity and fidelity.
  • Figure 5: Enlarged qualitative comparison for image restoration. The magnified regions, indicated by red boxes, compare our method with the DOE-based approach and the Ground Truth (GT).
  • ...and 4 more figures
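
The depth-aware simulation stage in Figure 1(b) can be illustrated with a layered blur model: the scene is split into depth layers, each layer is convolved with the PSF for its depth, and the layers are summed. The sketch below is a minimal version under stated assumptions and is not the authors' forward model. It handles a single channel, ignores the field dependence of the PSFs and occlusion at depth edges, and renormalizes by the blurred layer masks. The names `simulate_coded_image`, `convolve2d_same`, and `depth_edges` are hypothetical.

```python
import numpy as np


def convolve2d_same(img, kernel):
    """Zero-padded linear 2D convolution via FFT, cropped to the input size."""
    h, w = img.shape
    kh, kw = kernel.shape
    shape = (h + kh - 1, w + kw - 1)
    spectrum = np.fft.rfft2(img, shape) * np.fft.rfft2(kernel, shape)
    full = np.fft.irfft2(spectrum, shape)
    top, left = (kh - 1) // 2, (kw - 1) // 2
    return full[top:top + h, left:left + w]


def simulate_coded_image(image, depth, psfs, depth_edges, eps=1e-8):
    """Blur each depth layer with its own PSF and recombine the layers.

    image: (H, W) sharp image; depth: (H, W) depth map in metres;
    psfs: list of K kernels, each summing to 1;
    depth_edges: K + 1 increasing bin edges in metres (assumed binning).
    """
    num_layers = len(psfs)
    # Assign every pixel to exactly one depth layer, clamping out-of-range depths.
    layer = np.clip(np.digitize(depth, depth_edges) - 1, 0, num_layers - 1)
    coded = np.zeros(image.shape, dtype=float)
    weight = np.zeros(image.shape, dtype=float)
    for k, psf in enumerate(psfs):
        mask = (layer == k).astype(float)
        coded += convolve2d_same(image * mask, psf)
        weight += convolve2d_same(mask, psf)
    # Renormalize so that energy lost at layer and image borders is compensated.
    return coded / np.maximum(weight, eps)


# Toy usage: a sharp near layer (delta PSF) and a blurred far layer (box PSF).
delta = np.zeros((3, 3))
delta[1, 1] = 1.0
box = np.full((3, 3), 1.0 / 9.0)
rng = np.random.default_rng(0)
image = rng.random((8, 8))
depth = np.tile(np.linspace(0.8, 10.0, 8), (8, 1))
coded = simulate_coded_image(image, depth, [delta, box], [0.8, 3.0, 10.0])
print(coded.shape)
```

In a full pipeline the toy kernels would be replaced by the characterized PSFs, for example the $128{\times}128$ arrays shown in Figure 3, with one kernel per sampled depth.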