All-Optical Segmentation via Diffractive Neural Networks for Autonomous Driving

Yingjie Li, Daniel Robinson, Cunxi Yu

TL;DR

This work tackles the energy and latency challenges of perception in autonomous driving by introducing all-optical RGB image processing via free-space Diffractive Optical Neural Networks (DONNs). The authors design a three-channel DONN architecture with optical skip connections to perform semantic segmentation and lane detection, and validate the approach on CityScapes, indoor-track lane data, and CARLA simulations, including generalization tests across lighting and maps. Key contributions include a differentiable numerical model for RGB DONNs, a training framework that optimizes phase-modulation weights, and extensive experiments showing competitive segmentation performance (IoU ≈ 0.71 on CityScapes with 12 layers) and robust lane detection (IoU ≈ 0.80 indoors, with generalization in CARLA). The results demonstrate the practicality and potential of all-optical processing for real-time perception in autonomous driving, while also outlining hardware and binarization challenges that need to be addressed for deployment.
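
The TL;DR reports segmentation quality as IoU (intersection over union). As a reference point, a minimal per-class IoU computation on integer label maps might look like the sketch below. The averaging protocol behind the reported numbers (per-class mean, ignored labels, per-image versus dataset-level accumulation) is not stated here, so the NaN-skipping mean is an assumption, and `iou_per_class` and `mean_iou` are hypothetical helper names.

```python
import numpy as np


def iou_per_class(pred: np.ndarray, target: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-class IoU for integer label maps of identical shape.

    Classes that appear in neither map get NaN so they can be skipped when averaging.
    """
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union > 0:
            ious[c] = np.logical_and(p, t).sum() / union
    return ious


def mean_iou(pred: np.ndarray, target: np.ndarray, num_classes: int) -> float:
    """Mean IoU over classes present in pred or target (assumed averaging rule)."""
    return float(np.nanmean(iou_per_class(pred, target, num_classes)))


# Binary lane-mask example: 1 = lane, 0 = background.
pred = np.array([[0, 1, 1], [0, 1, 0]])
target = np.array([[0, 1, 0], [0, 1, 1]])
print(iou_per_class(pred, target, 2))  # [0.5 0.5]
print(mean_iou(pred, target, 2))       # 0.5
```

For a binary lane mask, the lane-class entry alone (index 1 here) is the usual single-number IoU; whether the paper reports that or a mean over both classes is not specified in this summary.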

Abstract

Semantic segmentation and lane detection are crucial tasks in autonomous driving systems. Conventional approaches predominantly rely on deep neural networks (DNNs), which incur high energy costs due to extensive analog-to-digital conversions and the large-scale image computations required for low-latency, real-time responses. Diffractive optical neural networks (DONNs) have shown promising energy-efficiency advantages over conventional DNNs running on digital or optoelectronic computing platforms. Because DONNs process images all-optically through light diffraction at the speed of light, they reduce computation energy, and their all-optical encoding and computing also reduce the overhead of analog-to-digital conversions. In this work, we propose a novel all-optical computing framework for RGB image segmentation and lane detection in autonomous driving applications. Our experimental results demonstrate the effectiveness of the DONN system for image segmentation on the CityScapes dataset. Additionally, we conduct case studies on lane detection using a customized indoor track dataset and simulated driving scenarios in CARLA, where we further evaluate the model's generalizability under diverse environmental conditions.
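
The abstract describes inference as free-space light diffraction through trained layers. A minimal NumPy sketch of one such forward pass is given below, assuming amplitude encoding of the input, angular-spectrum propagation between planes, phase-only diffractive layers, and an intensity-measuring camera. The wavelength, pixel pitch, and layer-spacing defaults are illustrative placeholders rather than the paper's hardware parameters, and `angular_spectrum_propagate` and `donn_forward` are hypothetical names. A trainable version would express the same operations in an autodiff framework so that the phase masks can be optimized by gradient descent.

```python
import numpy as np


def angular_spectrum_propagate(field, wavelength, pitch, distance):
    """Propagate a complex field over `distance` in free space (angular spectrum method).

    Evanescent spatial frequencies (fx^2 + fy^2 > 1/wavelength^2) are set to zero.
    """
    ny, nx = field.shape
    fy = np.fft.fftfreq(ny, d=pitch)
    fx = np.fft.fftfreq(nx, d=pitch)
    FY, FX = np.meshgrid(fy, fx, indexing="ij")
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.where(arg > 0, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


def donn_forward(image, phases, wavelength=532e-9, pitch=8e-6, distance=0.3):
    """Single-channel DONN inference sketch (placeholder optical parameters).

    image:  (H, W) real array in [0, 1], encoded as field amplitude (assumption).
    phases: sequence of (H, W) phase masks in radians, one per diffractive layer.
    Returns the (H, W) intensity a camera would record at the output plane.
    """
    field = image.astype(np.complex128)
    for phi in phases:
        field = angular_spectrum_propagate(field, wavelength, pitch, distance)
        field = field * np.exp(1j * phi)  # phase-only modulation (e.g. one SLM)
    field = angular_spectrum_propagate(field, wavelength, pitch, distance)
    return np.abs(field) ** 2


rng = np.random.default_rng(0)
img = rng.random((64, 64))
masks = [rng.uniform(0, 2 * np.pi, (64, 64)) for _ in range(3)]  # three layers, as in Figure 1
out = donn_forward(img, masks)
print(out.shape)  # (64, 64)
```

With these placeholder values every sampled spatial frequency propagates, and phase-only modulation is lossless, so the total detected intensity equals the input energy. That makes a convenient sanity check for any implementation of the forward model.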

Paper Structure (27 sections, 11 equations, 13 figures, 2 tables)

Figures (13)

  • Figure 1: The basic DONN framework. The upper part illustrates the extracted model for a DONN system, including the input image, three diffractive layers, and the system output for the image segmentation task. The lower part shows the hardware deployment for all-optical inference with a trained DONN system. It includes a laser source for input image encoding and computation, three spatial light modulators (SLMs) implementing the diffractive layers, and a camera to capture the system output.
  • Figure 2: Illustration of the DONN system designed for RGB image processing. It consists of three main parts: (1) a coherent laser source that emits the light signal; (2) information coding, where the light information ('R', 'G', 'B' components) captured by the front passive optical system in the RGB camera is used to generate the encoding layers for three separate DONN channels, so that light propagating through an encoding layer carries the corresponding information; (3) image processing computation with trained diffractive layers. Optical skip connections can be implemented for deep DONN systems.
  • Figure 3: The input image and the ground truth image from CityScapes.
  • Figure 4: The input image and the ground truth image from customized indoor track.
  • Figure 5: The input image and the ground truth image from simulations in CARLA.
  • ...and 8 more figures
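
Figure 2 describes three separate DONN channels for the R, G, and B components, with optical skip connections for deeper systems. The sketch below extends a single-channel forward pass under loudly stated assumptions: all channels share one laser wavelength, a skip connection is modeled as a free-space bypass whose field is coherently added to the main path through a lossy 50/50 combiner (factor 1/sqrt(2)), and the camera sums the three channels' output intensities. None of these modeling choices is confirmed by the text here, the optical parameters are placeholders, and `propagate`, `channel_forward`, `rgb_donn_forward`, and `skip_every` are hypothetical names.

```python
import numpy as np


def propagate(field, wavelength, pitch, distance):
    """Angular-spectrum free-space propagation; evanescent components dropped."""
    ny, nx = field.shape
    fy = np.fft.fftfreq(ny, d=pitch)
    fx = np.fft.fftfreq(nx, d=pitch)
    FY, FX = np.meshgrid(fy, fx, indexing="ij")
    arg = 1.0 / wavelength**2 - FX**2 - FY**2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    transfer = np.where(arg > 0, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)


def channel_forward(amplitude, phases, skip_every=2,
                    wavelength=532e-9, pitch=8e-6, distance=0.3):
    """One color channel: phase layers plus hypothetical optical skip connections."""
    field = amplitude.astype(np.complex128)
    skip = field  # field at the start of the current block of layers
    for i, phi in enumerate(phases, start=1):
        field = propagate(field, wavelength, pitch, distance) * np.exp(1j * phi)
        if i % skip_every == 0:
            # Bypass path: free-space propagation over the same length as the block,
            # then a coherent 50/50 combination (assumed model of a skip connection).
            bypass = propagate(skip, wavelength, pitch, skip_every * distance)
            field = (field + bypass) / np.sqrt(2)
            skip = field
    field = propagate(field, wavelength, pitch, distance)
    return np.abs(field) ** 2


def rgb_donn_forward(rgb, channel_phases, **kwargs):
    """rgb: (H, W, 3) in [0, 1]; channel_phases: three lists of (H, W) phase masks.

    Assumes the camera sums the three channels' output intensities.
    """
    return sum(channel_forward(rgb[..., c], channel_phases[c], **kwargs)
               for c in range(3))


rng = np.random.default_rng(1)
rgb = rng.random((32, 32, 3))
channel_phases = [[rng.uniform(0, 2 * np.pi, (32, 32)) for _ in range(4)]
                  for _ in range(3)]
seg_map = rgb_donn_forward(rgb, channel_phases)  # (32, 32) intensity map
```

Setting `skip_every` larger than the number of layers disables the skip connections, which recovers a plain lossless cascade per channel; the skip-connected version is not energy-conserving because the assumed combiner discards light at its unused port.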