Decision-making and control with diffractive optical networks

Jumin Qiu; Shuyuan Xiao; Lujun Huang; Andrey Miroshnichenko; Dejian Zhang; Tingting Liu; Tianbao Yu

TL;DR

The paper tackles decision-making and control directly from high-dimensional sensory inputs using diffractive optical networks (DONs). It introduces a residual phase-profile DON trained with deep reinforcement learning from self-play: a policy $\pi(a \mid s)$ is first learned and then transferred to the optical hardware through backpropagation in the forward diffraction model $F(X)$, with a residual shortcut $F(\alpha X) + (1-\alpha)X$. The approach is validated on Tic-Tac-Toe, Super Mario Bros., and Car Racing, including an experimental Tic-Tac-Toe demonstration with a DMD-SLM system that shows good agreement with simulations. The results suggest a promising all-optical AI path for real-time control in autonomous driving, robotics, and manufacturing, with metasurface implementations proposed for high-density integration.
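The residual shortcut $F(\alpha X) + (1-\alpha)X$ can be illustrated numerically. The snippet below is a minimal sketch, not the authors' implementation: it assumes that the forward diffraction model $F$ of one block is a trainable phase-only mask followed by free-space propagation computed with the angular spectrum method. The wavelength, pixel pitch, and propagation distance defaults (532 nm, 8 µm, 5 cm) are hypothetical values chosen for illustration, and `angular_spectrum_propagate`, `diffractive_layer`, and `residual_block` are illustrative names.

```python
import numpy as np

def angular_spectrum_propagate(field, wavelength, pitch, distance):
    """Free-space propagation of a square complex field (angular spectrum method)."""
    n = field.shape[0]
    fx = np.fft.fftfreq(n, d=pitch)                 # spatial frequencies in 1/m
    fxx, fyy = np.meshgrid(fx, fx, indexing="ij")
    arg = 1.0 / wavelength**2 - fxx**2 - fyy**2     # (kz / 2pi)^2
    kz = 2 * np.pi * np.sqrt(np.maximum(arg, 0.0))
    # Keep propagating components, drop evanescent ones.
    transfer = np.where(arg > 0, np.exp(1j * kz * distance), 0.0)
    return np.fft.ifft2(np.fft.fft2(field) * transfer)

def diffractive_layer(field, phase, wavelength, pitch, distance):
    """Hypothetical F: phase-only modulation followed by propagation to the next plane."""
    return angular_spectrum_propagate(field * np.exp(1j * phase), wavelength, pitch, distance)

def residual_block(x, phase, alpha, wavelength=532e-9, pitch=8e-6, distance=0.05):
    """Residual shortcut F(alpha * x) + (1 - alpha) * x."""
    return diffractive_layer(alpha * x, phase, wavelength, pitch, distance) + (1 - alpha) * x

rng = np.random.default_rng(0)
x = rng.random((64, 64)).astype(complex)            # toy input amplitude image
phase = rng.uniform(0.0, 2 * np.pi, size=(64, 64))  # untrained phase mask
intensity = np.abs(residual_block(x, phase, alpha=0.5)) ** 2
print(intensity.shape)  # (64, 64)
```

With $\alpha = 0$ the block passes the input field through unchanged, and with $\alpha = 1$ it reduces to a plain diffractive layer. Because the phase mask and the propagating part of the transfer function both have unit modulus, the plain layer conserves optical energy whenever no spatial frequencies are evanescent, which holds for these defaults.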

Abstract

The ultimate goal of artificial intelligence is to mimic the human brain in performing decision-making and control directly from high-dimensional sensory input. Diffractive optical networks provide a promising solution for implementing artificial intelligence with high speed and low power consumption. Most reported diffractive optical networks focus on single or multiple tasks that do not involve environmental interaction, such as object recognition and image classification. In contrast, to our knowledge, networks capable of performing decision-making and control have not yet been developed. Here, we propose using deep reinforcement learning to implement diffractive optical networks that imitate human-level decision-making and control capability. Such networks, taking advantage of a residual architecture, allow optimal control policies to be found through interaction with the environment and can be readily implemented with existing optical devices. The superior performance of these networks is verified on three classic games: Tic-Tac-Toe, Super Mario Bros., and Car Racing. Finally, we present an experimental demonstration of playing Tic-Tac-Toe with diffractive optical networks based on a spatial light modulator. Our work represents a solid step forward in advancing diffractive optical networks, promising a fundamental shift from the target-driven control of a pre-designed state for simple recognition or classification tasks to the high-level sensory capability of artificial intelligence. It may find exciting applications in autonomous driving, intelligent robots, and intelligent manufacturing.

Paper Structure (12 sections, 5 figures)

Introduction
Results
The network for decision-making and control
Playing Tic-Tac-Toe
Playing Super Mario Bros.
Playing Car Racing
Experimental demonstration of playing Tic-Tac-Toe
Discussion
Methods
Experimental setup
Training of network
Modeling of control policy

Figures (5)

Figure 1: The DON for decision-making and control. a–c The proposed network plays the video game Super Mario Bros. in a human-like manner. In the network architecture, an input layer captures continuous, high-dimensional game snapshots (seeing), a series of diffractive layers chooses a particular action for each situation faced through a learned control policy (making a decision), and an output layer maps the intensity distribution into preset action regions to generate the control signals in the games (controlling). d Training framework of the policy and the network. In deep reinforcement learning, an agent interacts with a simulated environment to find a near-optimal control policy represented by a CNN, which is then employed as the ground truth to update the DON with the error backpropagation algorithm. e The experimental setup of the DON for decision-making and control. f The building block of the DON.
Figure 2: Playing Tic-Tac-Toe. a Schematic illustration of the DON, composed of an input layer, hidden layers consisting of three cascaded diffractive blocks, and an output layer for playing Tic-Tac-Toe. b,c The sequential control of the DON in performing gameplay tasks for X and O, respectively. d The accuracy rate of playing Tic-Tac-Toe. A collection of 87 games is used for predicting X, yielding 81 wins and 6 draws. In the remaining 583 games, O obtains 454 wins, 74 draws, and 21 losses. A turn in which the predicted position is already occupied by a previous move is counted as a playing error; this occurs 34 times. e Dependence of the prediction accuracy on the number of hidden layers.
Figure 3: Playing Super Mario Bros. a Layout of the designed network for playing Super Mario Bros. b,c Snapshots of Mario's jumping and crouching actions, obtained by comparing the output intensities of the actions. In b, the output intensity of the jump action is the maximum at the 201st frame, so the predicted action is jump and Mario is controlled to perform it. A similar sequence of prediction and control for a crouch action is shown in c. d The inverse prediction result. Because the predicted crouch at the current state is crucial for updating Mario's action, the maximized output intensity of the crouch is used as the input, ignoring the simultaneous outputs of the other actions.
Figure 4: Playing Car Racing. a Layout of the designed network for playing Car Racing. b Control of the car's steering direction and angle by the difference between the output intensities at the current state, normalized between $-1$ and $1$. c–f Snapshots of controlling the car steering. When the car faces a left-turn track in c, the output intensity on the left stays greater than that on the right, allowing the rotation angle of the left-turn action to be updated continuously. A similar control process is performed for the right-turn track in e. In addition, the robustness of the network to disturbances is validated by adding Gaussian blur (d) and Gaussian noise (f) to the game images.
Figure 5: Experimental demonstration of the DON for Tic-Tac-Toe. a Photo of the experimental system, where the unlabeled devices are lenses, a spatial filter removes the unwanted multiple-order energy peaks, and a filter is mounted on the camera. b The output of the first layer for the sample in Fig. 2a; the red arrows represent the polarization direction of the incident light. c,d The sequential control of the DON in playing the same two games as in Fig. 2b,c, respectively. The experimental results are normalized with respect to the simulation results. Sim.: simulation; Exp.: experimental.
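The output layer described in the Fig. 1 caption maps the intensity distribution onto preset action regions, and the Fig. 3 and Fig. 4 captions decode those regions in two ways: by picking the brightest region (a discrete action such as jump or crouch) and by a normalized intensity difference in $[-1, 1]$ (continuous steering). The sketch below assumes rectangular detector regions given as `(row0, row1, col0, col1)` slices, a difference normalized by the summed intensity, and positive values meaning a left turn; the region layout, the normalization, the sign convention, and the function names are all hypothetical, not details taken from the paper.

```python
import numpy as np

def region_intensities(intensity, regions):
    """Sum the detected intensity inside each (row0, row1, col0, col1) region."""
    return np.array([intensity[r0:r1, c0:c1].sum() for r0, r1, c0, c1 in regions])

def discrete_action(intensity, regions, actions):
    """Return the action whose detector region collects the most light."""
    return actions[int(np.argmax(region_intensities(intensity, regions)))]

def steering_command(intensity, left_region, right_region, eps=1e-12):
    """Normalized left-right difference in [-1, 1]; positive = steer left (assumed)."""
    i_left, i_right = region_intensities(intensity, [left_region, right_region])
    return float((i_left - i_right) / (i_left + i_right + eps))

# Toy 8x8 output plane with two hypothetical detector regions.
plane = np.zeros((8, 8))
plane[0:2, 0:2] = 3.0  # region 0
plane[0:2, 6:8] = 1.0  # region 1
regions = [(0, 2, 0, 2), (0, 2, 6, 8)]
print(discrete_action(plane, regions, ["jump", "crouch"]))        # jump
print(round(steering_command(plane, regions[0], regions[1]), 3))  # 0.5
```

Normalizing by the summed intensity keeps the steering signal bounded regardless of the overall optical power reaching the detector, which is one simple way to obtain a value between $-1$ and $1$.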
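The Fig. 1d caption describes a two-stage training framework: a CNN policy found by deep reinforcement learning serves as the ground truth for updating the DON by error backpropagation. The loss used for that update is not stated in the captions, so the sketch below assumes one plausible choice: the cross-entropy between the teacher's action probabilities $\pi(a \mid s)$ and the DON's detector intensities normalized to sum to one. It returns the loss and its analytic gradient with respect to the region intensities; in a full implementation an automatic-differentiation framework would carry that gradient back through the diffraction model to the phase masks. `distillation_loss` is a hypothetical name.

```python
import numpy as np

def distillation_loss(region_intensity, teacher_probs, eps=1e-12):
    """Cross-entropy H(t, q) between a teacher policy t and the DON output q.

    region_intensity: nonnegative detector readings I_k, one per action.
    teacher_probs: action probabilities pi(a|s) from the CNN policy (sums to 1).
    The DON distribution is assumed to be q_k = I_k / sum(I).
    """
    total = region_intensity.sum() + eps
    don_probs = region_intensity / total
    loss = -np.sum(teacher_probs * np.log(don_probs + eps))
    # dL/dI_k = -t_k / I_k + 1 / sum(I), using sum(t) = 1 (eps terms neglected).
    grad = -teacher_probs / (region_intensity + eps) + 1.0 / total
    return float(loss), grad

intensities = np.array([2.0, 1.0, 0.5])  # toy detector readings
teacher = np.array([0.7, 0.2, 0.1])      # toy CNN action probabilities
loss, grad = distillation_loss(intensities, teacher)
print(round(loss, 4), grad.round(4))
```

The loss reaches its minimum, the entropy of the teacher policy, when the normalized DON intensities match the teacher probabilities, so driving it down pushes the optical readout toward the CNN's choice of action.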
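The game counts in the Fig. 2d caption can be tallied to make the bookkeeping explicit. The 87 games as X split exactly into 81 wins and 6 draws, while 454 + 74 + 21 = 549 of the 583 games as O, leaving 34 games that match the 34 playing errors; the sketch below therefore assumes that each playing error accounts for one O game. The non-losing and error rates it prints are simple derived quantities for illustration and are not necessarily the accuracy metric plotted in Fig. 2d. The `Record` class is a hypothetical helper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    """Outcome counts for one side; 'errors' are moves onto an occupied square."""
    wins: int
    draws: int
    losses: int
    errors: int = 0

    @property
    def games(self) -> int:
        return self.wins + self.draws + self.losses + self.errors

    def non_losing_rate(self) -> float:
        return (self.wins + self.draws) / self.games

# Counts from the Fig. 2d caption; assigning all 34 errors to O is an assumption
# that follows from the arithmetic (583 - 454 - 74 - 21 = 34).
as_x = Record(wins=81, draws=6, losses=0)
as_o = Record(wins=454, draws=74, losses=21, errors=34)

total = as_x.games + as_o.games
print(as_x.games, as_o.games, total)                 # 87 583 670
print(f"{as_x.non_losing_rate():.3f}")               # 1.000
print(f"{as_o.non_losing_rate():.3f}")               # 0.906
print(f"{(as_x.errors + as_o.errors) / total:.3f}")  # 0.051
```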