Recurrent neural networks implemented through spatiotemporal light propagation in optical fibers

Dilem Eşlik, Bahadır Utku Kesgin, Uğur Teğin

TL;DR

The results show that recurrent temporal processing can emerge directly from spatiotemporal wave dynamics, offering an energy-efficient pathway to temporal artificial intelligence by leveraging intrinsic spatiotemporal optical nonlinearities within multimode fibers.

Abstract

Recurrent neural networks excel at temporal tasks and video processing but require energy-intensive sequential memory operations. We demonstrate that multimode optical fibers naturally implement spatiotemporal recurrent computation through passive light propagation. Video frames are encoded onto separate optical beams with controlled time delays; these beams combine and recirculate through a fiber loop where interference and nonlinear propagation generate high-dimensional states encoding both current inputs and fading memory. Remarkably, the entire optical system remains fixed with no trainable parameters or electronic feedback, yet this single physical configuration achieves competitive performance across diverse temporal and spatiotemporal learning tasks: chaotic time-series forecasting, human action recognition, steering angle prediction, and surgical skill assessment. Our results show that recurrent temporal processing can emerge directly from spatiotemporal wave dynamics. This paradigm shift from algorithmic to physical recurrence offers an energy-efficient pathway to temporal artificial intelligence by leveraging intrinsic spatiotemporal optical nonlinearities within multimode fibers.
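
The division of labor described in the abstract (a fixed physical system that generates high-dimensional states with fading memory, with learning confined to whatever reads those states out) can be illustrated with a small numerical toy. The sketch below is not the authors' model or code: the random complex matrices `w_in` and `w_loop` are hypothetical stand-ins for fiber mode mixing, the `feedback` power fraction mimics a loop that returns part of the light on each pass, and the ridge-regression readout is an assumption, since the abstract does not say how the detected patterns are decoded.

```python
import numpy as np

def run_fixed_loop(inputs, n_modes=64, feedback=0.5, seed=0):
    """Toy fixed recurrent loop: returns one detected intensity vector per input step.

    All matrices are random and never trained (hypothetical stand-ins for
    mode mixing in a multimode fiber; not taken from the paper).
    """
    rng = np.random.default_rng(seed)
    n_in = inputs.shape[1]
    w_in = rng.normal(size=(n_modes, n_in)) + 1j * rng.normal(size=(n_modes, n_in))
    w_loop = rng.normal(size=(n_modes, n_modes)) + 1j * rng.normal(size=(n_modes, n_modes))
    # Normalize to unit spectral norm so `feedback` alone sets the memory decay.
    w_loop /= np.linalg.norm(w_loop, 2)
    field = np.zeros(n_modes, dtype=complex)
    states = []
    for x in inputs:
        # A power fraction `feedback` recirculates, so the field scales by its square root.
        field = np.sqrt(feedback) * (w_loop @ field) + w_in @ x
        # Intensity detection supplies the nonlinearity in this toy model.
        states.append(np.abs(field) ** 2)
    return np.array(states)

def fit_ridge_readout(states, targets, alpha=1e-3):
    """Fit a linear readout with a bias term by ridge regression (assumed readout)."""
    X = np.hstack([states, np.ones((len(states), 1))])
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ targets)

def predict(states, weights):
    """Apply a fitted readout to a sequence of states."""
    X = np.hstack([states, np.ones((len(states), 1))])
    return X @ weights
```

In this toy, only `fit_ridge_readout` touches any learned quantity; `run_fixed_loop` is the same for every task, which mirrors the abstract's claim that a single fixed configuration serves several tasks. A perturbation of an early input changes early states but has a negligible effect on states many steps later, which is the fading-memory property.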

Paper Structure

This paper contains 4 sections, 8 equations, 5 figures, and 1 table.

Figures (5)

  • Figure 1: Experimental setup and physical mechanism of recurrent optical processing. (A) Experimental schematic showing the passive recurrent architecture. Sequential video frames are spatially multiplexed onto a spatial light modulator (SLM), coupled into multimode fibers with different delay lengths, combined, and fed into a 50/50 fiber coupler loop. Half the signal recirculates while half is detected as speckle patterns. (B) Schematic illustrating the pulse propagation and memory mechanism within the memory loop and processing fiber after the delay unit.
  • Figure 2: Chaotic time-series processing. (A) One-step-ahead prediction test set results on the Santa Fe laser dataset. (B) Autonomous forecasting results where the system recursively predicts future states without access to ground truth.
  • Figure 3: Human action recognition from video. (A) Representative frames from the KTH dataset showing six action categories. (B) Confusion matrix of full-dataset action classification achieving 96.67% accuracy. (C) Confusion matrix of scene-specific classification where only action varies. (D) Confusion matrix of scene classification using the jogging action. (E) Confusion matrix of person identification within the running action category.
  • Figure 4: Steering angle prediction for autonomous driving. (A) Representative dashboard-camera frames from different driving scenarios. (B) Continuous steering-angle regression showing close alignment between predicted and ground-truth steering commands. (C) Confusion matrix for five-class steering direction classification.
  • Figure 5: Surgical skill recognition from video. (A) Representative frames from surgical task videos with performance metrics. (B) Confusion matrix of overall skill classification. (C) Confusion matrix of respect-for-tissue metric. (D) Confusion matrix of output-quality classification.
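
The Figure 1 caption states that half the signal recirculates in the 50/50 coupler loop while half is detected. Under an idealized reading of that statement (a lossless loop, a power split of exactly one half, and no nonlinear energy exchange, all of which are assumptions rather than details given here), the power from a single input that is still circulating after n passes through the coupler decays geometrically. The symbols P_0 and P_n are introduced only for this illustration:

```latex
P_n = P_0 \left(\tfrac{1}{2}\right)^{n},
\qquad
\frac{P_n}{P_0} = e^{-n \ln 2}
```

On this reading, an input's contribution falls below 1% of its initial power after seven passes (2^{-7} ≈ 0.8%), which gives a rough sense of how quickly the loop's memory fades in the ideal case.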