Application of Transformers for Nonlinear Channel Compensation in Optical Systems
Behnam Behinaein Hamgini, Hossein Najafi, Ali Bakhshali, Zhuhong Zhang
TL;DR
The paper tackles nonlinear impairments in high-speed coherent optical links by introducing an encoder-only Transformer for nonlinear compensation (Transformer-NLC). By combining tailored embeddings, a physics-informed attention mask, and block processing, the approach delivers parallelizable nonlinear compensation that attends directly to the channel memory, with performance competitive with or superior to DBP and LSTM baselines across 16QAM and 64QAM scenarios. Key contributions include a detailed Transformer-NLC architecture, a perturbation-theory-driven masking strategy that reduces attention complexity, and an extensive hyper-parameter analysis of the performance-complexity trade-off. The results suggest a flexible, scalable alternative for optical networks that can adapt to varying symbol rates and PMD conditions, with practical potential for hardware-friendly deployment.
Abstract
In this paper, we introduce a new nonlinear optical channel equalizer based on Transformers. By leveraging parallel computation and attending directly to the memory across a sequence of symbols, we show that Transformers can be used effectively for nonlinear compensation (NLC) in coherent long-haul transmission systems. For this application, we present an implementation of the encoder part of the Transformer and analyze its performance over a wide range of hyper-parameters. We show that efficient nonlinear equalization can be achieved under different complexity constraints by using proper embeddings, processing blocks of symbols at each iteration, and carefully selecting the subsets of the encoder's output that are processed together. To reduce the computational complexity of the attention mechanism, we further propose a physics-informed mask inspired by nonlinear perturbation theory. We also compare the Transformer-NLC with digital back-propagation (DBP) under different transmission scenarios to demonstrate the flexibility and generalizability of the proposed data-driven solution.
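The abstract does not specify the form of the physics-informed mask, so the following is only a minimal sketch of the general idea of masked attention over a block of symbols. It assumes a simple banded mask in which each symbol attends only to neighbors within a fixed distance, loosely motivated by the finite memory of the channel. The names `banded_mask`, `masked_attention`, and `window`, as well as the banded shape itself, are illustrative assumptions and not the authors' design.

```python
import numpy as np

def banded_mask(seq_len, window):
    """Boolean mask: True where symbol i may attend to symbol j (|i - j| <= window).

    Assumption: a plain band stands in for the paper's perturbation-inspired mask.
    """
    idx = np.arange(seq_len)
    return np.abs(idx[:, None] - idx[None, :]) <= window

def masked_attention(q, k, v, mask):
    """Single-head scaled dot-product attention with disallowed pairs masked out."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Masked pairs get -inf so they receive exactly zero weight after softmax.
    scores = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability (the diagonal is never masked).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy block of 8 embedded symbols with a hypothetical embedding size of 4.
rng = np.random.default_rng(0)
seq_len, d_model, window = 8, 4, 2
x = rng.standard_normal((seq_len, d_model))
mask = banded_mask(seq_len, window)
out, weights = masked_attention(x, x, x, mask)
```

With a band of fixed width, the number of symbol pairs that need a score grows roughly linearly with the block length instead of quadratically, which is the kind of complexity reduction a mask is meant to provide. This dense NumPy version still computes every score before masking; a practical implementation would skip the masked entries altogether.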
