Application of Transformers for Nonlinear Channel Compensation in Optical Systems

Behnam Behinaein Hamgini; Hossein Najafi; Ali Bakhshali; Zhuhong Zhang

TL;DR

The paper tackles nonlinear impairments in high-speed coherent optical links by introducing a Transformer-based, encoder-only nonlinear compensation (NLC) equalizer. Using purpose-designed embeddings, a physics-informed attention mask, and block processing, the approach achieves parallelizable nonlinear compensation that attends directly to long channel memory. Its performance is competitive with or superior to digital back-propagation (DBP) and LSTM baselines in 16QAM and 64QAM scenarios. Key contributions include a detailed Transformer-NLC architecture, a masking strategy derived from perturbation theory that reduces attention complexity, and an extensive hyper-parameter analysis showing robust performance-complexity trade-offs. The results suggest a flexible, scalable alternative for optical networks that can adapt to varying symbol rates and polarization mode dispersion (PMD) conditions, with practical potential for hardware-friendly deployment.
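Block processing can be illustrated with a minimal NumPy sketch. It assumes, hypothetically, that each block contains `b` target symbols flanked by `t` neighbouring symbols on each side, so the encoder sees `b + 2t` tokens per iteration. The helper name `make_blocks` and the zero-padding at the sequence edges are illustrative choices, not the authors' implementation.

```python
import numpy as np

def make_blocks(symbols: np.ndarray, b: int, t: int) -> np.ndarray:
    """Split a 1-D symbol sequence into overlapping processing blocks.

    Hypothetical layout: each block carries b target symbols plus t
    neighbouring symbols on each side (zero-padded at the sequence edges),
    so every target symbol sees its full +/- t memory.
    Returns an array of shape (n_blocks, b + 2 * t).
    """
    n = len(symbols)
    n_blocks = -(-n // b)                        # ceiling division
    padded = np.zeros(n_blocks * b + 2 * t, dtype=symbols.dtype)
    padded[t:t + n] = symbols                    # t leading zeros; trailing zeros fill the last block
    # Block k spans padded[k*b : k*b + b + 2t]; its targets sit at positions [t : t + b].
    return np.stack([padded[k * b:k * b + b + 2 * t] for k in range(n_blocks)])

# Toy usage: 10 complex "received" symbols, blocks of b = 4 targets, t = 2 neighbours.
rx = np.arange(10) + 1j * np.arange(10)
blocks = make_blocks(rx, b=4, t=2)
print(blocks.shape)   # (3, 8)
```

With this layout, the neighbour overhead is 2t/(b + 2t) of the tokens processed per block. Larger blocks therefore amortize the context symbols over more targets. For example, t = 64 with b = 128 spends 50% of the tokens on context, while b = 4096 spends about 3%.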

Abstract

In this paper, we introduce a new nonlinear optical channel equalizer based on Transformers. By leveraging parallel computation and attending directly to the memory across a sequence of symbols, we show that Transformers can be used effectively for nonlinear compensation (NLC) in coherent long-haul transmission systems. For this application, we present an implementation of the encoder part of the Transformer and analyze its performance over a wide range of hyper-parameters. We show that efficient nonlinear equalization can be achieved under different complexity constraints through three measures: proper embeddings, processing blocks of symbols at each iteration, and careful selection of subsets of the encoder's output to be processed together. To reduce the computational complexity of the attention mechanism, we further propose a physics-informed mask inspired by nonlinear perturbation theory. We also compare the Transformer-NLC with digital back-propagation (DBP) under different transmission scenarios to demonstrate the flexibility and generalizability of the proposed data-driven solution.
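The sketch below illustrates the masking idea. It builds an additive attention mask over a window of `2t + 1` symbols centred on the target, using 0 to attend and negative infinity to mask, and applies it in single-head scaled dot-product attention. The keep rule `|m*n| <= rho*t`, on query offset `m` and key offset `n`, is a hypothetical stand-in for the paper's perturbation-theory criterion. It was chosen only because it resembles the hyperbolic pruning of triplet coefficients used in perturbation-based NLC. The function names are likewise illustrative.

```python
import numpy as np

def perturbation_mask(t: int, rho: float) -> np.ndarray:
    """Additive attention mask over a window of 2t+1 symbols centred on the target.

    Hypothetical rule: query offset m and key offset n (relative to the target)
    may interact only if |m * n| <= rho * t. Kept entries are 0, pruned entries -inf.
    """
    offsets = np.arange(-t, t + 1)
    keep = np.abs(np.outer(offsets, offsets)) <= rho * t
    return np.where(keep, 0.0, -np.inf)

def masked_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray,
                     mask: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention with an additive mask."""
    scores = q @ k.T / np.sqrt(q.shape[-1]) + mask
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)                        # exp(-inf) -> 0 for masked pairs
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Toy usage: random embeddings for a window of 2t+1 = 17 symbols.
rng = np.random.default_rng(0)
t, d = 8, 16
x = rng.standard_normal((2 * t + 1, d))
mask = perturbation_mask(t, rho=1.0)
out = masked_attention(x, x, x, mask)
print(out.shape)                  # (17, 16)
print(np.isfinite(mask).mean())   # fraction of unmasked entries
```

Every row keeps the entry at key offset 0, so each query retains at least one finite score and the softmax never divides by zero. The fraction of finite mask entries is the unmasked-element ratio, a simple proxy for how much attention computation the mask removes.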

Paper Structure (28 sections, 11 equations, 20 figures, 6 tables, 2 algorithms)

Introduction
Preliminaries
Nonlinear Compensation in Coherent Optical Systems
Transformers
Transformers for Nonlinear Compensation
Model Architecture
Embedding
Mask
Using Neighbors at the Output Layer of Transformer
Numerical Results
System Model
Numerical Setup
Performance vs. Complexity Trade-off for 16QAM Setup
Performance vs. Complexity Trade-off for 64QAM Setup
Comparison to an LSTM-based equalizer
...and 13 more sections

Figures (20)

Figure 1: An example of receiver symbol-domain nonlinear compensation with neural network.
Figure 2: Transformer encoder architecture.
Figure 3: The overall architecture for the nonlinear channel equalization. From left to right: the CNN generates input embeddings which are processed by the Transformer to generate output representations. These outputs are then fed into an MLP to generate the estimated nonlinear distortions $E_{XI}$ and $E_{XQ}$.
Figure 4: Attention masks for the cases where the output error is computed only for the individual target symbol. The red area shows the values that are zero (unmasked) and the blue area shows the values that are negative infinity (masked). (a) $t=32$ and $\rho=2.6$, (b) $t=64$ and $\rho=2.6$, (c) $t=64$ and $\rho=1.3$, and (d) $t=64$ and $\rho=0.4$. The ratio of the number of unmasked (zero) elements to the total number of elements is 0.31, 0.19, 0.10, and 0.04 for (a), (b), (c), and (d), respectively.
Figure 5: Masks for blocks: (a) a block mask with $t=64, b=128, \rho=2.6$; (b) a block mask with $t=64, b=128, \rho=0.4$; (c) a mask with $t=64, b=4096, \rho=2.6$. The blue area shows the elements with negative infinity values and the red area shows the zero elements. (d) The ratio of the number of unmasked (zero) elements to the total number of elements versus block size for several values of $\rho$.
...and 15 more figures
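Figures 1 and 3 describe a network that estimates the nonlinear distortion terms $E_{XI}$ and $E_{XQ}$. The following is a minimal sketch of how such estimates could be applied. It assumes, hypothetically, that they are the real and imaginary parts of an additive distortion that is subtracted from the received X-polarization symbols. An oracle estimate computed from a toy first-order self-phase-modulation term stands in for the trained MLP, and the function names are illustrative.

```python
import numpy as np

def apply_nlc(rx_x: np.ndarray, e_xi: np.ndarray, e_xq: np.ndarray) -> np.ndarray:
    """Remove an estimated nonlinear distortion from received X-polarization symbols.

    Assumption: E_XI and E_XQ are the real and imaginary parts of an additive
    distortion estimate, so compensation is a simple subtraction.
    """
    return rx_x - (e_xi + 1j * e_xq)

def mse(a: np.ndarray, b: np.ndarray) -> float:
    """Mean squared error between two complex symbol sequences."""
    return float(np.mean(np.abs(a - b) ** 2))

rng = np.random.default_rng(1)
levels = np.array([-3.0, -1.0, 1.0, 3.0])
tx = rng.choice(levels, 1000) + 1j * rng.choice(levels, 1000)   # toy 16QAM symbols
dist = 0.01j * np.abs(tx) ** 2 * tx                             # toy first-order SPM-like term
noise = rng.normal(scale=0.1, size=1000) + 1j * rng.normal(scale=0.1, size=1000)
rx = tx + dist + noise

# Oracle estimate standing in for the MLP outputs E_XI, E_XQ (no trained model here).
eq = apply_nlc(rx, dist.real, dist.imag)
print(mse(rx, tx), mse(eq, tx))   # error power before vs. after compensation
```

In practice, the estimate comes from the network and is imperfect, so the residual error after subtraction includes both the noise and the estimation error.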
