Improving Transformers using Faithful Positional Encoding
Tsuyoshi Idé, Jokin Labaien, Pin-Yu Chen
TL;DR
The paper argues that the standard Transformer positional encoding (PE), built on a sinusoidal basis with frequencies $w_k$, can fail to preserve position information faithfully because of its low-pass characteristics. It introduces a notion of faithfulness for PE and derives a Discrete Fourier Transform (DFT) based positional encoding by representing the one-hot position function $f_s(t)=\delta_{s,t}$ through its DFT coefficients, which yields a faithful, invertible representation. The main contributions are formalizing faithfulness, deriving the DFT PE with its flat distribution of Fourier coefficients, proving that positions can be reconstructed from it, and demonstrating consistent improvements in time-series classification on the Elevator, SMD, and MSL datasets. The findings suggest that a principled, mathematically grounded encoding that captures short- to mid-range positional dependencies can improve Transformer performance on sequential tasks where local position matters.
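As a rough illustration of the idea, the sketch below encodes each position $s$ by the complex DFT of its one-hot function $f_s(t)=\delta_{s,t}$, then checks that the coefficient magnitudes are flat and that the inverse DFT recovers the position. It is a minimal NumPy sketch and not necessarily the paper's exact construction: the complex-valued form, the unnormalized DFT convention, and the helper names `dft_positional_encoding` and `decode_positions` are assumptions made here for illustration.

```python
import numpy as np

def dft_positional_encoding(num_positions):
    """Row s holds the DFT coefficients of the one-hot function f_s(t) = delta_{s,t}.

    Assumption: unnormalized complex DFT as computed by np.fft.fft.
    """
    one_hot = np.eye(num_positions)       # row s is the one-hot vector for position s
    return np.fft.fft(one_hot, axis=1)    # entry (s, k) equals exp(-2j*pi*k*s/N)

def decode_positions(encoding):
    """Invert the encoding: the inverse DFT gives back the one-hot vectors."""
    recovered = np.fft.ifft(encoding, axis=1).real
    return recovered.argmax(axis=1)

N = 8  # hypothetical sequence length
pe = dft_positional_encoding(N)

# Every coefficient has unit magnitude, so no frequency is attenuated.
flat_spectrum = np.allclose(np.abs(pe), 1.0)

# Positions are recovered exactly, so no positional information is lost.
positions = decode_positions(pe)
```

Here `flat_spectrum` checks the flat distribution of Fourier coefficients, and comparing `positions` with `np.arange(N)` checks invertibility. A real-valued variant that a Transformer could consume directly would need a choice of layout and normalization that this sketch does not attempt to specify.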
Abstract
We propose a new positional encoding method for the Transformer neural network architecture. Unlike the standard sinusoidal positional encoding, our approach rests on a solid mathematical foundation and is guaranteed not to lose information about the positional order of the input sequence. We show that the new encoding systematically improves prediction performance on time-series classification tasks.
