Converting Transformers into DGNNs Form

Jie Zhang, Mao-Hsuan Mao, Bo-Wei Chiu, Min-Te Sun

TL;DR

The paper introduces Converter, a Transformer variant that replaces self-attention with synthetic unitary digraph convolution (Synvolution) and a Kernel Polynomial Method (Kernelution) to convert Transformers into Directed Graph Neural Networks (DGNNs). It constructs a learnable unitary digraph shift operator via a two-phase spectral synthesis (eigenvalues from SIREN-based representations and eigenvectors by inverse LQ with Givens rotations), enabling fast, linearithmic-time processing through a 1-D DHHP implementation. Kernelution leverages Chebyshev interpolation and Gibbs damping to create a data-adaptive spectral kernel, while a Gated FFN and PostScaleNorm map complex outputs to real-valued predictions. Across Long-Range Arena, long document classification, and DNA taxonomy tasks, Converter achieves superior accuracy and efficiency, demonstrating that digraph convolution can effectively emulate and compete with self-attention in large-sequence settings.
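
To make "learnable unitary digraph shift operator" concrete, here is a minimal sketch. It is an illustration under stated assumptions, not the authors' code. A shift operator S = U diag(λ) U^H with unit-modulus eigenvalues λ = exp(iθ) and a unitary eigenvector matrix U is itself unitary, so applying it preserves the norm of the token signal. Two simplifications apply. First, the angles θ are random placeholders standing in for the SIREN-generated eigenvalues. Second, U is a plain product of random complex Givens rotations rather than the paper's inverse-LQ construction. The helper names (`apply_givens`, `random_unitary`, `unitary_shift`) are hypothetical, and the dense O(n²) products stand in for the paper's linearithmic DHHP path.

```python
import cmath
import math
import random


def apply_givens(rows, i, j, phi, psi):
    """Left-multiply the matrix `rows` (list of rows) in place by a complex Givens rotation on rows i, j."""
    c = math.cos(phi)
    s = cmath.exp(1j * psi) * math.sin(phi)
    ri, rj = rows[i], rows[j]
    rows[i] = [c * a - s.conjugate() * b for a, b in zip(ri, rj)]
    rows[j] = [s * a + c * b for a, b in zip(ri, rj)]


def random_unitary(n, num_rotations, rng):
    """Hypothetical eigenvector basis U: random Givens rotations applied to the identity."""
    U = [[complex(r == c) for c in range(n)] for r in range(n)]
    for _ in range(num_rotations):
        i, j = rng.sample(range(n), 2)
        apply_givens(U, i, j, rng.uniform(0, math.pi), rng.uniform(0, 2 * math.pi))
    return U


def conj_transpose(M):
    return [[M[r][c].conjugate() for r in range(len(M))] for c in range(len(M[0]))]


def matvec(M, x):
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def unitary_shift(U, thetas, x):
    """Apply S = U diag(exp(i*theta)) U^H to x without forming S."""
    z = matvec(conj_transpose(U), x)                         # analysis: spectral coefficients
    z = [cmath.exp(1j * t) * v for t, v in zip(thetas, z)]   # unit-modulus eigenvalues
    return matvec(U, z)                                      # synthesis: back to the token domain


if __name__ == "__main__":
    rng = random.Random(0)
    n = 6
    U = random_unitary(n, 20, rng)
    thetas = [rng.uniform(-math.pi, math.pi) for _ in range(n)]  # placeholder for learned eigenvalues
    x = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    y = unitary_shift(U, thetas, x)
    norm = lambda v: math.sqrt(sum(abs(e) ** 2 for e in v))
    print(norm(x), norm(y))  # equal up to rounding, since S is unitary
```

Each Givens rotation touches only two rows, which is why products of them are a cheap way to parameterize a unitary basis. Setting every θ to zero collapses S to the identity, a quick sanity check for any such parameterization.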

Abstract

Recent advances in deep learning have established Transformer architectures as the predominant modeling paradigm. Central to the success of Transformers is the self-attention mechanism, which scores the similarity between query and key matrices to modulate a value matrix. This operation bears striking similarities to digraph convolution, prompting an investigation into whether digraph convolution could serve as an alternative to self-attention. In this study, we formalize this concept by introducing a synthetic unitary digraph convolution based on the digraph Fourier transform. The resulting model, which we term Converter, effectively converts a Transformer into a Directed Graph Neural Network (DGNN) form. We have tested Converter on the Long-Range Arena benchmark, long document classification, and DNA sequence-based taxonomy classification. Our experimental results demonstrate that Converter achieves superior performance while maintaining computational efficiency and architectural simplicity, establishing it as a lightweight yet powerful Transformer variant.
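
The analogy between self-attention and digraph convolution can be made concrete with a small sketch. This is our illustration, not code from the paper. The row-wise softmax of QKᵀ/√d is nonnegative, row-stochastic and generally asymmetric. It can therefore be read as the weighted adjacency matrix of a directed graph whose nodes are tokens, where entry A[i][j] weights how much token i aggregates from token j. Multiplying by V is then one step of message passing along those directed edges. The function names `attention_adjacency` and `propagate` are hypothetical.

```python
import math


def softmax(row):
    m = max(row)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_adjacency(Q, K):
    """Row-softmax of Q K^T / sqrt(d): a dense, weighted, directed adjacency over tokens."""
    d = len(Q[0])
    scores = [[sum(q * k for q, k in zip(qi, kj)) / math.sqrt(d) for kj in K] for qi in Q]
    return [softmax(row) for row in scores]


def propagate(A, V):
    """One digraph-convolution step: node i takes the A[i][j]-weighted sum of neighbour features V[j]."""
    return [[sum(A[i][j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))]
            for i in range(len(A))]


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    K = [[0.0, 1.0], [1.0, 0.0], [1.0, -1.0]]
    V = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    A = attention_adjacency(Q, K)
    print(A[1][2], A[2][1])   # differ: the induced graph is directed
    print(propagate(A, V))    # identical to softmax(Q K^T / sqrt(d)) V
```

Because each row of A sums to one, every output row is a convex combination of the value rows. This is the "modulate a value matrix" step described above, viewed as aggregation over a fully connected digraph.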

Paper Structure

This paper contains 24 sections, 5 theorems, 15 equations, 3 figures, 8 tables, and 3 algorithms.

Key Result

Proposition 2

$L$-DHHP exactly captures the discrete unitary transforms, including the discrete Fourier transform (DFT), the discrete Walsh–Hadamard transform (DWHT), the discrete cosine transform (DCT), the discrete sine transform (DST), and their inverses.
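
The DHHP structure itself is not reproduced here. As a hedged illustration of why the transforms listed in Proposition 2 are attractive targets, the sketch implements one of them, the orthonormal Walsh–Hadamard transform. It uses the standard butterfly recursion, which costs O(n log n) instead of the O(n²) dense matrix product, and checks the result against the dense Sylvester-ordered Hadamard matrix. The function names `fwht` and `dense_wht` are our own.

```python
import math


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform in O(n log n); len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = [float(v) for v in x]
    h = 1
    while h < n:                                # log2(n) butterfly stages
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)                  # normalization makes the transform unitary
    return [v * scale for v in y]


def dense_wht(x):
    """Reference O(n^2) product with H[i][j] = (-1)^popcount(i & j) / sqrt(n) (Sylvester order)."""
    n = len(x)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum((-1) ** bin(i & j).count("1") * x[j] for j in range(n)) for i in range(n)]


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0, 0.5, 0.0, 4.0, -1.0, 2.5]
    print(fwht(x))
    print(dense_wht(x))   # same values, computed with n^2 multiplications
    print(fwht(fwht(x)))  # the orthonormal WHT is its own inverse, so this recovers x
```

Applying the transform twice returns the input, which makes the unitarity (and the "and their inverses" part of the proposition) easy to check for this particular transform.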

Figures (3)

  • Figure 1: Converter architecture.
  • Figure 2: Illustration of the entire Kernelution process.
  • Figure 3: An illustration of Gibbs phenomenon when using the kernel polynomial method with different kernels to approximate a step function.
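
The phenomenon in Figure 3 can be reproduced with a few lines. This is a sketch of the generic kernel polynomial method, not the paper's Kernelution. The step function sign(x) is expanded in Chebyshev polynomials, and two variants are compared. The undamped truncation (Dirichlet kernel, all damping factors equal to 1) overshoots near the jump. The Jackson kernel, a standard positive damping kernel from the KPM literature, keeps the approximation within [-1, 1]. Which kernels the paper actually compares is not stated in this summary, so the Jackson choice here is our assumption, and the helper names are hypothetical.

```python
import math


def jackson_damping(num_moments):
    """Jackson-kernel damping factors g_0..g_{N-1} for N Chebyshev moments (g_0 == 1)."""
    N = num_moments
    q = math.pi / (N + 1)
    return [((N - n + 1) * math.cos(q * n) + math.sin(q * n) / math.tan(q)) / (N + 1)
            for n in range(N)]


def sign_chebyshev_coeffs(num_moments):
    """Exact Chebyshev coefficients of sign(x) on [-1, 1]:
    c_0 = 0 and c_n = 4 sin(n*pi/2) / (n*pi) for n >= 1 (zero for even n)."""
    return [0.0] + [4.0 * math.sin(n * math.pi / 2) / (n * math.pi)
                    for n in range(1, num_moments)]


def kpm_eval(coeffs, damping, x):
    """Evaluate sum_n g_n c_n T_n(x) using T_n(cos t) = cos(n t)."""
    t = math.acos(max(-1.0, min(1.0, x)))
    return sum(g * c * math.cos(n * t) for n, (c, g) in enumerate(zip(coeffs, damping)))


def max_on_grid(coeffs, damping, samples=4001):
    xs = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    return max(kpm_eval(coeffs, damping, x) for x in xs)


if __name__ == "__main__":
    N = 64
    c = sign_chebyshev_coeffs(N)
    undamped = max_on_grid(c, [1.0] * N)          # Dirichlet kernel: Gibbs overshoot above 1
    jackson = max_on_grid(c, jackson_damping(N))  # positive kernel: stays within [-1, 1]
    print(f"undamped max {undamped:.3f}, Jackson max {jackson:.3f}")
```

The damped expansion is a convolution with a nonnegative kernel of unit mass, so it cannot exceed the bounds of the function being approximated. The price is a smoother, wider transition at the jump.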

Theorems & Definitions (10)

  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Theorem 5: CPI for differentiable functions (DOI: 10.1137/1.9781611975949)
  • Theorem 6: CPI for analytic functions (DOI: 10.1137/1.9781611975949)
  • Proof
  • Proof
  • Proof
  • Proof
  • Proof
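
Reading CPI as Chebyshev polynomial interpolation (consistent with the Chebyshev interpolation mentioned in the TL;DR), the titles of Theorems 5 and 6 suggest the standard convergence dichotomy. For a function with finitely many derivatives, the interpolation error decays algebraically in the number of points. For an analytic function, it decays geometrically. The hedged sketch below interpolates at Chebyshev points of the first kind and compares |x|³ (twice continuously differentiable, third derivative of bounded variation) with exp(x) (analytic). It does not reproduce the theorems' exact constants, and the helper names are hypothetical.

```python
import math


def chebyshev_interpolant(f, n):
    """Coefficients c_0..c_{n-1} of the degree-(n-1) polynomial interpolating f
    at the n Chebyshev points of the first kind x_k = cos(pi (k + 1/2) / n)."""
    thetas = [math.pi * (k + 0.5) / n for k in range(n)]
    values = [f(math.cos(t)) for t in thetas]
    coeffs = []
    for j in range(n):
        c = (2.0 / n) * sum(v * math.cos(j * t) for v, t in zip(values, thetas))
        coeffs.append(c / 2.0 if j == 0 else c)
    return coeffs


def chebyshev_eval(coeffs, x):
    """Evaluate sum_j c_j T_j(x) on [-1, 1] via T_j(cos t) = cos(j t)."""
    t = math.acos(max(-1.0, min(1.0, x)))
    return sum(c * math.cos(j * t) for j, c in enumerate(coeffs))


def max_interp_error(f, n, samples=2001):
    coeffs = chebyshev_interpolant(f, n)
    xs = [-1.0 + 2.0 * i / (samples - 1) for i in range(samples)]
    return max(abs(f(x) - chebyshev_eval(coeffs, x)) for x in xs)


if __name__ == "__main__":
    smooth = lambda x: abs(x) ** 3  # finitely differentiable: algebraic decay expected
    for n in (8, 16, 32, 64):
        print(n, max_interp_error(smooth, n), max_interp_error(math.exp, n))
```

The exp(x) column reaches rounding level after a handful of points, while the |x|³ column keeps shrinking only polynomially. This gap is the qualitative contrast between differentiable and analytic functions.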