The Backpropagation of the Wave Network

Xin Zhang, Victor S. Sheng

TL;DR

The paper introduces Token2Wave, a wave-inspired complex-vector token representation that encodes global semantics via magnitude and local semantics via phase, enabling wave-like interference and modulation during updates. It analyzes convergence, backpropagation, embedding independence, and computational efficiency, showing substantial VRAM and training-time savings relative to BERT while preserving competitive accuracy. The work provides detailed complexity and parameter comparisons, demonstrating that a single-layer Wave Network can converge quickly from random embeddings with far fewer parameters than BERT base. These findings suggest practical advantages for resource-constrained NLP deployments, including mobile and edge devices, while offering new insights into global-versus-local semantic interactions in token representations.

Abstract

This paper provides an in-depth analysis of Token2Wave, a novel token representation method derived from the Wave Network, designed to capture both global and local semantics of input text through wave-inspired complex vectors. In the complex vector token representation, each token is represented with a magnitude component, capturing the global semantics of the entire input text, and a phase component, encoding the relationships between individual tokens and the global semantics. Building on prior research that demonstrated the effectiveness of wave-like operations, such as interference and modulation, during forward propagation, this study investigates the convergence behavior, backpropagation characteristics, and embedding independence within the Token2Wave framework. A detailed computational complexity analysis shows that Token2Wave can significantly reduce video memory usage and training time compared to BERT. Gradient comparisons for the [CLS] token, total input text, and classifier parameters further highlight Token2Wave's unique characteristics. This research offers new insights into wave-based token representations, demonstrating their potential to enable efficient and computationally friendly language model architectures.
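
The magnitude/phase split described in the abstract can be illustrated with a small sketch. The abstract does not give the formulas, so everything below is an assumption made for illustration: the magnitude of dimension k is taken to be the L2 norm of that dimension across all tokens, the phase is atan2(sqrt(1 - r^2), r) with r the ratio of the token's value to that magnitude, interference is complex addition, and modulation is complex multiplication. The function names token2wave, interfere, and modulate are hypothetical.

```python
import cmath
import math


def token2wave(embeddings):
    """Map real token embeddings (n_tokens x dim) to complex vectors.

    Assumed formulation (not stated in the abstract):
      magnitude[k] = L2 norm of dimension k over all tokens (global semantics)
      phase[j][k]  = atan2(sqrt(1 - r^2), r), r = embeddings[j][k] / magnitude[k]
    """
    dim = len(embeddings[0])
    # One magnitude per dimension, shared by every token in the input.
    magnitude = [
        math.sqrt(sum(tok[k] ** 2 for tok in embeddings)) for k in range(dim)
    ]
    waves = []
    for tok in embeddings:
        row = []
        for k in range(dim):
            g = magnitude[k]
            r = tok[k] / g if g > 0 else 0.0
            r = max(-1.0, min(1.0, r))  # guard against rounding outside [-1, 1]
            phase = math.atan2(math.sqrt(1.0 - r * r), r)
            row.append(cmath.rect(g, phase))  # g * exp(i * phase)
        waves.append(row)
    return magnitude, waves


def interfere(z1, z2):
    """Assumed wave interference: element-wise complex addition."""
    return [a + b for a, b in zip(z1, z2)]


def modulate(z1, z2):
    """Assumed wave modulation: element-wise complex multiplication."""
    return [a * b for a, b in zip(z1, z2)]


# Two tokens with two embedding dimensions each (made-up numbers).
embeddings = [[3.0, 1.0], [4.0, -1.0]]
magnitude, waves = token2wave(embeddings)
combined = interfere(waves[0], waves[1])
product = modulate(waves[0], waves[1])
```

Under these assumptions every token shares the same magnitude in a given dimension, and the real part of each complex entry recovers the original embedding value, so the phase is what distinguishes one token from another.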

Paper Structure

This paper contains 19 sections, 8 equations, 8 figures, 5 tables.

Figures (8)

  • Figure 1: Create complex vector token representations from token embeddings
  • Figure 2: Quick Convergence on AG News
  • Figure 3: Compare the gradients of [CLS] embedding between the Wave network and Transformer
  • Figure 4: Construct wave representation by complex number
  • Figure 5: Compare the overall L2 norm of gradient tensor of word embeddings between wave network and the Transformer
  • ...and 3 more figures