
CAWN: Continuous Acoustic Wave Networks for Autoregressive Language Modeling

Dejan Čugalj, Aleksandar Jevremovic

Abstract

Modern Large Language Models (LLMs) rely on Transformer self-attention, which scales quadratically with sequence length. Recent linear-time alternatives, such as State Space Models (SSMs), often suffer from signal degradation over extended contexts. We introduce the Continuous Acoustic Wave Network (CAWN), a fully continuous sequence-mixing architecture. Instead of discrete matrix-based attention, CAWN projects hidden states into multi-headed complex-domain phasors, achieving sequence mixing through a causal, $O(L)$ Phase Accumulation mechanism. To prevent signal degradation over ultra-long contexts, we introduce a dual-gated Selective Phase Resonance mechanism incorporating Frequency-Dependent Retention, Hard-Threshold Gating via Straight-Through Estimation, and a Temporal Syntax Cache to capture short-term local dependencies. We also replace standard dense linear projections with Depth-wise Harmonic Convolutions for optimal spatial frequency mixing, augmented by Block Attention Residuals for depth-wise state routing. Scaled to a 150M-parameter model, CAWN utilizes custom Triton kernels for hardware-efficient, true-complex phase accumulation in float32. Trained via a continuous streaming loop on a 100-Billion-token corpus, the prototype is evaluated at a 5-Billion-token milestone. Empirical evaluations via a Targeted Semantic Retrieval protocol demonstrate robust vocabulary acquisition and explicitly learned contextual denoising over extended contexts. By leveraging $O(1)$ state-passing via chunked prefill, the model retrieves targeted information across 2,000,000 tokens while strictly plateauing at 8.72 GB of peak VRAM, empirically overcoming the $O(L^2)$ context memory wall.
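As a rough, self-contained illustration of two of the mechanisms named above, the sketch below implements (i) single-head causal Phase Accumulation as a complex linear recurrence with frequency-dependent retention, and (ii) a hard-threshold gate trained via a straight-through estimator. The function names, tensor shapes, and de-rotation readout are illustrative simplifications rather than the exact CAWN formulation; the full model is multi-headed and relies on fused Triton kernels in float32 instead of a Python loop.

```python
import torch

def hard_threshold_ste(score, tau=0.0):
    # Hard-threshold gate with a straight-through estimator: the forward pass
    # is binary, while gradients flow through `score` as if it were the identity.
    hard = (score > tau).to(score.dtype)
    return hard + score - score.detach()

def phase_accumulation(x, freq, retention):
    # x:         (B, L, D) real hidden states
    # freq:      (D,) learned angular frequencies (phase advance per position)
    # retention: (D,) values in (0, 1); frequency-dependent decay of the running state
    B, L, D = x.shape
    pos = torch.arange(L, device=x.device, dtype=x.dtype).unsqueeze(-1)  # (L, 1)
    phase = pos * freq                                                   # (L, D)
    phasor = x * torch.exp(1j * phase)                                   # (B, L, D) complex

    # Causal linear recurrence, O(L) in sequence length:
    #   s_t = retention * s_{t-1} + phasor_t
    state = torch.zeros(B, D, dtype=phasor.dtype, device=x.device)
    out = []
    for t in range(L):
        state = retention * state + phasor[:, t]
        # Read out by de-rotating the accumulated state into the current token's frame.
        out.append((state * torch.exp(-1j * phase[t])).real)
    return torch.stack(out, dim=1)                                       # (B, L, D)
```

How the hard-threshold gate is wired into the accumulation (e.g., gating each phasor contribution before it enters the recurrence) is likewise an assumption here; the paper's dual-gated Selective Phase Resonance may apply it differently.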


Figures (3)

  • Figure 1: CAWN-150M architecture. Quadratic attention is supplanted by complex-domain phasor synthesis with Temporal Cache and Depth-wise Harmonic Convolution, alongside GELU [hendrycks2016gaussian] feed-forward layers. Residual connections are restricted to block-local scope, with historical states retrieved via depth-wise attention residuals [kimiteam2026attentionresiduals].
  • Figure 2: Empirical VRAM scaling trajectory for the CAWN-150M architecture evaluated on an NVIDIA H100. Extended bounds testing demonstrates predictable linear growth until the 32k chunk limit, after which the $\mathcal{O}(1)$ phase caching mechanism strictly caps peak VRAM at 8.72 GB. This allows the model to continuously process up to 2 Million tokens without memory degradation or expansion (a sketch of the chunked state-passing is given after this list).
  • Figure 3: Validation perplexity descent of the CAWN-150M prototype compared to $\mathcal{O}(L^2)$ baselines. CAWN demonstrates strict, monotonic convergence, successfully crossing below the parameter-matched standard Transformer baseline (Pythia-160M) as training progresses (where 800k micro-batch steps equate to $\sim$5.7 Billion tokens).
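To make the constant-memory behavior behind Figure 2 concrete, the following sketch processes a long token stream in fixed-size chunks while carrying only the recurrent phase state between chunks, so peak activation memory is governed by the chunk length rather than the total context length. The `forward(chunk, state)` interface and the `chunked_prefill` name are assumptions made for illustration and may not match the actual implementation.

```python
import torch

@torch.no_grad()
def chunked_prefill(model, tokens, chunk_len=32_768):
    # tokens: (B, T) token ids for an arbitrarily long stream (T may be millions).
    # `model` is assumed to accept a carried state (the accumulated per-layer
    # phasor sums) and to return (logits, new_state); this interface is illustrative.
    state = None
    logits = None
    for start in range(0, tokens.size(1), chunk_len):
        chunk = tokens[:, start:start + chunk_len]
        # Peak memory depends on chunk_len, not on the total length T.
        logits, state = model(chunk, state)
    return logits, state
```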