iFairy: the First 2-bit Complex LLM with All Parameters in $\{\pm1, \pm i\}$

Feiyu Wang, Guoan Wang, Yihao Zhang, Shengfan Wang, Weitao Li, Bokai Huang, Shimao Chen, Zihan Jiang, Rui Xu, Tong Yang

TL;DR

This work introduces iFairy, the first 2-bit complex-valued LLM that maps weights to the fourth roots of unity {±1, ±i} using PhaseQuant, enabling addition-only inference while raising the full-precision accuracy ceiling. By extending Transformer components into the complex domain (dual-channel embeddings, complex self-attention, and complex RoPE) and employing a 2-bit complex weight quantizer, iFairy achieves superior perplexity and downstream task performance relative to existing 2-bit baselines, approaching FP16 baselines. Comprehensive experiments across 700M and 1.3B parameter scales demonstrate improved training dynamics, language modeling, and transfer, with ablations confirming the value of the native complex-valued architecture and a fully complex-aware computation pattern. The work also analyzes weight distributions, norms, and embedding/LM-head structures, showing balanced codebook usage and stable magnitudes, and discusses limitations and future hardware-aware optimizations for practical deployment.
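As a rough illustration of mapping a complex weight onto $\{\pm1, \pm i\}$, the sketch below assigns each weight the root of unity nearest to it in phase. This is an assumption made for illustration: the text here does not give PhaseQuant's exact rule, and any scaling factors, tie-breaking conventions, or training-time details are omitted. The names `CODEBOOK` and `quantize_to_roots` are hypothetical.

```python
import numpy as np

# Codebook: the fourth roots of unity, indexed by a 2-bit code k, so CODEBOOK[k] == i**k.
CODEBOOK = np.array([1, 1j, -1, -1j], dtype=np.complex64)

def quantize_to_roots(w):
    """Return 2-bit codes k such that CODEBOOK[k] is the root nearest in phase to w.

    Assumption: nearest-phase assignment with no scaling; a zero weight
    (phase reported as 0 by NumPy) and exact sector boundaries are resolved arbitrarily.
    """
    w = np.asarray(w, dtype=np.complex64)
    # np.angle returns the phase in (-pi, pi]; dividing by pi/2 and rounding
    # selects the nearest of the four axes, and "% 4" wraps negative indices.
    k = np.rint(np.angle(w) / (np.pi / 2)).astype(np.int64) % 4
    return k.astype(np.uint8)

weights = np.array([0.9 + 0.1j, -0.2 + 0.8j, -1.1 - 0.3j, 0.1 - 0.7j])
codes = quantize_to_roots(weights)
quantized = CODEBOOK[codes]  # the dequantized weights: 1, i, -1, -i
```

Each code fits in two bits, and all four codebook entries are used symmetrically, which is the property the summary refers to as balanced codebook usage.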

Abstract

Quantization-Aware Training (QAT) integrates quantization into the training loop, enabling LLMs to learn robust low-bit representations, and is widely recognized as one of the most promising research directions. All current QAT research focuses on minimizing quantization error on full-precision models, where the full-precision accuracy acts as an upper bound (accuracy ceiling). No existing method has even attempted to surpass this ceiling. To break this ceiling, we propose a new paradigm: raising the ceiling (full-precision model), and then still quantizing it efficiently into 2 bits. We propose Fairy$\pm i$, the first 2-bit quantization framework for complex-valued LLMs. Specifically, our method leverages the representational advantages of the complex domain to boost full-precision accuracy. We map weights to the fourth roots of unity $\{\pm1, \pm i\}$, forming a perfectly symmetric and information-theoretically optimal 2-bit representation. Importantly, each quantized weight has either a zero real or imaginary part, enabling multiplication-free inference using only additions and element swaps. Experimental results show that Fairy$\pm i$ outperforms the ceiling of existing 2-bit quantization approaches in terms of both PPL and downstream tasks, while maintaining strict storage and compute efficiency. This work opens a new direction for building highly accurate and practical LLMs under extremely low-bit constraints.
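To see why weights in $\{\pm1, \pm i\}$ need no multiplications, note that for an activation $a + bi$ the four possible products are $a + bi$, $-b + ai$, $-a - bi$, and $b - ai$: each is a swap of the real and imaginary parts, a sign flip, or both. The following minimal sketch applies this to a matrix-vector product. It is an illustration under stated assumptions, not the authors' kernel: the function name `matvec_no_multiply` and the 2-bit encoding (code $k$ stands for $i^k$) are hypothetical, and scaling factors are ignored.

```python
import numpy as np

def matvec_no_multiply(codes, x_re, x_im):
    """Compute y = W @ x for W[j, k] = i**codes[j, k] without any multiplications.

    codes : (m, n) integer array with entries in {0, 1, 2, 3} (assumed encoding)
    x_re, x_im : (n,) real and imaginary parts of the activation vector
    Returns the real and imaginary parts of y, each of shape (m,).
    """
    codes = np.asarray(codes, dtype=np.intp)
    x_re = np.asarray(x_re, dtype=np.float64)
    x_im = np.asarray(x_im, dtype=np.float64)
    # Per-weight contribution of w * (a + bi), selected by the weight's code:
    #   code 0 (w =  1):  a + bi
    #   code 1 (w =  i): -b + ai
    #   code 2 (w = -1): -a - bi
    #   code 3 (w = -i):  b - ai
    re_terms = np.choose(codes, [x_re, -x_im, -x_re, x_im])
    im_terms = np.choose(codes, [x_im, x_re, -x_im, -x_re])
    # Accumulate along each row using additions only.
    return re_terms.sum(axis=1), im_terms.sum(axis=1)

codes = np.array([[0, 1], [2, 3]])   # W = [[1, i], [-1, -i]]
x = np.array([1 + 2j, 3 - 1j])
y_re, y_im = matvec_no_multiply(codes, x.real, x.imag)

# Reference result using ordinary complex arithmetic, for comparison only.
reference = (1j ** codes) @ x
```

The selection step stands in for what a real kernel would do with lookups or branch-free sign and swap logic; the point is only that no floating-point multiply appears on the weight path.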

Paper Structure

This paper contains 53 sections, 25 equations, 9 figures, 5 tables, 2 algorithms.

Figures (9)

  • Figure 1: Overview of PhaseQuant and iFairy. The left panel illustrates the quantization process of PhaseQuant. In the right panel, PhaseQuant is applied to all major linear projections within iFairy, including $\mathbf{W}_\mathbf{Q}$, $\mathbf{W}_\mathbf{K}$, $\mathbf{W}_\mathbf{V}$, and $\mathbf{W}_\mathbf{O}$ in the self-attention block, as well as $\mathbf{W}_\text{Up}$, $\mathbf{W}_\text{Gate}$, and $\mathbf{W}_\text{Down}$ in the feed-forward network.
  • Figure 2: The complex-valued Transformer architecture.
  • Figure 3: Training loss comparison between iFairy and BitNet b1.58.
  • Figure 4: Training loss comparison among iFairy, full-precision iFairy, and the strawman solution with a simple computational pattern. Full-precision iFairy serves as the baseline for the loss difference.
  • Figure 5: Quantization statistics of weight values in iFairy.
  • ...and 4 more figures