Comba: Improving Bilinear RNNs with Closed-loop Control

Jiaxi Hu; Yongqi Pan; Jusen Du; Disen Lan; Xiaqiang Tang; Qingsong Wen; Yuxuan Liang; Weigao Sun

TL;DR

Comba introduces a closed-loop Bilinear RNN with scalar-plus-low-rank state transitions and output correction, blending control theory with neural memory to achieve robust, hardware-friendly chunk-wise parallel training. By leveraging WY representations and UT transforms, Comba attains faster pretraining and improved performance on both language and vision tasks across 340M and 1.3B parameter scales. The approach addresses limitations of prior Delta-based Bilinear RNNs by enabling principled memory forgetting, improved recall, and stable long-context modeling, while maintaining compatibility with hybrid architectures. Limitations include evaluation at moderate scales and partial comparisons with newer nonlinear RNNs; future work targets larger-scale benchmarking and deeper integration with hybrid attention mechanisms like GSA.

Abstract

Recent efficient sequence modeling methods such as Gated DeltaNet, TTT, and RWKV-7 have achieved performance improvements by supervising recurrent memory management through the Delta learning rule. Unlike previous state-space models (e.g., Mamba) and gated linear attentions (e.g., GLA), these models introduce interactions between the recurrent state and the key vector, structurally resembling bilinear systems. In this paper, we first introduce the concept of Bilinear RNNs with a comprehensive analysis of the advantages and limitations of these models. Then, based on closed-loop control theory, we propose a novel Bilinear RNN variant named Comba, which adopts a scalar-plus-low-rank state transition with both state feedback and output feedback corrections. We also implement a hardware-efficient chunk-wise parallel kernel in Triton and train models with 340M/1.3B parameters on a large-scale corpus. Comba demonstrates superior performance and computational efficiency in both language and vision modeling.
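To make the "scalar-plus-low-rank state transition" and the state-key interaction concrete, the following is a minimal NumPy sketch of one generic Delta-rule step. It is an illustration under stated assumptions, not Comba's actual recurrence: the function names `delta_step` and `read`, the scalar parameters `alpha` (decay) and `beta` (write strength), and the unit-norm key are all hypothetical choices, and the paper's state-feedback and output-feedback corrections are not reproduced here.

```python
import numpy as np

def delta_step(S, k, v, alpha=1.0, beta=1.0):
    """One recurrent step with a scalar-plus-rank-1 transition (illustrative).

    S: (d_v, d_k) memory matrix; k: (d_k,) key, assumed unit-norm; v: (d_v,) value.
    Equivalent to S @ (alpha * I - beta * k k^T) + beta * v k^T, computed
    without forming the d_k x d_k transition matrix.
    """
    # The term S @ k multiplies the state by the input, which is what makes
    # the update bilinear rather than linear in the state alone.
    return alpha * S - beta * np.outer(S @ k, k) + beta * np.outer(v, k)

def read(S, q):
    """Query the memory with q of shape (d_k,)."""
    return S @ q

# Usage: writing twice under the same key overwrites rather than accumulates.
k = np.array([1.0, 0.0])
v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([-1.0, 0.5, 4.0])
S = np.zeros((3, 2))
S = delta_step(S, k, v1)
S = delta_step(S, k, v2)
recalled = read(S, k)  # equals v2 when alpha = beta = 1 and k is unit-norm
```

With `alpha = beta = 1`, the old value stored under `k` is erased before the new one is written, whereas a purely additive linear-attention update would return `v1 + v2`. Setting `alpha < 1` additionally decays everything stored under other keys, which is one simple way to read "principled memory forgetting".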
