Reparameterized Multi-Resolution Convolutions for Long Sequence Modelling
Harry Jake Cunningham, Giorgio Giannone, Mingtian Zhang, Marc Peter Deisenroth
TL;DR
The paper tackles the challenge of modelling extremely long sequences by introducing MRConv, a parameter-efficient, reparameterized multi-resolution convolution framework. MRConv builds long global kernels as learnable sums of low-rank sub-kernels across multiple resolutions, trained in parallel via causal structural reparameterization and merged into a single kernel for inference. It offers three kernel parameterizations (dilated, Fourier, and sparse) and uses FFT-based convolutions to maintain efficiency. On Long Range Arena, sCIFAR, and Speech Commands, MRConv achieves state-of-the-art results among convolutional models and linear-time transformers while improving efficiency; on ImageNet, replacing 2D convolutions with 1D MRConv layers improves classification performance, supporting its applicability across diverse modalities.
Abstract
Global convolutions have shown increasing promise as powerful general-purpose sequence models. However, training long convolutions is challenging, and kernel parameterizations must be able to learn long-range dependencies without overfitting. This work introduces reparameterized multi-resolution convolutions ($\texttt{MRConv}$), a novel approach to parameterizing global convolutional kernels for long-sequence modelling. By leveraging multi-resolution convolutions, incorporating structural reparameterization and introducing learnable kernel decay, $\texttt{MRConv}$ learns expressive long-range kernels that perform well across various data modalities. Our experiments demonstrate state-of-the-art performance on the Long Range Arena, Sequential CIFAR, and Speech Commands tasks among convolution models and linear-time transformers. Moreover, we report improved performance on ImageNet classification by replacing 2D convolutions with 1D $\texttt{MRConv}$ layers.
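The structural reparameterization idea summarised above rests on the linearity of convolution: a weighted sum of several causal convolutions equals one causal convolution with the weighted sum of the zero-padded kernels. The following is a minimal NumPy sketch of that identity only, not the authors' implementation. The function names `causal_fft_conv` and `merge_kernels`, the branch weights `alphas`, and the doubling sub-kernel lengths are all hypothetical choices made for illustration. The sketch leaves out learnable kernel decay and the dilated, Fourier, and sparse parameterizations.

```python
import numpy as np


def causal_fft_conv(u, k):
    """Causal convolution of a length-L signal u with a kernel k (len(k) <= L) via FFT."""
    L = u.shape[-1]
    n = 2 * L  # zero-pad so the circular FFT convolution cannot wrap around
    U = np.fft.rfft(u, n=n)
    K = np.fft.rfft(k, n=n)  # rfft zero-pads k up to length n
    return np.fft.irfft(U * K, n=n)[..., :L]  # keep the first L (causal) outputs


def merge_kernels(sub_kernels, alphas, L):
    """Zero-pad each sub-kernel to length L and return their weighted sum."""
    merged = np.zeros(L)
    for a, k in zip(alphas, sub_kernels):
        merged[: len(k)] += a * k
    return merged


rng = np.random.default_rng(0)
L = 64
u = rng.standard_normal(L)

# Assumption: one sub-kernel per resolution, with lengths doubling up to L.
sub_kernels = [rng.standard_normal(n) for n in (4, 8, 16, 32, 64)]
alphas = rng.standard_normal(len(sub_kernels))  # hypothetical learnable branch weights

# Training-time view: one convolution per branch, outputs summed.
y_branches = sum(a * causal_fft_conv(u, k) for a, k in zip(alphas, sub_kernels))

# Inference-time view: merge the branches once, then run a single convolution.
k_merged = merge_kernels(sub_kernels, alphas, L)
y_merged = causal_fft_conv(u, k_merged)

assert np.allclose(y_branches, y_merged)
```

In this sketch the multi-branch form costs one FFT convolution per resolution, while the merged form costs a single FFT convolution regardless of how many branches were used.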
