LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
Jialin Li, Qiang Nie, Weifu Fu, Yuhuan Lin, Guangpin Tao, Yong Liu, Chengjie Wang
TL;DR
LORS is a parameter-efficient approach for stacked neural modules: each layer's weights are decomposed into a component shared across layers plus a layer-specific low-rank residual, with static (LORST) and adaptive (LORSA) variants. Applied to AdaMixer's decoders, the method cuts decoder parameters by roughly 70% while maintaining, and in some cases improving, object detection performance on MS COCO. The shared component captures what the stacked layers have in common, the low-rank private components capture layer-specific differences, and the sharing acts as a regularizer on the stack. The technique applies broadly to transformer-like stacks and offers a practical way to deploy deep models with a smaller parameter footprint.
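To make the decomposition concrete, the following is a minimal NumPy sketch of the static idea only: one weight matrix shared by all layers plus a per-layer low-rank residual. It is an illustration under stated assumptions, not the authors' implementation. The function names, the single-factor residual `B @ A`, the zero initialization of `B`, and the toy dimensions are all hypothetical choices made here; the adaptive variant is not covered.

```python
import numpy as np

def make_lors_static(d_out, d_in, num_layers, rank, seed=0):
    """Build one shared matrix plus a low-rank factor pair (B, A) per layer."""
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((d_out, d_in)) * 0.02
    # Assumption: B starts at zero so every layer initially equals the shared
    # weight (a LoRA-style convention, not confirmed by the text above).
    private = [
        (np.zeros((d_out, rank)), rng.standard_normal((rank, d_in)) * 0.02)
        for _ in range(num_layers)
    ]
    return shared, private

def layer_weight(shared, private, i):
    """Effective weight of layer i: shared part plus low-rank residual B @ A."""
    b, a = private[i]
    return shared + b @ a

def param_counts(d_out, d_in, num_layers, rank):
    """Parameter count of fully distinct layers vs. shared + low-rank layers."""
    full = num_layers * d_out * d_in
    lors = d_out * d_in + num_layers * rank * (d_out + d_in)
    return full, lors

if __name__ == "__main__":
    shared, private = make_lors_static(d_out=64, d_in=32, num_layers=6, rank=4)
    w0 = layer_weight(shared, private, 0)
    full, lors = param_counts(64, 32, 6, 4)
    print(w0.shape, full, lors)
```

The savings come from the count in `param_counts`: distinct layers cost `num_layers * d_out * d_in` parameters, whereas the shared form costs `d_out * d_in` once plus `rank * (d_out + d_in)` per layer. The toy sizes above are illustrative only and are unrelated to the figures reported for AdaMixer.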
Abstract
Deep learning models, particularly those based on transformers, often employ numerous stacked structures that share an identical architecture and perform similar functions. While effective, this stacking paradigm substantially increases the number of parameters, posing challenges for practical applications. In today's landscape of increasingly large models, stacking depth can reach dozens of layers, further exacerbating the issue. To mitigate this problem, we introduce LORS (LOw-rank Residual Structure). LORS allows stacked modules to share the majority of their parameters, requiring far fewer unique parameters per module to match or even surpass the performance of entirely distinct ones, thereby significantly reducing parameter usage. We validate our method by applying it to the stacked decoders of a query-based object detector and conduct extensive experiments on the widely used MS COCO dataset. Experimental results demonstrate the effectiveness of our method, as even with a 70% reduction in the parameters of the decoder, our method still enables the model to achieve comparable or
