Rethinking Parameter Sharing as Graph Coloring for Structured Compression
Boyang Zhang, Daning Cheng, Yunquan Zhang
TL;DR
The paper addresses the memory bottleneck of large neural models and the limitations of heuristic, adjacent-layer parameter sharing. It introduces Geo-Sharing, a symmetry- and graph-coloring-based framework that represents cross-layer sharing via a coloring function $\alpha: L \rightarrow C$ and selects sharing groups using a second-order geometric criterion that aligns perturbations with the Hessian's low-curvature subspace. By decomposing weights with shared bases $B_b=(U_b,V_b)$ and solving a curvature-aligned optimization under a trust-region constraint, the method yields scalable, training-free sharing configurations. Across vision and language benchmarks, Geo-Sharing achieves superior compression–accuracy trade-offs, with substantial inference-time efficiency gains and strong scalability to very large models.
Abstract
Modern deep models have massive parameter counts, leading to high inference-time memory usage that limits practical deployment. Parameter sharing, a form of structured compression, effectively reduces redundancy, but existing approaches remain heuristic: they are restricted to adjacent layers and lack a systematic analysis of cross-layer sharing. Extending sharing across multiple layers, however, makes the configuration space grow exponentially, so exhaustive search is computationally infeasible; this is a critical bottleneck for parameter sharing. We recast parameter sharing from a group-theoretic perspective as introducing structural symmetries in the model's parameter space. A sharing configuration can be described by a coloring function $\alpha: L \rightarrow C$, where $L$ is the set of layer indices and $C$ the set of sharing classes, which determines inter-layer sharing groups while preserving structural symmetry. To determine the coloring function, we propose a second-order geometric criterion based on Taylor expansion and the Hessian spectrum. By projecting perturbations onto the Hessian's low-curvature eigensubspace, the criterion provides an analytic rule for selecting sharing groups that minimize performance impact, yielding a principled and scalable configuration procedure. Across diverse architectures and tasks, the resulting method, Geo-Sharing, consistently outperforms state-of-the-art heuristic sharing strategies, achieving higher compression ratios with smaller accuracy degradation.
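
To make the two central objects concrete, the following is a minimal Python sketch and not the authors' implementation. It represents a coloring function as a list mapping layer index to sharing class, and evaluates the quadratic term of a Taylor expansion, $\tfrac{1}{2}\delta^\top H \delta$, for a perturbation $\delta$. The function names, the toy diagonal Hessian, and the assumption that the first-order term is negligible (as it would be near a trained optimum) are all illustrative choices introduced here, not details taken from the paper.

```python
import numpy as np


def sharing_groups(alpha):
    """Group layer indices by color; alpha[l] is the sharing class of layer l."""
    groups = {}
    for layer, color in enumerate(alpha):
        groups.setdefault(color, []).append(layer)
    return groups


def second_order_cost(hessian, delta):
    """Quadratic Taylor term 0.5 * delta^T H delta (first-order term assumed ~0)."""
    return 0.5 * float(delta @ hessian @ delta)


def low_curvature_fraction(hessian, delta, k):
    """Fraction of ||delta||^2 lying in the span of the k lowest-curvature eigenvectors."""
    _, eigvecs = np.linalg.eigh(hessian)  # eigenvalues are returned in ascending order
    basis = eigvecs[:, :k]                # the k smallest-eigenvalue directions
    coeffs = basis.T @ delta
    return float(coeffs @ coeffs) / float(delta @ delta)


# Hypothetical coloring of four layers into two sharing classes.
alpha = [0, 1, 0, 1]
groups = sharing_groups(alpha)

# Toy Hessian with two flat and two sharp directions (illustrative values only).
hessian = np.diag([0.01, 0.1, 5.0, 10.0])
flat_delta = np.array([1.0, 0.0, 0.0, 0.0])   # perturbation along a flat direction
sharp_delta = np.array([0.0, 0.0, 0.0, 1.0])  # perturbation along a sharp direction

flat_cost = second_order_cost(hessian, flat_delta)
sharp_cost = second_order_cost(hessian, sharp_delta)
flat_frac = low_curvature_fraction(hessian, flat_delta, k=2)
sharp_frac = low_curvature_fraction(hessian, sharp_delta, k=2)
```

In this toy setting, two perturbations of equal norm incur very different second-order costs depending on how much of their energy falls in the low-curvature subspace, which is the intuition behind preferring sharing groups whose induced perturbation aligns with flat directions.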
