LPZero: Language Model Zero-cost Proxy Search from Zero
Peijie Dong, Lujun Li, Xiang Liu, Zhenheng Tang, Xuebo Liu, Qiang Wang, Xiaowen Chu
TL;DR
The paper tackles the high computational cost of Neural Architecture Search (NAS) by introducing LPZero, a framework that automatically designs zero-cost proxies for language models. Proxies are modeled as symbolic expressions within a unified search space, and genetic programming combined with a Rule-based Pruning Strategy searches for the expression whose ranking of architectures best matches their ground-truth performance. LPZero achieves higher ranking correlation than existing proxies on the FlexiBERT and GPT-2 benchmarks and yields competitive, cost-efficient sub-networks for LLaMA when integrated with LoNAS. The result is a practical, training-free estimator for guiding NAS in large NLP models, reducing compute while preserving ranking quality and downstream performance.
Abstract
Despite its outstanding performance, Neural Architecture Search (NAS) is criticized for its massive computational cost. Recently, Zero-shot NAS has emerged as a promising approach by exploiting Zero-cost (ZC) proxies, which markedly reduce computational demands. However, existing ZC proxies rely heavily on expert knowledge and incur significant trial-and-error costs. In NLP tasks in particular, most existing ZC proxies fail to surpass the performance of the naive baseline. To address these challenges, we introduce a novel framework, LPZero, which is the first to automatically design ZC proxies for various tasks, achieving higher ranking consistency than human-designed proxies. Specifically, we model the ZC proxy as a symbolic equation and define a unified proxy search space, built from a predefined set of mathematical symbols, that encompasses existing ZC proxies. To heuristically search for the best ZC proxy, LPZero uses genetic programming to find the optimal symbolic composition. We also propose a Rule-based Pruning Strategy (RPS), which preemptively eliminates unpromising proxies, thereby mitigating the risk of proxy degradation. Extensive experiments on FlexiBERT, GPT-2, and LLaMA-7B demonstrate LPZero's superior ranking ability and performance on downstream tasks compared to current approaches.
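The search loop described in the abstract can be illustrated with a small, self-contained sketch. It is not the authors' implementation: the input statistics (`grad_norm`, `weight_norm`, `act_mean`), the primitive sets, the single pruning rule, and the synthetic benchmark are all hypothetical choices made for illustration, and Kendall's tau stands in for whatever ranking measure the paper actually uses. A proxy here is a fixed-shape expression `binary(unary_a(input_a), unary_b(input_b))`, and the loop evolves a small population toward higher rank correlation with the ground truth.

```python
import math
import random

# Hypothetical per-architecture statistics; not taken from the paper.
INPUTS = ["grad_norm", "weight_norm", "act_mean"]
# Hypothetical primitive sets; the paper's actual symbol set is not listed here.
UNARY = {
    "abs": abs,
    "neg": lambda x: -x,
    "log": lambda x: math.log(abs(x) + 1e-8),
    "sqrt": lambda x: math.sqrt(abs(x)),
}
BINARY = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(xs)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            total += (s > 0) - (s < 0)
    return total / (n * (n - 1) / 2)


def score(proxy, stats):
    """Evaluate a proxy (in_a, in_b, unary_a, unary_b, binary) on one architecture."""
    in_a, in_b, ua, ub, bop = proxy
    return BINARY[bop](UNARY[ua](stats[in_a]), UNARY[ub](stats[in_b]))


def is_pruned(proxy):
    """Illustrative rule only: reject proxies whose two branches are identical."""
    in_a, in_b, ua, ub, _ = proxy
    return in_a == in_b and ua == ub


def fitness(proxy, archs, truth):
    """Rank correlation between proxy scores and ground-truth performance."""
    return kendall_tau([score(proxy, a) for a in archs], truth)


def random_proxy(rng):
    return (rng.choice(INPUTS), rng.choice(INPUTS),
            rng.choice(list(UNARY)), rng.choice(list(UNARY)),
            rng.choice(list(BINARY)))


def mutate(proxy, rng):
    """Resample one randomly chosen field of the proxy."""
    fresh = random_proxy(rng)
    i = rng.randrange(len(proxy))
    return proxy[:i] + (fresh[i],) + proxy[i + 1:]


def crossover(a, b, rng):
    """Take each field from one of the two parents at random."""
    return tuple(rng.choice(pair) for pair in zip(a, b))


def search(archs, truth, pop_size=12, steps=60, seed=0):
    rng = random.Random(seed)
    population = []
    while len(population) < pop_size:
        p = random_proxy(rng)
        if not is_pruned(p):
            population.append((fitness(p, archs, truth), p))
    for _ in range(steps):
        # Tournament selection: the best two of four sampled candidates.
        parents = sorted(rng.sample(population, 4), reverse=True)[:2]
        child = mutate(crossover(parents[0][1], parents[1][1], rng), rng)
        if is_pruned(child):
            continue  # rejected by the rule before any evaluation
        child_fit = fitness(child, archs, truth)
        worst = min(range(pop_size), key=lambda i: population[i][0])
        if child_fit > population[worst][0]:
            population[worst] = (child_fit, child)
    return max(population)


def make_toy_benchmark(n=30, seed=1):
    """Synthetic stand-in for a NAS benchmark with known ground truth."""
    rng = random.Random(seed)
    archs = [{k: rng.uniform(0.1, 10.0) for k in INPUTS} for _ in range(n)]
    truth = [math.log(a["grad_norm"]) + math.sqrt(a["weight_norm"])
             + rng.gauss(0, 0.1) for a in archs]
    return archs, truth


archs, truth = make_toy_benchmark()
best_fit, best_proxy = search(archs, truth)
print(best_proxy, round(best_fit, 3))
```

In this sketch the pruning check runs before `fitness`, so a rejected candidate costs no evaluation, which mirrors the role the abstract assigns to RPS. A real search would compute the input statistics from a forward and backward pass over each candidate network rather than sampling them.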
