Hierarchical Zero-Order Optimization for Deep Neural Networks
Sansheng Cao, Zhengyu Ma, Yonghong Tian
TL;DR
This paper tackles the inefficiency of zeroth-order (ZO) optimization in deep networks by introducing Hierarchical Zeroth-Order (HZO) optimization, which divides the network along its depth and applies recursive Jacobian-target propagation to produce updates whose direction matches that of backpropagation (BP). The key contributions are a proven reduction of query complexity from $O(ML^2)$ to $O(ML \log L)$ for a network of width $M$ and depth $L$, an error analysis showing stability near the unitary limit (Lipschitz constant $L_{lip} \approx 1$), and empirical validation on CIFAR-10 and a 10-class ImageNet subset showing accuracy competitive with BP without running full backpropagation. Theoretical results include Theorem 1 (Gradient Equivalence), a recurrence-based complexity proof, and an examination of error accumulation (Theorem 3) and the unitary-limit condition. Practically, HZO lets biologically plausible zeroth-order learning scale to deep architectures, with spatial parallel perturbation further reducing cost for convolutional layers, which makes non-differentiable or hardware-restricted training more feasible at ImageNet scale.
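As an illustration of how a divide-and-conquer recurrence can yield the stated $O(ML \log L)$ bound, consider the following sketch. The specific recurrence, the constant $c$, and the base case are assumptions made here for illustration; they are not taken from the paper's proof. Suppose splitting a depth-$L$ network into two halves costs $cML$ queries to combine, and each half is handled recursively:

$$
T(L) = 2\,T\!\left(\tfrac{L}{2}\right) + cML, \qquad T(1) = O(M).
$$

Unrolling over the $\log_2 L$ levels of recursion, each level contributes $cML$ in total, so

$$
T(L) = cML \log_2 L + L\,T(1) = O(ML \log L).
$$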
Abstract
Zeroth-order (ZO) optimization has long been favored for its biological plausibility and its capacity to handle non-differentiable objectives, yet its computational cost has historically limited its application to deep neural networks. Challenging the conventional paradigm that gradients propagate layer by layer, we propose Hierarchical Zeroth-Order (HZO) optimization, a divide-and-conquer strategy that decomposes the network along its depth dimension. We prove that HZO reduces the query complexity from $O(ML^2)$ to $O(ML \log L)$ for a network of width $M$ and depth $L$, a substantial improvement over existing ZO methods. Furthermore, we provide a detailed error analysis showing that HZO maintains numerical stability by operating near the unitary limit ($L_{lip} \approx 1$). Extensive evaluations on CIFAR-10 and ImageNet demonstrate that HZO achieves accuracy competitive with backpropagation.
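To make the notion of query cost concrete, the following is a minimal sketch of a generic coordinate-wise zeroth-order gradient estimate. It is not the HZO algorithm; the function names `zo_gradient` and `quadratic`, the step size `eps`, and the test objective are all hypothetical choices made for illustration. It shows that a plain finite-difference estimate needs one function query per parameter plus one baseline query, which is the kind of cost that grows quickly with network size.

```python
import numpy as np

def zo_gradient(f, x, eps=1e-5):
    """Forward-difference gradient estimate using only function evaluations.

    Assumes x is a 1-D array. Uses x.size + 1 queries to f in total.
    """
    x = np.asarray(x, dtype=float)
    base = f(x)                      # one query at the unperturbed point
    grad = np.zeros_like(x)
    for i in range(x.size):          # one extra query per coordinate
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - base) / eps
    return grad

def quadratic(x):
    """Toy objective (an assumption for this demo); its exact gradient is x."""
    return 0.5 * np.sum(x ** 2)

x0 = np.array([1.0, -2.0, 3.0])
estimate = zo_gradient(quadratic, x0)
print(estimate)  # close to x0, up to an O(eps) finite-difference error
```

For this toy objective the estimate agrees with the analytic gradient up to an error on the order of `eps`, while using no derivative information.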
