Fast networked data selection via distributed smoothed quantile estimation
Xu Zhang, Marcos M. Vasconcelos
TL;DR
This work tackles fast, scalable distributed top-$k$ data selection in networked systems by reframing the problem as distributed quantile estimation of local informativeness scores. To overcome slow convergence caused by non-smoothness and lack of strong convexity, it introduces smoothing techniques (Nesterov and convolution) for the local objectives and integrates them with the EXTRA distributed optimization algorithm. The resulting method achieves a memory- and communication-efficient implementation where each agent stores two scalars and transmits a single value per iteration, with iteration complexity that depends on the smoothing parameter $h$, the quantization gap $\triangle$, the multiplicity $g_m$ of the $k$-th score, and network connectivity through $\text{gap}(\boldsymbol{W})$. Numerical results show substantial speedups over non-smoothed distributed methods and favorable scalability to large networks, while preserving data privacy by keeping data local and sharing only threshold estimates. Overall, the approach offers a practical, robust pathway for distributed top-$k$ selection in sensor networks, federated settings, and other distributed data systems.
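The pipeline summarized above can be illustrated with a small single-process simulation in Python with NumPy. This is a minimal sketch, not the authors' implementation, and several choices are assumptions: the quantile level $\tau = 1 - k/N$ (with $N$ the total number of scores), a Nesterov-type smoothing of the pinball loss $\rho_\tau(u) = \max\{\tau u, (\tau - 1)u\}$ using a quadratic prox term weighted by $h$ (the convolution variant is not shown), a ring network with Metropolis weights for $\boldsymbol{W}$, the EXTRA choice $\tilde{\boldsymbol{W}} = (\boldsymbol{I} + \boldsymbol{W})/2$, and the hypothetical helpers `ring_metropolis` and `extra_topk_threshold`. Scores are distinct integers, so neighboring scores differ by at least 1, and $h = 0.2$ is chosen well below that spacing.

```python
import numpy as np


def ring_metropolis(n):
    """Metropolis mixing matrix for a ring of n >= 3 agents (hypothetical helper).

    Every agent has degree 2, so each edge weight is 1 / (1 + max(2, 2)) = 1/3;
    the self-weight makes each row sum to 1, giving a symmetric doubly stochastic W.
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in ((i - 1) % n, (i + 1) % n):
            W[i, j] = 1.0 / 3.0
        W[i, i] = 1.0 - W[i].sum()
    return W


def extra_topk_threshold(local_scores, k, h, iters):
    """EXTRA on an assumed smoothed quantile objective (sketch; hypothetical helper).

    local_scores has shape (n_agents, scores_per_agent); row i holds agent i's data.
    """
    n, m = local_scores.shape
    tau = 1.0 - k / (n * m)  # assumed quantile level for top-k selection
    W = ring_metropolis(n)
    W_tilde = 0.5 * (np.eye(n) + W)
    L = m / h  # each smoothed term has a (1/h)-Lipschitz gradient
    alpha = np.linalg.eigvalsh(W_tilde).min() / L  # half of EXTRA's bound 2*lambda_min/L

    def grad(x):
        # d/dtheta of sum_j rho_{tau,h}(z_j - theta), where
        # rho_{tau,h}(u) = max_{s in [tau-1, tau]} (s*u - h*s**2/2),
        # whose derivative in u is clip(u/h, tau-1, tau).
        u = local_scores - x[:, None]
        return -np.clip(u / h, tau - 1.0, tau).sum(axis=1)

    x_prev = local_scores.mean(axis=1)  # each agent starts at its local mean
    g_prev = grad(x_prev)
    x = W @ x_prev - alpha * g_prev  # EXTRA initial step
    for _ in range(iters):
        g = grad(x)
        # x^{k+2} = (I + W) x^{k+1} - W_tilde x^k - alpha * (grad(x^{k+1}) - grad(x^k))
        x_next = x + W @ x - W_tilde @ x_prev - alpha * (g - g_prev)
        x_prev, x, g_prev = x, x_next, g
    return x


rng = np.random.default_rng(0)
n_agents, per_agent, k, h = 6, 5, 5, 0.2
# Distinct integer scores 0..29, split evenly across the agents.
scores = rng.permutation(n_agents * per_agent).astype(float).reshape(n_agents, per_agent)
theta = extra_topk_threshold(scores, k, h, iters=20000)
print("per-agent thresholds:", theta)
print("scores above each threshold:", [int((scores > t).sum()) for t in theta])
```

The products `W @ x` and `W_tilde @ x_prev` stand in for each agent combining its neighbors' threshold estimates; raw scores never enter them. With $\tau = 1 - k/N$ and $h$ small relative to the score spacing, the smoothed objective is flat strictly between the $(k+1)$-th and $k$-th largest scores, so each agent's limit should satisfy the rule $z_j > \theta$ for exactly $k$ scores.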
Abstract
Collecting the most informative data from a large dataset distributed over a network is a fundamental problem in many fields, including control, signal processing, and machine learning. In this paper, we establish a connection between selecting the most informative data and finding the top-$k$ elements of a multiset. Top-$k$ selection in a network can be formulated as a distributed nonsmooth convex optimization problem known as quantile estimation. Unfortunately, the lack of smoothness in the local objective functions leads to extremely slow convergence and poor scalability with respect to the network size. To overcome this deficiency, we propose an accelerated method that employs smoothing techniques. Leveraging the piecewise linearity of the local objective functions in quantile estimation, we characterize the iteration complexity required to achieve top-$k$ selection, which is challenging because the objective lacks strong convexity. Numerical results validate the effectiveness of the algorithm and the correctness of the theory.
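The link between top-$k$ selection and quantile estimation can be made concrete with a short derivation. It assumes a standard pinball-loss formulation with quantile level $\tau = 1 - k/N$, where $N$ is the total number of scores and $\mathcal{S}_i$ (our notation) is the set of scores held by agent $i$; the paper's exact formulation may differ.

```latex
\begin{aligned}
\rho_\tau(u) &= \max\{\tau u,\ (\tau - 1)u\}, \qquad
F(\theta) = \sum_{i=1}^{n} \sum_{j \in \mathcal{S}_i} \rho_\tau(z_j - \theta), \\
m(\theta) &= \bigl|\{\, j : z_j > \theta \,\}\bigr|, \\
F'(\theta) &= -\tau\, m(\theta) + (1 - \tau)\bigl(N - m(\theta)\bigr)
            = (1 - \tau)N - m(\theta) \qquad (\theta \notin \{z_j\}), \\
F'(\theta) &= k - m(\theta) \qquad \text{when } \tau = 1 - k/N.
\end{aligned}
```

Hence $F$ decreases while more than $k$ scores exceed $\theta$ and increases once fewer than $k$ do. Writing $z_{[1]} \ge z_{[2]} \ge \dots \ge z_{[N]}$ for the sorted scores and assuming $z_{[k]} > z_{[k+1]}$, the minimizers form the interval $[z_{[k+1]}, z_{[k]}]$, and any $\theta \in [z_{[k+1]}, z_{[k]})$ selects exactly the top $k$ scores via $z_j > \theta$. The kinks of $\rho_\tau$ at every score are the nonsmoothness that the smoothing techniques target.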
