Non-Asymptotic Performance of Social Machine Learning Under Limited Data
Ping Hu, Virginia Bordignon, Mert Kayaalp, Ali H. Sayed
TL;DR
This work analyzes the non-asymptotic performance of social machine learning (SML) for distributed binary classification of streaming unlabeled data. The SML framework has two phases: each agent first independently trains a local classifier to learn a log-likelihood statistic, and the agents then combine these statistics cooperatively over a strongly connected graph using a social learning protocol. The authors establish a probabilistic condition for consistent training and derive finite-sample bounds on the misclassification probability that depend on the network structure via the Perron vector, the second eigenvalue of the combination matrix, the network Rademacher complexity, and the chosen decision margin $\nabla$. Numerical experiments on FashionMNIST and CIFAR-10 demonstrate exponential decay of the error with the number of prediction samples and show the benefit of temporal and spatial information aggregation over traditional boosting methods. The results quantify how training data, loss function properties, and network topology jointly influence finite-sample performance, providing practical sample-complexity guidelines for distributed, data-driven classification tasks.
Abstract
This paper studies the probability of error associated with the social machine learning framework, which involves an independent training phase followed by a cooperative decision-making phase over a graph. This framework addresses the problem of classifying a stream of unlabeled data in a distributed manner. In this work, we examine the classification task with limited observations during the decision-making phase, which requires a non-asymptotic performance analysis. We establish a condition for consistent training and derive an upper bound on the probability of error for classification. The results clarify the dependence on the statistical properties of the data and the combination policy used over the graph. They also establish the exponential decay of the probability of error with respect to the number of unlabeled samples.
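The cooperative decision-making phase can be illustrated with a minimal sketch. It assumes the commonly used linear recursion on log-belief ratios, in which each agent adds its local classifier output to its running statistic and then averages over its neighbors; the abstract does not spell out this update, so the recursion is an assumption. The function name `social_learning_decisions`, the arrays `A` and `logits`, and the synthetic data are hypothetical and chosen only for illustration. The training phase is not shown: the classifier outputs are taken as given.

```python
import numpy as np

def social_learning_decisions(A, logits):
    """Run an assumed log-belief-ratio recursion over a graph.

    A      : (K, K) combination matrix; A[l, k] is the weight agent k
             assigns to neighbor l, so every column sums to 1.
    logits : (T, K) array; logits[i, l] is the output of agent l's trained
             classifier on its i-th unlabeled sample, interpreted as an
             estimated log-likelihood ratio of class +1 versus class -1.
    Returns a (T, K) array of decisions in {+1, -1}, one per agent and time.
    """
    T, K = logits.shape
    lam = np.zeros(K)                      # log-belief ratios, uniform prior
    decisions = np.empty((T, K), dtype=int)
    for i in range(T):
        # Local update with the new sample, then combination over neighbors:
        # lam_k <- sum_l A[l, k] * (lam_l + logits[i, l])
        lam = A.T @ (lam + logits[i])
        decisions[i] = np.where(lam >= 0, 1, -1)
    return decisions

# Hypothetical 3-agent network with a doubly stochastic combination matrix.
A = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])

# Synthetic classifier outputs: weakly informative toward class +1, noisy.
rng = np.random.default_rng(0)
logits = 0.2 + rng.normal(scale=1.0, size=(50, 3))

decisions = social_learning_decisions(A, logits)
print(decisions[-1])   # decisions of the three agents after 50 samples
```

Because the statistic accumulates over time and is averaged across neighbors, individual noisy classifier outputs matter less as more unlabeled samples arrive, which is the mechanism behind the decay of the error probability described above.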
