Non-Asymptotic Performance of Social Machine Learning Under Limited Data

Ping Hu, Virginia Bordignon, Mert Kayaalp, Ali H. Sayed

TL;DR

This work analyzes the non-asymptotic performance of social machine learning (SML) for distributed binary classification with streaming unlabeled data. It introduces a two-phase framework in which each agent independently trains a local classifier to learn a log-likelihood statistic, which is then combined cooperatively over a strongly connected graph using a social learning protocol. The authors establish a probabilistic condition for consistent training and derive finite-sample bounds on the misclassification probability that depend on network structure via the Perron vector, the second eigenvalue of the combination matrix, the network Rademacher complexity, and the chosen decision margin $\delta$. Numerical experiments on FashionMNIST and CIFAR-10 demonstrate exponential decay of error with the number of prediction samples and show the benefit of temporal and spatial information aggregation over traditional boosting methods. The results quantify how training data, loss function properties, and network topology jointly influence finite-sample performance, providing practical sample-complexity guidelines for distributed, data-driven classification tasks.
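The two network quantities named above can be computed for a toy example. The following is a minimal sketch, not the authors' code: it assumes a left-stochastic combination matrix `A` (each column sums to 1, with `A[l][k]` the weight agent `k` gives to agent `l`), a convention this summary does not state. The helper names `perron_vector`, `matmul`, and `second_eigenvalue_estimate` and the 2-agent matrix are hypothetical. The second-eigenvalue routine is only a rough estimate obtained from powers of the deflated matrix.

```python
def perron_vector(A, iters=1000, tol=1e-12):
    """Power iteration for the vector pi with A @ pi = pi and sum(pi) = 1.

    Assumes A is left-stochastic (columns sum to 1) and primitive.
    """
    n = len(A)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(A[r][c] * pi[c] for c in range(n)) for r in range(n)]
        total = sum(nxt)
        nxt = [v / total for v in nxt]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def matmul(X, Y):
    """Plain square matrix product."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def second_eigenvalue_estimate(A, pi, squarings=6):
    """Rough estimate of the second-largest eigenvalue magnitude of A.

    Deflates the eigenvalue 1 (B = A - pi * 1^T), raises B to the power
    m = 2**squarings, and returns (max-abs entry of B^m) ** (1/m).
    """
    n = len(A)
    B = [[A[r][c] - pi[r] for c in range(n)] for r in range(n)]
    for _ in range(squarings):
        B = matmul(B, B)
    m = 2 ** squarings
    norm = max(abs(v) for row in B for v in row)
    return norm ** (1.0 / m)


# Hypothetical 2-agent combination matrix; its eigenvalues are 1 and 0.7.
A = [[0.9, 0.2],
     [0.1, 0.8]]
pi = perron_vector(A)                     # approximately [2/3, 1/3]
lam2 = second_eigenvalue_estimate(A, pi)  # close to 0.7
print(pi, lam2)
```

In this toy case the Perron vector shows that agent 1 carries twice the weight of agent 2 in the long-run combination, while a second eigenvalue nearer to 1 would indicate slower mixing of information across the graph.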

Abstract

This paper studies the probability of error associated with the social machine learning framework, which involves an independent training phase followed by a cooperative decision-making phase over a graph. This framework addresses the problem of classifying a stream of unlabeled data in a distributed manner. In this work, we examine the classification task with limited observations during the decision-making phase, which requires a non-asymptotic performance analysis. We establish a condition for consistent training and derive an upper bound on the probability of error for classification. The results clarify the dependence on the statistical properties of the data and the combination policy used over the graph. They also establish the exponential decay of the probability of error with respect to the number of unlabeled samples.
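The decay of the error probability with the number of unlabeled samples can be illustrated with a small Monte Carlo sketch. It rests on several assumptions that are not taken from this summary: the cooperative phase is modeled by a common log-linear social learning update, $\lambda_{k,i}=\sum_{\ell} a_{\ell k}\,(\lambda_{\ell,i-1}+c_\ell(h_{\ell,i}))$; the trained classifier $c_\ell$ is replaced by the exact Gaussian log-likelihood ratio; and the 3-agent matrix `A`, the means `mu`, and the function name `simulate_error` are hypothetical.

```python
import random


def simulate_error(A, mu, steps=20, runs=2000, seed=0):
    """Estimate agent 0's error probability at each prediction step.

    A[l][k] is the weight agent k gives to agent l (columns sum to 1).
    The true class is +1, so agent k observes h ~ N(mu[k], 1); the
    competing class would give h ~ N(-mu[k], 1).
    """
    rng = random.Random(seed)
    n = len(A)
    errors = [0] * steps
    for _ in range(runs):
        lam = [0.0] * n  # accumulated log-ratio statistic per agent
        for i in range(steps):
            # Stand-in for the trained classifier: exact log-likelihood
            # ratio of N(mu, 1) against N(-mu, 1), which equals 2 * mu * h.
            inter = [lam[l] + 2.0 * mu[l] * rng.gauss(mu[l], 1.0)
                     for l in range(n)]
            # Combination step over the graph.
            lam = [sum(A[l][k] * inter[l] for l in range(n))
                   for k in range(n)]
            # Agent 0 decides by the sign of its statistic.
            if lam[0] <= 0:
                errors[i] += 1
    return [e / runs for e in errors]


# Hypothetical path graph 0 - 1 - 2 with self-loops; columns sum to 1.
A = [[0.5, 0.25, 0.0],
     [0.5, 0.5, 0.5],
     [0.0, 0.25, 0.5]]
mu = [0.3, 0.3, 0.3]
err = simulate_error(A, mu)
print(err[0], err[-1])  # error at the first and last prediction step
```

Under these assumptions the estimated error falls steadily as more unlabeled samples are combined across time and across agents, which matches the qualitative behavior the abstract describes. The sketch says nothing about the constants in the paper's bound.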

Paper Structure (12 sections, 8 theorems, 91 equations, 4 figures)

Key Result

Theorem 1

Assume $\rho<\mathdutchcal{E}_\Phi(\mathsf{R}^o,0)$, where $\mathdutchcal{E}_\Phi(\mathsf{R}^o,0)$ is defined by (eq: d_0^star) in the appendix lemma. Under Assumptions (assump: risk function) through (assump: network) and condition (eq: R^o condition), the probability of consistent training $P_c$ in (eq: probability) satisfies […], where […] and $\beta$ is the bound on the function $f_k$ specified in Assumption (assump: bound).

Figures (4)

  • Figure 1: SML architecture. (Left panel) The independent training process where each agent $k$ finds an optimal model $\widetilde{\bm{f}}_k$ based on its training set and constructs a classifier $\widetilde{\bm{c}}_k$ involving a debiasing operation. (Right panel) The cooperative classification process where each agent $k$ receives a sequence of streaming observations $\bm{h}_{k,i}$ and implements a social learning protocol to enhance the prediction performance. The neighboring set $\mathcal{N}_k$ of agent $k$ is marked by the area highlighted in gray.
  • Figure 2: (a) Topology of the communication network involving $9$ agents. (b) Observation map for the $9$ agents in the binary classification tasks constructed from the FashionMNIST dataset.
  • Figure 3: (FashionMNIST) (a) Decision margin under different training set sizes $N_0$. (b) Evolution of the probability of error within the SML strategy (solid lines with circles) and AdaBoost (dashed lines) over the prediction samples $i$ for different $N_0$.
  • Figure 4: (CIFAR-10) (a) Decision margin under different training set sizes $N_0$. (b) Evolution of the probability of error within the SML strategy (solid lines with circles) and AdaBoost (dashed lines) over the prediction samples $i$ for different $N_0$.

Theorems & Definitions (13)

  • Theorem 1: Probability of consistent training
  • proof
  • Theorem 2: Probability of $\delta$-margin consistent training
  • proof
  • Corollary 1: Training sample complexity
  • proof
  • Theorem 3: Statistical classification error
  • proof
  • Corollary 2: Testing sample complexity
  • proof
  • ...and 3 more