Optimizing Decentralized Online Learning for Supervised Regression and Classification Problems

J. M. Diederik Kruijssen, Renata Valieva, Steven N. Longmore

TL;DR

The paper presents a systematic calibration framework for key parameters in decentralized online learning, focusing on how historical performance maps to weights and how performance translates to rewards. Using an Allora-like simulator, it separates the optimization into the slope of the regret-to-weight mapping ($p$), the historical window set by an exponential moving average (EMA) parameter ($\alpha$), and the slopes of the reward mappings ($p_{\rm i}$, $p_{\rm f}$, $p_{\rm r}$), comparing regression and classification tasks. It shows that the optimal $p$ is 3 for regression and 5 for classification, that $\alpha \approx 0.1$ balances memory and adaptability, and that reward-mapping slopes between 1 and 3 (defaults $p_{\rm i}=p_{\rm f}=3$ and $p_{\rm r}=1$) minimize the reward spread while remaining effective across network compositions. The findings offer a practical recipe for tuning decentralized inference systems and suggest that these defaults generalize to networks solving a range of inference-synthesis problems beyond the Allora design.
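To make the role of the EMA parameter concrete, here is a minimal Python sketch of an exponential moving average. It assumes the common convention in which $\alpha$ weights the newest observation; this section does not spell out the paper's exact update rule, and the function names `ema_update` and `ema_series` are hypothetical.

```python
def ema_update(previous, observation, alpha=0.1):
    """One EMA step. Assumes alpha weights the newest observation."""
    return alpha * observation + (1.0 - alpha) * previous


def ema_series(observations, alpha=0.1):
    """Smooth a sequence, starting the average at the first observation."""
    smoothed = []
    current = observations[0]
    for value in observations:
        current = ema_update(current, value, alpha)
        smoothed.append(current)
    return smoothed


# A performance signal that jumps from 0 to 1 and then stays there.
signal = [0.0] + [1.0] * 10
tracked = ema_series(signal, alpha=0.1)

# Ten epochs after the jump the EMA has covered 1 - 0.9**10 of the step.
print(tracked[-1])
```

Under this convention the average has an effective memory of roughly $1/\alpha$ epochs, so $\alpha \approx 0.1$ corresponds to about ten epochs: smaller values remember longer but react more slowly to a change in participant performance.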

Abstract

Decentralized learning networks aim to synthesize a single network inference from a set of raw inferences provided by multiple participants. To determine the combined inference, these networks must adopt a mapping from historical participant performance to weights, and to appropriately incentivize contributions they must adopt a mapping from performance to fair rewards. Despite the increased prevalence of decentralized learning networks, there exists no systematic study that performs a calibration of the associated free parameters. Here we present an optimization framework for key parameters governing decentralized online learning in supervised regression and classification problems. These parameters include the slope of the mapping between historical performance and participant weight, the timeframe for performance evaluation, and the slope of the mapping between performance and rewards. These parameters are optimized using a suite of numerical experiments that mimic the design of the Allora Network, but have been extended to handle classification tasks in addition to regression tasks. This setup enables a comparative analysis of parameter tuning and network performance optimization (loss minimization) across both problem types. We demonstrate how the optimal performance-weight mapping, performance timeframe, and performance-reward mapping vary with network composition and problem type. Our findings provide valuable insights for the optimization of decentralized learning protocols, and we discuss how these results can be generalized to optimize any inference synthesis-based, decentralized AI network.
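The idea of turning historical performance into weights can be illustrated with a short Python sketch. The functional form is an assumption: this section says only that the mapping has a slope $p$, so the shifted power law below is a stand-in, not the paper's mapping. Treating a higher regret as a more helpful participant is likewise an assumption, and `combine_inferences` is a hypothetical name.

```python
import numpy as np


def combine_inferences(inferences, regrets, p=3.0):
    """Weighted mean of raw inferences.

    Illustrative only: weights follow a power law in the shifted regret,
    which stands in for "a regret-to-weight mapping with slope p".
    """
    inferences = np.asarray(inferences, dtype=float)
    regrets = np.asarray(regrets, dtype=float)
    # Shift so the lowest regret maps to zero weight (an assumption).
    shifted = regrets - regrets.min()
    weights = shifted ** p
    if weights.sum() == 0.0:
        # All participants performed equally: fall back to uniform weights.
        weights = np.ones_like(weights)
    weights = weights / weights.sum()
    return float(weights @ inferences), weights


raw = [1.0, 2.0, 3.0]
regret = [0.0, 1.0, 2.0]
shallow, w_shallow = combine_inferences(raw, regret, p=1.0)
steep, w_steep = combine_inferences(raw, regret, p=3.0)
print(w_shallow, shallow)
print(w_steep, steep)
```

In this toy setup a larger $p$ concentrates weight on the participant with the highest regret, which is the trade-off the calibration explores: a steep mapping follows the best performers closely, while a shallow one averages over more participants.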

Paper Structure

This paper contains 13 sections, 37 equations, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Probability distributions generated from the Dirichlet distribution ${\cal D}$ for a set of three labels $N_{ic}=3$ (where $\sum_c P_{ic}=1$ implies that the parameter space is a two-dimensional triangle), using different concentration parameter vectors $\vec{\alpha}$. A comparison between the two top panels shows that the normalization of $\vec{\alpha}$ controls the overall concentration of probabilities, with higher $\lVert\vec{\alpha}\rVert/\sqrt{N_{ic}}$ resulting in a higher degree of concentration. The two bottom panels illustrate how the probabilities become unbalanced if one (bottom-left) or all three (bottom-right) concentration parameters differ.
  • Figure 2: Variation in network loss as a function of the regret-to-weight mapping slope $p$ (left) and the EMA parameter $\log_{10}(\alpha)$ (right), for regression (top) and classification (bottom). The box plots show the network loss over 1000 epochs and across all network compositions, with a red line showing the median and whiskers extending to the 10th and 90th percentiles. Beyond these percentiles, outliers are shown as transparent individual points. Based on these experiments, we select default parameters $p=3$ for regression, $p=5$ for classification, and $\log_{10}(\alpha)=-1.0$ (see the text for discussion).
  • Figure 3: Variation in the instantaneous standard deviation of the mean reward received across each class of activity at epoch $i$, as a function of the score-to-reward mapping slope $p_{\rm i}$ (left), $p_{\rm f}$ (middle), and $p_{\rm r}$ (right), for regression (top) and classification (bottom). The box plots show the reward spread over 1000 epochs and across all network compositions, with a red line showing the median and whiskers extending to the 10th and 90th percentiles. Beyond these percentiles, outliers are shown as transparent individual points. For comparison, the mean reward received by a participant in each class is shown as a colored line, with the color reflecting the activity type as indicated by the legend. Based on these experiments, we select the default parameters $p_{\rm i}=3$, $p_{\rm f}=3$, and $p_{\rm r}=1$ (see the text for a discussion).
  • Figure 4: Variation in median network loss for different network compositions, as a function of the regret-to-weight mapping slope $p$ (left) and the EMA parameter $\log_{10}(\alpha)$ (right), for regression (top) and classification (bottom). The median is taken over 1000 epochs, with each line representing a different network composition as indicated by the legend. For each network composition, an open circle marks the $x$-value of the minimum network loss for that composition. The vertical dashed lines show the default values, i.e. $p=3$ for regression, $p=5$ for classification, and $\log_{10}(\alpha)=-1.0$.
  • Figure 5: Variation of the regret-to-weight mapping slope $p$ that minimizes the network loss with network composition, represented as a function of the number of inferers $N_{\rm i}$ and forecasters $N_{\rm f}$ for regression (left) and classification (right). Each value is obtained by taking the median over 10 experiments with different random seeds. There might be a hint of steeper regret-to-weight mappings for inferer-heavy network compositions (when $N_{\rm i}>N_{\rm f}$, we have mean $\overline{p}=2.81\pm0.03$ for regression and $\overline{p}=4.85\pm0.05$ for classification) than for forecaster-heavy network compositions (when $N_{\rm f}>N_{\rm i}$, we have mean $\overline{p}=2.79\pm0.04$ for regression and $\overline{p}=4.70\pm0.08$ for classification). The absence of any major trends between $p$ and network composition implies that the optimal regret-to-weight mapping slope $p$ can be set to the default values of $p=3$ (regression) or $p=5$ (classification), independently of the network composition.
  • ...and 1 more figure
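The behavior described in the Figure 1 caption can be reproduced with a short NumPy sketch that draws three-label probability vectors from a Dirichlet distribution. The concentration vectors are illustrative choices, not the values used in the figure, and `sample_label_probabilities` is a hypothetical helper name.

```python
import numpy as np


def sample_label_probabilities(alpha, n_samples=5000, seed=0):
    """Draw probability vectors over the labels from Dirichlet(alpha)."""
    rng = np.random.default_rng(seed)
    return rng.dirichlet(np.asarray(alpha, dtype=float), size=n_samples)


# Illustrative concentration vectors (not taken from the paper).
low = sample_label_probabilities([1.0, 1.0, 1.0])
high = sample_label_probabilities([10.0, 10.0, 10.0])
skewed = sample_label_probabilities([2.0, 5.0, 10.0])

# A larger overall normalization of alpha concentrates the samples.
print("per-label std, low concentration: ", low.std(axis=0))
print("per-label std, high concentration:", high.std(axis=0))

# Unequal components make the probabilities unbalanced; the expected
# mean of each label is alpha_c / sum(alpha).
print("per-label mean, unequal alpha:", skewed.mean(axis=0))
```

Each sample sums to one across the three labels, so the draws lie on the two-dimensional triangle mentioned in the caption.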