Table of Contents
Fetching ...

A naive aggregation algorithm for improving generalization in a class of learning problems

Getachew K Befekadu

Abstract

In this brief paper, we present a naive aggregation algorithm for a typical learning problem with expert advice setting, in which the task of improving generalization, i.e., model validation, is embedded in the learning process as a sequential decision-making problem. In particular, we consider a class of learning problem of point estimations for modeling high-dimensional nonlinear functions, where a group of experts update their parameter estimates using the discrete-time version of gradient systems, with small additive noise term, guided by the corresponding subsample datasets obtained from the original dataset. Here, our main objective is to provide conditions under which such an algorithm will sequentially determine a set of mixing distribution strategies used for aggregating the experts' estimates that ultimately leading to an optimal parameter estimate, i.e., as a consensus solution for all experts, which is better than any individual expert's estimate in terms of improved generalization or learning performances. Finally, as part of this work, we present some numerical results for a typical case of nonlinear regression problem.

A naive aggregation algorithm for improving generalization in a class of learning problems

Abstract

In this brief paper, we present a naive aggregation algorithm for a typical learning problem with expert advice setting, in which the task of improving generalization, i.e., model validation, is embedded in the learning process as a sequential decision-making problem. In particular, we consider a class of learning problem of point estimations for modeling high-dimensional nonlinear functions, where a group of experts update their parameter estimates using the discrete-time version of gradient systems, with small additive noise term, guided by the corresponding subsample datasets obtained from the original dataset. Here, our main objective is to provide conditions under which such an algorithm will sequentially determine a set of mixing distribution strategies used for aggregating the experts' estimates that ultimately leading to an optimal parameter estimate, i.e., as a consensus solution for all experts, which is better than any individual expert's estimate in terms of improved generalization or learning performances. Finally, as part of this work, we present some numerical results for a typical case of nonlinear regression problem.
Paper Structure (4 sections, 1 theorem, 20 equations, 1 figure)

This paper contains 4 sections, 1 theorem, 20 equations, 1 figure.

Key Result

Proposition 2.1

\newlabelP1 Let $L_n = \sum\nolimits_{k=1}^K \pi_n(k) r_n(k)$, $n = 0,1, 2, \ldots, N-1$, be a sequence of losses associated with the mixing distribution strategies $\pi_n =\left(\pi_n(1), \pi_n(2), \ldots, \pi_n(K)\right)$ and risk measures $r_n(k)$, for $k=1,2,\ldots, K$. Then, the total overall

Figures (1)

  • Figure 3.1: Plots for the original dataset and the population growth model.

Theorems & Definitions (4)

  • Remark 1
  • Remark 2
  • Proposition 2.1
  • proof