Aspects of importance sampling in parameter selection for neural networks using ridgelet transform

Hikaru Homma, Jun Ohkubo

TL;DR

The importance-sampling aspect of oracle distributions and the proposed sampling algorithms are demonstrated on one-dimensional and high-dimensional examples; the results suggest that the magnitude of the weight parameters could be more crucial than the intercept parameters.

Abstract

The choice of parameters in neural networks is crucial to their performance, and an oracle distribution derived from the ridgelet transform enables us to obtain suitable initial parameters. In other words, the distribution of parameters is connected to the integral representation of the target function. The oracle distribution allows us to avoid the conventional backpropagation learning process; in simple cases, a linear regression alone is enough to construct the neural network. This study offers a new perspective on oracle distributions and ridgelet transforms, namely their interpretation in terms of importance sampling. In addition, we propose extensions of the parameter sampling methods. We demonstrate the importance-sampling aspect and the proposed sampling algorithms on one-dimensional and high-dimensional examples; the results suggest that the magnitude of the weight parameters could be more crucial than the intercept parameters.
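The idea of sampling hidden-layer parameters and then fitting only the output layer by linear regression can be illustrated with a minimal sketch. This is not the paper's algorithm: the oracle distribution derived from the ridgelet transform is not reproduced here, and a plain Gaussian/uniform proposal stands in for it as a loudly labeled assumption. The function names, the `tanh` activation, the sampling scales, and the ridge strength are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_sampled_network(x, y, n_hidden=200, ridge=1e-4, seed=0):
    """Sample hidden parameters, then solve a ridge regression for the output weights.

    ASSUMPTION: the Gaussian/uniform proposal below is a placeholder,
    not the oracle distribution obtained from the ridgelet transform.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, 2.0, size=n_hidden)    # weight parameters (placeholder proposal)
    b = rng.uniform(-6.0, 6.0, size=n_hidden)  # intercept parameters (placeholder proposal)

    # Hidden-layer outputs eta(a_j * x - b_j) for every sample and unit.
    hidden = np.tanh(np.outer(x, a) - b)

    # Closed-form ridge regression for the output-layer coefficients.
    gram = hidden.T @ hidden + ridge * np.eye(n_hidden)
    c = np.linalg.solve(gram, hidden.T @ y)
    return a, b, c


def predict(x, a, b, c):
    """Evaluate the one-hidden-layer network at the points x."""
    return np.tanh(np.outer(x, a) - b) @ c


# One-dimensional example: approximate sin(x) without any backpropagation.
x = np.linspace(-np.pi, np.pi, 400)
y = np.sin(x)
a, b, c = fit_sampled_network(x, y)
rmse = np.sqrt(np.mean((predict(x, a, b, c) - y) ** 2))
print(f"training RMSE: {rmse:.2e}")
```

Only the output coefficients `c` are learned; the hidden parameters `a` and `b` stay fixed after sampling, which is why the quality of the sampling distribution matters so much in this setting.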

Paper Structure (16 sections, 18 equations, 5 figures, 3 algorithms)

Figures (5)

  • Figure 1: (Color online) (a) The topologist's sine curve (TSC). (b) $f(\bm{x})$ obtained by Eq. \ref{eq_basis_reconstruction_integral}. (c) $g_J(\bm{x})$ obtained by Algorithm \ref{alg1} and the ridge regression. The solid line corresponds to the original curve, which has numerical instability around $x=0$. The dotted curves in (b) and (c) correspond to the approximated ones.
  • Figure 2: (Color online) Function shapes obtained by Eqs. \ref{eq_IS_1}, \ref{eq_IS_2}, and \ref{eq_IS_3}. (a) and (b) correspond to $\sin (x)$ and $f(\bm{x})$, respectively. The solid and dotted curves correspond to the original functions and the obtained ones, respectively.
  • Figure 3: (Color online) (a) $f(\bm{x})$ obtained by Algorithm \ref{alg2} and the ridge regression. (b) $f(\bm{x})$ obtained by Algorithm \ref{alg3} and the ridge regression. The solid and dotted curves correspond to the original functions and the obtained ones, respectively.
  • Figure 4: (Color online) The training errors in the learning processes for the TSC function. The vertical axis shows root mean squared errors, and the horizontal axis indicates the iteration steps. The error regions are drawn based on the standard deviations.
  • Figure 5: (Color online) The classification error rates in the learning processes for the MNIST test dataset. The vertical axis shows root mean squared errors, and the horizontal axis indicates the iteration steps. The error regions are drawn based on the standard deviations.