Iterative minimization algorithm on a mixture family

Masahito Hayashi

TL;DR

This paper generalizes an algorithm recently proposed in the context of the Arimoto-Blahut algorithm, applies it to the target problem of the em algorithm, and proposes an improvement.

Abstract

Iterative minimization algorithms appear in various areas, including machine learning, neural networks, and information theory. The em algorithm is one of the best-known iterative minimization algorithms in machine learning, and the Arimoto-Blahut algorithm is a typical iterative algorithm in information theory. However, these two topics have long been studied separately. In this paper, we generalize an algorithm that was recently proposed in the context of the Arimoto-Blahut algorithm. We then show various convergence theorems, one of which covers the case where each iterative step is carried out only approximately. We also apply this algorithm to the target problem of the em algorithm and propose its improvement. In addition, we apply it to various other problems in information theory.
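For readers unfamiliar with the Arimoto-Blahut algorithm mentioned above, the following is a minimal sketch of the classical version for computing the capacity of a discrete memoryless channel. It is not the generalized algorithm of this paper; the function name `blahut_arimoto`, its parameters, and the fixed iteration count are assumptions made for illustration only.

```python
import math


def blahut_arimoto(W, iterations=200):
    """Classical Arimoto-Blahut iteration (illustrative sketch, not the paper's method).

    W[x][y] is the probability of output y given input x.
    Returns the final input distribution and its mutual information in nats.
    """
    n_in, n_out = len(W), len(W[0])

    def divergences(p):
        # q is the output distribution induced by the input distribution p.
        q = [sum(p[x] * W[x][y] for x in range(n_in)) for y in range(n_out)]
        # d[x] = D(W(.|x) || q); terms with W[x][y] == 0 contribute nothing.
        return [
            sum(W[x][y] * math.log(W[x][y] / q[y])
                for y in range(n_out) if W[x][y] > 0)
            for x in range(n_in)
        ]

    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    for _ in range(iterations):
        d = divergences(p)
        # Multiplicative update p(x) <- p(x) * exp(d[x]), then normalize.
        weights = [p[x] * math.exp(d[x]) for x in range(n_in)]
        total = sum(weights)
        p = [w / total for w in weights]

    # Mutual information I(p, W) = sum_x p(x) * D(W(.|x) || q_p).
    d = divergences(p)
    capacity = sum(p[x] * d[x] for x in range(n_in))
    return p, capacity
```

For a binary symmetric channel with crossover probability 0.1, calling `blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])` keeps the uniform input distribution by symmetry, and the returned value equals $\log 2$ minus the binary entropy of 0.1 (in nats).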

Paper Structure

This paper contains 26 sections, 11 theorems, 145 equations, 2 figures, 2 tables, and 3 algorithms.

Key Result

lemma 1

Under the above definitions, for any $\gamma > 0$, we have ${\cal F}_2[Q] = \Gamma^{(e)}_{{\cal M}_a}[{\cal F}_3[Q]]$, i.e.,

Figures (2)

  • Figure 1: Calculation of commitment capacity for the channel given in \ref{NZU} with ${\cal X}=\{1,2,3\}$. The right plot is an enlarged view of the left plot. The horizontal axis shows the number of iterations $t$, and the vertical axis shows the conditional entropy. Red, green, and blue points show the cases $\gamma=1$, $\gamma=0.95$, and $\gamma=0.9$, respectively. For $t=5,6,\ldots,10$, the three cases take almost the same value and cannot be distinguished in the plots. At $t=2,3$, the case $\gamma=1$ is better than the other cases, so in this case a smaller $\gamma$ does not improve the convergence.
  • Figure 2: Calculation of commitment capacity for the channel given in \ref{NZU} with ${\cal X}=\{1,2,3,4\}$. The colors have the same meaning as in Fig. 1. In this case, a smaller $\gamma$ improves the convergence.

Theorems & Definitions (22)

  • lemma 1
  • proof
  • lemma 2
  • proof
  • remark 1
  • theorem 1
  • proof
  • theorem 2
  • theorem 3
  • corollary 1
  • ...and 12 more