Bayesian Hierarchical Models and the Maximum Entropy Principle

Brendon J. Brewer

TL;DR

It is demonstrated that, when the prior given the hyperparameters is a canonical distribution (a maximum entropy distribution with moment constraints), the dependent marginal prior also has a maximum entropy property, with a different constraint.

Abstract

Bayesian hierarchical models are frequently used in practical data analysis contexts. One interpretation of these models is that they provide an indirect way of assigning a prior for unknown parameters, through the introduction of hyperparameters. The resulting marginal prior for the parameters (integrating over the hyperparameters) is usually dependent, so that learning one parameter provides some information about the others. In this contribution, I will demonstrate that, when the prior given the hyperparameters is a canonical distribution (a maximum entropy distribution with moment constraints), the dependent marginal prior also has a maximum entropy property, with a different constraint. This constraint is on the marginal distribution of some function of the unknown quantities. The results shed light on what information is actually being assumed when we assign a hierarchical model.
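A minimal sketch of why the marginal prior can inherit a maximum entropy form follows. The notation is introduced here and is not necessarily the paper's: $\mathbf{x}$ denotes the unknown parameters, $m(\mathbf{x})$ a base measure, $f_1,\dots,f_K$ the functions whose expectations are constrained, and $\boldsymbol{\lambda}$ the hyperparameters (Lagrange multipliers). The canonical distribution and the implied marginal prior are then

```latex
\begin{align}
p(\mathbf{x} \mid \boldsymbol{\lambda})
  &= \frac{m(\mathbf{x})}{Z(\boldsymbol{\lambda})}
     \exp\!\Big(-\sum_{k=1}^{K} \lambda_k f_k(\mathbf{x})\Big),
  \qquad
  Z(\boldsymbol{\lambda}) = \int m(\mathbf{x})
     \exp\!\Big(-\sum_{k=1}^{K} \lambda_k f_k(\mathbf{x})\Big)\, d\mathbf{x}, \\
p(\mathbf{x})
  &= \int p(\boldsymbol{\lambda})\, p(\mathbf{x} \mid \boldsymbol{\lambda})\, d\boldsymbol{\lambda}
   = m(\mathbf{x})\, g\big(\mathbf{f}(\mathbf{x})\big),
  \qquad
  g(\mathbf{t}) = \int \frac{p(\boldsymbol{\lambda})}{Z(\boldsymbol{\lambda})}
     \exp\!\big(-\boldsymbol{\lambda}\cdot\mathbf{t}\big)\, d\boldsymbol{\lambda}.
\end{align}
```

Because $p(\mathbf{x})$ depends on $\mathbf{x}$ only through $m(\mathbf{x})$ and $\mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}),\dots,f_K(\mathbf{x}))$, its conditional distribution given $\mathbf{f}(\mathbf{x}) = \mathbf{t}$ coincides with that of the base measure. That is the form of the distribution maximizing entropy relative to $m$ when the constraint fixes the entire marginal distribution of $\mathbf{f}(\mathbf{x})$, which is one way to read the "different constraint" described above. For independent Normal$(\mu, \sigma^2)$ quantities, for instance, the relevant functions are the sum and the sum of squares, the quantities plotted in Figure 2.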

Paper Structure

This paper contains 6 sections, 22 equations, and 2 figures.

Figures (2)

  • Figure 1: The orange distribution is the implied prior for the log of the arithmetic mean of 100 positive quantities, using independent Uniform$(0, 100)$ priors for the quantities. The blue distribution is the implied prior using a hierarchical model with $\log\mu \sim \textnormal{Uniform}(-5, 5)$, expressing more appropriate prior uncertainty about the arithmetic mean.
  • Figure 2: The orange distribution is the implied prior for the sum and sum of squares of 100 quantities with independent Uniform$(-100, 100)$ priors. The blue distribution is the implied prior using a hierarchical model, expressing more appropriate prior uncertainty (approximately uniform in the horizontal direction and log-uniform vertically) about the sum and the sum of squares. The U-shape arises from the prior bounds for the hyperparameters. For completeness, the hyperparameter priors were $\mu \sim \textnormal{Uniform}(-100, 100)$ and $\ln\sigma \sim \textnormal{Uniform}(-5, 5)$, independently.
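The prior-predictive simulations summarized in the two figure captions can be approximated with a short Monte Carlo sketch. The hyperparameter priors and bounds come from the captions. The conditional distributions of the quantities are assumptions made here, not taken from the paper: Exponential with mean $\mu$ for Figure 1 (the canonical distribution for a positive quantity with a mean constraint) and Normal$(\mu, \sigma^2)$ for Figure 2. The number of draws `S` and the random seed are arbitrary choices. Instead of drawing the figures, the sketch prints the spread of each implied prior and the correlation that the hierarchical prior induces between two of the quantities.

```python
import numpy as np

# Hypothetical settings: the captions fix N = 100 but not the number of draws.
N = 100          # number of unknown quantities
S = 20_000       # number of prior draws
rng = np.random.default_rng(42)

# ---- Figure 1: log of the arithmetic mean of N positive quantities ----
# Independent Uniform(0, 100) prior on each quantity.
x1_iid = rng.uniform(0.0, 100.0, size=(S, N))
# Hierarchical prior: log(mu) ~ Uniform(-5, 5). ASSUMED conditional prior:
# x_i | mu ~ Exponential with mean mu.
mu1 = np.exp(rng.uniform(-5.0, 5.0, size=S))
x1_hier = rng.exponential(scale=mu1[:, None], size=(S, N))
log_mean_iid = np.log(x1_iid.mean(axis=1))
log_mean_hier = np.log(x1_hier.mean(axis=1))

# ---- Figure 2: sum and sum of squares of N real quantities ----
# Independent Uniform(-100, 100) prior on each quantity.
x2_iid = rng.uniform(-100.0, 100.0, size=(S, N))
# Hyperparameter priors from the caption: mu ~ Uniform(-100, 100) and
# ln(sigma) ~ Uniform(-5, 5). ASSUMED conditional prior: x_i ~ Normal(mu, sigma^2).
mu2 = rng.uniform(-100.0, 100.0, size=S)
sigma2 = np.exp(rng.uniform(-5.0, 5.0, size=S))
x2_hier = rng.normal(loc=mu2[:, None], scale=sigma2[:, None], size=(S, N))


def sum_stats(x):
    """Return (sum, sum of squares) of each row, the statistics plotted in Figure 2."""
    return x.sum(axis=1), (x ** 2).sum(axis=1)


sum_iid, sumsq_iid = sum_stats(x2_iid)
sum_hier, sumsq_hier = sum_stats(x2_hier)

# Spread of each implied prior on a log scale (wider = more prior uncertainty).
print("Fig 1, sd of log(mean):       iid %.3f, hierarchical %.3f"
      % (log_mean_iid.std(), log_mean_hier.std()))
print("Fig 2, sd of log(sum of sq.): iid %.3f, hierarchical %.3f"
      % (np.log(sumsq_iid).std(), np.log(sumsq_hier).std()))

# Dependence: under the hierarchical marginal prior, x_1 and x_2 are correlated
# because both share the unknown mu; under the independent prior they are not.
corr_iid = np.corrcoef(x2_iid[:, 0], x2_iid[:, 1])[0, 1]
corr_hier = np.corrcoef(x2_hier[:, 0], x2_hier[:, 1])[0, 1]
print("corr(x_1, x_2):               iid %.3f, hierarchical %.3f" % (corr_iid, corr_hier))
```

With independent uniform priors, averaging 100 quantities concentrates the implied prior for the mean (and the sum of squares) into a narrow spike, which is the over-confident behaviour shown in orange. The hierarchical versions spread that uncertainty over several orders of magnitude, and learning one quantity now carries information about the others.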