Tree-Embedded Bayesian Factor Models for Multidimensional Categorical Distributions

Naoki Awaya, Keisuke Sasaki, Genya Kobayashi, Shonosuke Sugasawa

TL;DR

This paper proposes a new Bayesian latent factor model for distributions: a parsimonious model that describes many observed distributions through lower-dimensional structures. In numerical experiments using real population data, it outperforms the standard Dirichlet mixture model as well as models built on parametric assumptions.

Abstract

Analyzing data collected from multiple sources to estimate common and heterogeneous structures through a hierarchical model is a central task in Bayesian inference, and Bayesian factor models are among the most widely used tools for this purpose. In this paper, we propose a new Bayesian latent factor model for distributions, providing a parsimonious model for describing many observed distributions through lower-dimensional structures. Many applications arise in the social sciences in the form of grouped data, for example, distributions of age composition and income observed across locations. In these contexts, standard mixture models can be inefficient because the distributions do not necessarily exhibit clear clustering structures. To overcome this difficulty, we introduce a tree-based transformation that embeds distributions into a Euclidean space and construct a Bayesian latent factor model in the transformed space. Once the distributions are transformed into Euclidean vectors, the Bayesian hierarchical model can be extended in a straightforward manner. As an illustration, we incorporate spatial dependence by introducing a prior based on a simultaneous autoregressive (SAR) model. The proposed model is "nonparametric" in the sense that it does not impose parametric assumptions on the form of the observed distributions. Through numerical experiments using real population data, we demonstrate that the proposed model outperforms the standard Dirichlet mixture model as well as models built on parametric assumptions.
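
To make the idea of a tree-based embedding concrete, the following is a minimal sketch and not the paper's actual transformation, which the abstract does not specify. It assumes a balanced binary tree over the categories and represents each internal node by the logit of the conditional probability of its left branch. The function names tree_embed and tree_unembed, the breadth-first node ordering, and the eps clipping constant are all hypothetical choices made for illustration. With four categories this gives a two-layer tree with three internal-node parameters.

```python
import numpy as np

def tree_embed(p, eps=1e-12):
    """Map a probability vector (length a power of two) to unconstrained
    logits, one per internal node of a balanced binary tree.
    Assumed construction for illustration only."""
    p = np.asarray(p, dtype=float)
    n = p.size
    assert n >= 2 and n & (n - 1) == 0, "length must be a power of two"
    logits = []
    level = [p]                      # blocks of mass, one per node at this depth
    while level[0].size > 1:
        next_level = []
        for block in level:
            half = block.size // 2
            left, right = block[:half], block[half:]
            # conditional probability of going left at this node
            theta = left.sum() / max(block.sum(), eps)
            theta = np.clip(theta, eps, 1.0 - eps)
            logits.append(np.log(theta) - np.log1p(-theta))
            next_level += [left, right]
        level = next_level
    return np.array(logits)          # breadth-first order, length n - 1

def tree_unembed(z):
    """Inverse map: rebuild the probability vector from node logits."""
    z = np.asarray(z, dtype=float)
    n = z.size + 1
    p = np.ones(1)
    idx = 0
    while p.size < n:
        theta = 1.0 / (1.0 + np.exp(-z[idx: idx + p.size]))
        idx += p.size
        # split each node's mass into (left, right) children, kept in order
        p = np.stack([p * theta, p * (1.0 - theta)], axis=1).ravel()
    return p

p = np.array([0.1, 0.2, 0.3, 0.4])
z = tree_embed(p)                    # three real-valued coordinates
p_back = tree_unembed(z)             # recovers p up to floating-point error
```

Because the coordinates in z are unconstrained real numbers, standard Gaussian tools such as a linear factor model can in principle be applied to them directly, which is the general motivation the abstract gives for working in the transformed space.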

Paper Structure

This paper contains 19 sections, 29 equations, 6 figures, and 2 tables.

Figures (6)

  • Figure 1: Illustration of a two-layer tree with three internal-node parameters.
  • Figure 2: Illustration of simulated distributions generated from a two-component mixture model (left) and the proposed factor model (right). Red triangles indicate the component distributions.
  • Figure 3: Visualization of the tree partition obtained using the MV algorithm for the population data observed at 19:00 on 12/24/2019.
  • Figure 4: Left: Map of the districts where the spatiotemporal population data were collected. The four locations that are discussed in the application results are indicated. Right: Observed distributions in six randomly selected locations at 19:00 on 12/24/2019.
  • Figure 5: Decomposition of PPL components for the Log-normal PWD model by gender. Panels (a) and (b) display the results for female and male data, respectively. In each panel, the left chart shows the average predictive bias and the right chart shows the average predictive variance.
  • ...and 1 more figure