Bayesian Matrix Decomposition and Applications

Jun Lu

TL;DR

The work surveys Bayesian matrix decomposition (BMD) as a probabilistic framework for factorizing matrices while quantifying uncertainty and incorporating prior knowledge. It builds from foundational linear algebra (the four fundamental subspaces and the SVD) to Bayesian inference, covering Bayes' theorem, model evidence, and model-comparison tools (the Laplace approximation, BIC, and Occam's razor) before detailing Monte Carlo and variational approaches (MCMC, Gibbs sampling, adaptive rejection sampling (ARS), the evidence lower bound (ELBO), variational inference (VI), and amortized VI). It culminates with a repertoire of conjugate priors and standard Bayesian linear models that illustrate analytical tractability and model updating. The practical value is a self-contained, rigorous primer that prepares the reader for Bayesian matrix factorizations (e.g., real-valued decomposition, NMF, Bayesian interpolative decomposition) and informed model selection in tasks such as matrix completion, denoising, and structured inference, while clearly stating its scope limits and directions for deeper Bayesian study.
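As a minimal sketch of a conjugate Bayesian linear model and its model evidence, the Python/NumPy code below assumes a zero-mean isotropic Gaussian prior $\bm{w} \sim \mathcal{N}(\boldsymbol{0}, \alpha^{-1}\bm{I})$ and Gaussian noise with known precision $\beta$. The helper names and hyperparameter values are hypothetical illustrations, not taken from the book. It computes the closed-form posterior and log evidence, then cross-checks the evidence against the marginal distribution $\bm{y} \sim \mathcal{N}(\boldsymbol{0}, \alpha^{-1}\bm{X}\bm{X}^\top + \beta^{-1}\bm{I})$.

```python
import numpy as np

def blr_posterior_and_evidence(X, y, alpha, beta):
    """Conjugate Bayesian linear regression (hypothetical helper).

    Prior w ~ N(0, alpha^{-1} I), likelihood y ~ N(X w, beta^{-1} I),
    with alpha and beta assumed known. Returns the posterior mean,
    posterior covariance and the closed-form log evidence log p(y).
    """
    N, D = X.shape
    A = alpha * np.eye(D) + beta * X.T @ X       # posterior precision
    S = np.linalg.inv(A)                          # posterior covariance
    m = beta * S @ X.T @ y                        # posterior mean
    # Penalized misfit at the posterior mean, used by the evidence formula.
    E = 0.5 * beta * np.sum((y - X @ m) ** 2) + 0.5 * alpha * m @ m
    _, logdet_A = np.linalg.slogdet(A)
    log_ev = (0.5 * D * np.log(alpha) + 0.5 * N * np.log(beta)
              - E - 0.5 * logdet_A - 0.5 * N * np.log(2 * np.pi))
    return m, S, log_ev

def gaussian_log_evidence(X, y, alpha, beta):
    """Same evidence computed directly: marginally y ~ N(0, X X^T / alpha + I / beta)."""
    N = X.shape[0]
    C = X @ X.T / alpha + np.eye(N) / beta
    _, logdet_C = np.linalg.slogdet(C)
    return -0.5 * (N * np.log(2 * np.pi) + logdet_C + y @ np.linalg.solve(C, y))

# Usage: compare a linear and a degree-9 polynomial model on linear data.
rng = np.random.default_rng(0)
t = np.linspace(-1.0, 1.0, 30)
y = 1.0 - 2.0 * t + rng.normal(scale=0.1, size=t.size)
for degree in (1, 9):
    X = np.vander(t, degree + 1, increasing=True)   # columns 1, t, ..., t^degree
    m, S, log_ev = blr_posterior_and_evidence(X, y, alpha=1.0, beta=100.0)
    assert np.isclose(log_ev, gaussian_log_evidence(X, y, 1.0, 100.0))
    print(f"degree {degree}: log evidence = {log_ev:.2f}")
```

Comparing `log_ev` across candidate feature sets (here, polynomial degrees) is the kind of evidence-based model comparison that Occam's razor refers to. Approximations such as the Laplace approximation or BIC are needed only when the evidence has no closed form like this one.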

Abstract

The sole aim of this book is to give a self-contained introduction to the concepts and mathematical tools of Bayesian matrix decomposition, so that matrix decomposition techniques and their applications can be introduced seamlessly in subsequent sections. However, we are aware that we cannot cover all the useful and interesting results concerning Bayesian matrix decomposition; given the limited scope of this discussion, some topics are omitted, e.g., a separate analysis of variational inference for carrying out the optimization. We refer the reader to the Bayesian analysis literature for a more detailed introduction to these related fields. This book is primarily a summary of the purpose and significance of important Bayesian matrix decomposition methods, e.g., real-valued decomposition, nonnegative matrix factorization, and Bayesian interpolative decomposition, and of the origin and complexity of these methods, which shed light on their applications. The mathematical prerequisite is a first course in statistics and linear algebra. Other than this modest background, the development is self-contained, with rigorous proofs provided throughout.
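To make the Gibbs-sampling route to real-valued Bayesian decomposition concrete, here is a minimal Python/NumPy sketch under simplifying assumptions that are ours, not necessarily the book's: $\bm{A} \approx \bm{W}\bm{Z}$ with i.i.d. Gaussian noise of known variance $\sigma^2$ and independent $\mathcal{N}(0,1)$ priors on every entry of $\bm{W}$ and $\bm{Z}$. Under these conjugate choices, each row of $\bm{W}$ given $\bm{Z}$, and each column of $\bm{Z}$ given $\bm{W}$, has a Gaussian full conditional, so the sampler alternates two blocked draws. The function name `gibbs_bmf` and its parameters are hypothetical.

```python
import numpy as np

def gibbs_bmf(A, K, sigma2, n_iter=300, burn_in=150, seed=0):
    """Blocked Gibbs sampler for A ~= W @ Z (hypothetical sketch).

    Assumes A[m, n] ~ N((W @ Z)[m, n], sigma2) with sigma2 known, and
    independent N(0, 1) priors on all entries of W (M x K) and Z (K x N).
    Returns the reconstruction W @ Z averaged over post-burn-in samples.
    """
    rng = np.random.default_rng(seed)
    M, N = A.shape
    W = rng.normal(size=(M, K))
    Z = rng.normal(size=(K, N))
    I = np.eye(K)
    recon_sum = np.zeros_like(A, dtype=float)
    for it in range(n_iter):
        # Given Z, the rows of W are independent Gaussians sharing one covariance.
        cov = np.linalg.inv(I + Z @ Z.T / sigma2)
        mean_W = A @ Z.T @ cov / sigma2               # row m: cov @ Z @ A[m] / sigma2
        W = mean_W + rng.normal(size=(M, K)) @ np.linalg.cholesky(cov).T
        # Given W, the columns of Z are independent Gaussians sharing one covariance.
        cov = np.linalg.inv(I + W.T @ W / sigma2)
        mean_Z = cov @ W.T @ A / sigma2               # column n: cov @ W.T @ A[:, n] / sigma2
        Z = mean_Z + np.linalg.cholesky(cov) @ rng.normal(size=(K, N))
        if it >= burn_in:
            recon_sum += W @ Z
    return recon_sum / (n_iter - burn_in)

# Usage on synthetic rank-2 data with noise standard deviation 0.1.
rng = np.random.default_rng(42)
A = rng.normal(size=(20, 2)) @ rng.normal(size=(2, 15)) + 0.1 * rng.normal(size=(20, 15))
A_hat = gibbs_bmf(A, K=2, sigma2=0.01)
print("relative error:", np.linalg.norm(A - A_hat) / np.linalg.norm(A))
```

Placing hyperpriors on $\sigma^2$ or on the prior variances, or switching to nonnegative priors for NMF, changes these conditionals; the sketch covers only the simplest conjugate case.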

Paper Structure

This paper contains 201 sections, 868 equations, 76 figures, 19 tables, 36 algorithms.

Figures (76)

  • Figure 1: Two pairs of orthogonal subspaces in $\mathbb{R}^N$ and $\mathbb{R}^M$. $\dim(\mathcal{C}(\bm{A}^\top)) + \dim(\mathcal{N}(\bm{A}))=N$ and $\dim(\mathcal{N}(\bm{A}^\top)) + \dim(\mathcal{C}(\bm{A}))=M$. The null space component maps to zero as $\bm{A}\bm{x}_n = \boldsymbol{0} \in \mathbb{R}^M$. The row space component maps into the column space as $\bm{A}\bm{x}_R = \bm{A}(\bm{x}_R+\bm{x}_n)=\bm{b} \in \mathcal{C}(\bm{A})$.
  • Figure 2: Comparison between the reduced and full SVD. White entries are zero, and blue entries are not necessarily zero.
  • Figure 3: Orthonormal bases that diagonalize $\bm{A}$ via the SVD. The set $\{\bm{v}_1, \bm{v}_2, \ldots, \bm{v}_R\}$ forms an orthonormal basis for the row space $\mathcal{C}(\bm{A}^\top)$, and $\{\bm{u}_1,\bm{u}_2, \ldots,\bm{u}_R\}$ forms an orthonormal basis for the column space $\mathcal{C}(\bm{A})$. The action of $\bm{A}$ links these bases: for each $i \in \{1, 2, \ldots, R\}$, it transforms the row-space basis vector $\bm{v}_i$ into the column-space basis vector $\bm{u}_i$ scaled by the singular value $\sigma_i$, i.e., $\bm{A} \bm{v}_i = \sigma_i \bm{u}_i$.
  • Figure 4: Bayesian inference embodies Occam's razor. This figure provides the fundamental intuition for why overly complex models tend to be less probable. The horizontal axis represents the space of all possible datasets, $\mathcal{X}$. According to Bayes' theorem, models are favored in proportion to how well they predicted the observed data. These predictions are represented by a marginal probability distribution over $\mathcal{X}$. A simple model makes only a limited range of predictions, whereas a more powerful model can predict a greater variety of datasets; because each predictive distribution must integrate to one over $\mathcal{X}$, the more powerful model assigns less probability to any particular dataset that both models can explain.
  • Figure 5: Occam factor. The prior distribution $p({\boldsymbol\theta} \mid \mathcal{H}_1)$ for the parameter has width $\sigma_{1}$, and the prior distribution $p({\boldsymbol\theta} \mid \mathcal{H}_2)$ for the parameter has width $\sigma_{2}$ ($\sigma_2<\sigma_1$). The posterior distribution has a single peak at $\widehat{{\boldsymbol\theta}}_{\text{MAP}}$ with width $\widehat{\sigma}_{{\boldsymbol\theta}}$.
  • ...and 71 more figures
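The relations in Figures 1 to 3 can be checked numerically. The Python/NumPy sketch below (the helper `four_subspaces` and the example matrix are our own illustrations) reads orthonormal bases of the four fundamental subspaces off the full SVD of an $M \times N$ matrix with $M=4$, $N=3$, and rank $R=2$. It then verifies $\bm{A}\bm{v}_i=\sigma_i\bm{u}_i$, the two dimension counts, and that only the row-space component of $\bm{x}$ survives multiplication by $\bm{A}$. The relative rank threshold `tol` is an assumption.

```python
import numpy as np

def four_subspaces(A, tol=1e-10):
    """Orthonormal bases of the four fundamental subspaces of A (M x N) via the full SVD."""
    U, s, Vt = np.linalg.svd(A)              # full SVD: U is M x M, Vt is N x N
    R = int(np.sum(s > tol * s.max()))       # numerical rank (relative threshold)
    return {
        "rank": R,
        "sigma": s[:R],           # nonzero singular values
        "col": U[:, :R],          # C(A) in R^M, dimension R
        "left_null": U[:, R:],    # N(A^T) in R^M, dimension M - R
        "row": Vt[:R].T,          # C(A^T) in R^N, dimension R
        "null": Vt[R:].T,         # N(A) in R^N, dimension N - R
    }

# Row 1 equals row 3 + 2 * row 4 and row 2 equals 2 * row 1, so the rank is 2.
A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.],
              [0., 1., 1.]])
sub = four_subspaces(A)
M, N = A.shape
assert sub["row"].shape[1] + sub["null"].shape[1] == N        # dim C(A^T) + dim N(A) = N
assert sub["col"].shape[1] + sub["left_null"].shape[1] == M   # dim C(A) + dim N(A^T) = M
assert np.allclose(A @ sub["row"], sub["col"] * sub["sigma"]) # A v_i = sigma_i u_i
assert np.allclose(A @ sub["null"], 0.0)                      # null space maps to zero

# Split x into row-space and null-space parts; only x_R contributes to A x.
x = np.array([1.0, -2.0, 0.5])
x_R = sub["row"] @ (sub["row"].T @ x)
x_n = x - x_R
assert np.allclose(A @ x, A @ x_R)
print("rank:", sub["rank"], " x_n =", x_n)
```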

Theorems & Definitions (71)

  • Definition 1.1: Matlab Notation
  • Definition 1.2: Eigenvalue, Eigenvector
  • Definition 1.3: Spectrum and Spectral Radius
  • Definition 1.4: Subspace and Span
  • Definition 1.5: Linearly Independent
  • Definition 1.6: Basis and Dimension
  • Definition 1.7: Column Space (Range)
  • Definition 1.8: Null Space (Nullspace, Kernel)
  • Definition 1.9: Rank
  • Definition 1.10: Orthogonal Complement in General
  • ...and 61 more
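As a small illustration of Definitions 1.2 and 1.3, using the standard definitions $\bm{A}\bm{x}=\lambda\bm{x}$ and $\rho(\bm{A})=\max_i|\lambda_i|$ (we assume the book's definitions agree), the sketch below computes the spectrum and spectral radius of a real matrix whose eigenvalues are complex. The helper `spectral_radius` and the matrix are arbitrary examples of ours.

```python
import numpy as np

def spectral_radius(A):
    """rho(A): largest absolute value over the spectrum of the square matrix A."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

# A real rotation-scaling matrix: its spectrum is {2i, -2i}, so rho(B) = 2.
B = np.array([[0.0, -2.0],
              [2.0, 0.0]])
lam, V = np.linalg.eig(B)                 # eigenvalues and eigenvectors (columns of V)
assert np.allclose(B @ V, V * lam)        # each column satisfies B v = lambda v
print("spectrum:", lam, " spectral radius:", spectral_radius(B))
```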