
Robust Model Selection for Discovery of Latent Mechanistic Processes

Jiawei Li, Nguyen Nguyen, Meng Lai, Ioannis Ch. Paschalidis, Jonathan H. Huggins

TL;DR

This work proposes the accumulated cutoff discrepancy criterion (ACDC), which leverages plug-in estimates of component-level discrepancies for a general class of latent variable models that includes unsupervised and supervised variants of probabilistic matrix factorization and mixture modeling.

Abstract

When learning interpretable latent structures using model-based approaches, even small deviations from modeling assumptions can lead to inferential results that are not mechanistically meaningful. In this work, we consider latent structures that consist of $K_o$ mechanistic processes, where $K_o$ is unknown. When the model is misspecified, likelihood-based model selection methods can substantially overestimate $K_o$ while more robust nonparametric methods can be overly conservative. Hence, there is a need for approaches that combine the sensitivity of likelihood-based methods with the robustness of nonparametric ones. We formalize this objective in terms of a robust model selection consistency property, which is based on a component-level discrepancy measure that captures the mechanistic structure of the model. We then propose the accumulated cutoff discrepancy criterion (ACDC), which leverages plug-in estimates of component-level discrepancies. To apply ACDC, we develop mechanistically meaningful component-level discrepancies for a general class of latent variable models that includes unsupervised and supervised variants of probabilistic matrix factorization and mixture modeling. We show that ACDC is robustly consistent when applied to unsupervised matrix factorization and mixture models. Numerical results demonstrate that in practice our approach reliably identifies a mechanistically meaningful number of latent processes in numerous illustrative applications, outperforming existing methods.

Paper Structure

This paper contains 69 sections, 9 theorems, 87 equations, 24 figures, 4 tables.

Key Result

Theorem 1

ACDC using $\widehat{\mathcal{D}}_{\mathrm{comp}}^{(K,k)}$ defined in eq:general-Dcomp is $\kappa$-robustly consistent for mixture models if the underlying component discrepancy $\mathcal{D}$ is the KL divergence, Wasserstein distance, or maximum mean discrepancy (MMD). For the KL divergence, $\kapp

Figures (24)

  • Figure 1: Cartoon illustration of the differences between traditional and robust model selection consistency where the true model corresponds to $K_{o} = 4$, with the nested models indicated by the gray ovals. We contrast five possible data-generating distributions, $P_{o}^{A}, \dots, P_{o}^{E}$, indicated by the gold points. Since $P_{o}^{A}, P_{o}^{B} \in \mathcal{M}^{(K_{o})}$ but are not in $\mathcal{M}^{(K)}$ for $K < K_o = 4$, for these distributions $K_{o}$ can be consistently estimated. However, since $P_{o}^{C}, P_{o}^{D}, P_{o}^{E} \notin \mathcal{M}^{(K_{o})}$, for these distributions $K_{o}$ cannot be estimated consistently in the traditional sense. On the other hand, $P_{o}^{A}, P_{o}^{B}, P_{o}^{C}$, and $P_{o}^{D}$ are close to $\mathcal{M}^{(K_{o})}$, so $K_{o}$ could potentially be robustly and consistently estimated in these four cases. However, $P_{o}^{B}$ and $P_{o}^{D}$ are also close to $\mathcal{M}^{(3)}$, so robustly consistent estimation of $K_{o}$ is feasible only for $P_{o}^{A}$ and $P_{o}^{C}$. Since $P_{o}^{E}$ is far from $\mathcal{M}^{(K_{o})}$, $K_{o}$ would not be consistently estimable in either the traditional or the robust sense.
  • Figure 2: Graphical representation of the general form for model $\mathcal{M}^{(K)}$ for a single observation $x_{n}$. Circles denote random variables while squares denote deterministic variables. A gray background indicates global parameters while a black background indicates an observed quantity.
  • Figure 3: Application of ACDC to clustering simulated high-dimensional data (\ref{sec:high-dim-simulation}). Left: The penalized loss plot for $K \in \{1,2,3,4\}$. Since the loss for $K=3$ is either very close to or much less than the loss for $K \ne 3$, it shows the greatest stability, resulting in $\widehat{K} = 3$ (indicated by the "$X$"). Right: Selected two-dimensional projections of the data.
  • Figure 4: Comparison of ACDC with common model selection criteria (Elbow, Gap, and Silhouette) in low- and high-dimensional settings. Each point shows the deviation between estimated and true cluster counts ($\hat{K}-K_o$) for one dataset. Dotted black lines indicate the median error for each method. Mean absolute error (MAE) and 0–1 loss quantify the model selection accuracy.
  • Figure 5: Selecting an optimal value of $\rho$ using the first 6 flow cytometry datasets (\ref{sec:flow-cytometry}). The solid lines show $\rho$ vs. F-measure for datasets 1--6. The black dashed line indicates the F-measure averaged over the training datasets. The vertical black line shows the value $\rho = 1.16$ that maximizes the averaged F-measure.
  • ...and 19 more figures
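
The summary statistics named in the captions of Figures 4 and 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: MAE and 0–1 loss are taken in their usual senses (mean of $|\hat{K}-K_o|$ and the fraction of datasets with $\hat{K} \ne K_o$), and $\rho$ is chosen from a finite grid by maximizing the F-measure averaged over datasets. The function names `summarize_selection` and `best_rho`, and all numbers in the usage lines, are hypothetical and are not taken from the paper.

```python
from statistics import mean, median


def summarize_selection(k_hat, k_true):
    """Summarize model selection accuracy over several datasets.

    k_hat and k_true are equal-length sequences of estimated and true
    numbers of components, one entry per dataset (assumed inputs).
    """
    if len(k_hat) != len(k_true):
        raise ValueError("k_hat and k_true must have the same length")
    # Signed deviation K_hat - K_o for each dataset.
    errors = [kh - kt for kh, kt in zip(k_hat, k_true)]
    return {
        "median_error": median(errors),
        # Mean absolute error of the estimated count.
        "mae": mean([abs(e) for e in errors]),
        # 0-1 loss: fraction of datasets where the estimate is wrong.
        "zero_one_loss": mean([1 if e != 0 else 0 for e in errors]),
    }


def best_rho(rhos, f_by_dataset):
    """Return the grid value of rho with the highest averaged F-measure.

    f_by_dataset[d][i] is the F-measure on dataset d at rhos[i]
    (a hypothetical layout chosen for this sketch).
    """
    # Average across datasets separately for each grid value of rho.
    averaged = [mean(column) for column in zip(*f_by_dataset)]
    best_index = max(range(len(rhos)), key=averaged.__getitem__)
    return rhos[best_index], averaged[best_index]


# Made-up inputs, only to show the calling convention.
summary = summarize_selection([3, 4, 3, 5], [3, 3, 3, 3])
rho, avg_f = best_rho([1.0, 1.2, 1.4], [[0.6, 0.9, 0.7], [0.8, 0.7, 0.6]])
```

Averaging the F-measure before taking the maximum mirrors the dashed line in Figure 5: a single $\rho$ is shared across the training datasets rather than tuned per dataset.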

Theorems & Definitions (19)

  • Definition 1: Robust model selection consistency
  • Example 1: Mixture Modeling
  • Example 2: Mixture Model with Varying Component Probabilities
  • Example 3: Probabilistic Matrix Factorization
  • Theorem 1
  • Theorem 2
  • Remark 1: Discussion of \ref{assump:metric-discr-conditions}
  • Remark 2: Discussion of \ref{assump:inference-regularity}
  • Theorem A.1
  • Proof: Proof idea
  • ...and 9 more