I Know What I Don't Know: Latent Posterior Factor Models for Multi-Evidence Probabilistic Reasoning

Aliyu Agboola Alege

Abstract

Real-world decision-making, from tax compliance assessment to medical diagnosis, requires aggregating multiple noisy and potentially contradictory evidence sources. Existing approaches either lack explicit uncertainty quantification (neural aggregation methods) or rely on manually engineered discrete predicates (probabilistic logic frameworks), limiting scalability to unstructured data. We introduce Latent Posterior Factors (LPF), a framework that transforms Variational Autoencoder (VAE) latent posteriors into soft likelihood factors for Sum-Product Network (SPN) inference, enabling tractable probabilistic reasoning over unstructured evidence while preserving calibrated uncertainty estimates. We instantiate LPF as LPF-SPN (structured factor-based inference) and LPF-Learned (end-to-end learned aggregation), enabling a principled comparison between explicit probabilistic reasoning and learned aggregation under a shared uncertainty representation. Across eight domains (seven synthetic and the FEVER benchmark), LPF-SPN achieves high accuracy (up to 97.8%), low calibration error (ECE 1.4%), and strong probabilistic fit, substantially outperforming evidential deep learning, LLMs and graph-based baselines over 15 random seeds. Contributions: (1) A framework bridging latent uncertainty representations with structured probabilistic reasoning. (2) Dual architectures enabling controlled comparison of reasoning paradigms. (3) Reproducible training methodology with seed selection. (4) Evaluation against EDL, BERT, R-GCN, and large language model baselines. (5) Cross-domain validation. (6) Formal guarantees in a companion paper.
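
The core LPF step, turning a VAE latent posterior into a soft likelihood factor, can be sketched with the Monte Carlo decoding described in the Figure 5 caption. The encoder yields (μ, σ). Latent codes are sampled via reparameterization and each one is decoded to a distribution. The decoded distributions are then averaged and temperature-scaled. This is a minimal NumPy sketch under stated assumptions. The linear-softmax decoder, its random weights, the sample count, and the temperature value are hypothetical placeholders, not the paper's trained VAE or settings.

```python
import numpy as np


def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def mc_soft_factor(mu, sigma, decoder_w, decoder_b, n_samples=32, temperature=1.0, seed=0):
    """Hypothetical sketch: convert a diagonal-Gaussian latent posterior
    N(mu, diag(sigma**2)) into a soft factor over K outcomes."""
    rng = np.random.default_rng(seed)
    # Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I).
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    z = mu + sigma * eps                          # (n_samples, latent_dim)
    # Decode every latent sample to a distribution (placeholder linear-softmax decoder).
    probs = softmax(z @ decoder_w + decoder_b)    # (n_samples, K)
    # Average the decoded distributions over the MC samples.
    avg = probs.mean(axis=0)                      # (K,)
    # Temperature scaling: T > 1 flattens the factor, T < 1 sharpens it.
    return softmax(np.log(avg + 1e-12) / temperature)


# Toy usage with random, untrained decoder weights (illustration only).
latent_dim, K = 4, 3
rng = np.random.default_rng(1)
W = rng.standard_normal((latent_dim, K))
b = np.zeros(K)
mu = np.array([0.5, -0.2, 0.1, 0.0])
sigma = np.full(latent_dim, 0.3)

factor = mc_soft_factor(mu, sigma, W, b, n_samples=64, temperature=1.5)
print(factor, factor.sum())
```

Averaging in probability space before temperature scaling follows the order given in the Figure 5 caption. How the resulting factor is then attached to the SPN is not shown here.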

Paper Structure (379 sections, 5 theorems, 72 equations, 52 figures, 105 tables)

Key Result

Theorem 5.1 (MC Variance Bound)

For a Bernoulli random variable, the variance of the MC estimator is bounded by:
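
For reference, the textbook form of such a bound is given below. It is not necessarily the exact statement of Theorem 5.1. Assume the MC estimator is the sample mean of N i.i.d. Bernoulli(p) draws. Its variance is then p(1 − p)/N, which is at most 1/(4N) because p(1 − p) ≤ 1/4, with equality at p = 0.5. The sketch below checks this empirically. The values of p, N, and the trial count are arbitrary illustrative choices.

```python
import numpy as np


def bernoulli_mc_variance(p, n_samples):
    # Exact variance of the sample mean of n i.i.d. Bernoulli(p) draws.
    return p * (1.0 - p) / n_samples


def worst_case_bound(n_samples):
    # p * (1 - p) is maximized at p = 0.5, giving 1 / (4n).
    return 1.0 / (4.0 * n_samples)


def empirical_variance(p, n_samples, n_trials=20000, seed=0):
    rng = np.random.default_rng(seed)
    # Each row is one MC estimate built from n_samples Bernoulli draws.
    draws = rng.random((n_trials, n_samples)) < p
    estimates = draws.mean(axis=1)
    return estimates.var()


N = 50
for p in (0.1, 0.5, 0.9):
    exact = bernoulli_mc_variance(p, N)
    emp = empirical_variance(p, N)
    print(f"p={p}: exact={exact:.5f}, empirical={emp:.5f}, bound={worst_case_bound(N):.5f}")
```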

Figures (52)

  • Figure 1: Pipeline: Evidence through VAE to soft factors and probabilistic reasoning.
  • Figure 2: Tree structure showing Root (Sum) node splitting into two Product nodes (w1, w2), each splitting into two Leaf nodes over variables V1 and V2.
  • Figure 3: (System Overview) illustrates the complete pipeline from user query through canonical database check, evidence retrieval, VAE encoding, factor conversion, and SPN reasoning to final output with provenance.
  • Figure 4: Evidence Index Architecture illustrating the two-tier indexing strategy: an Entity-Predicate Hash Index and FAISS Vector Store feeding into a central Metadata Store, with the query flow from entity lookup through optional semantic reranking to top-$k$ evidence retrieval.
  • Figure 5: Monte Carlo Decoding: evidence flows through the encoder to produce $(\mu, \sigma)$, multiple latent codes are sampled via reparameterization, each is decoded to a distribution, and the results are averaged and temperature-scaled to produce the final soft factor.
  • ...and 47 more figures
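
Figure 2 describes a small SPN. A root sum node with weights w1 and w2 mixes two product nodes, and each product node multiplies one leaf over V1 and one over V2. Below is a minimal sketch of evaluating that structure bottom-up. It assumes binary variables with Bernoulli leaves and uses arbitrary illustrative weights and leaf parameters, since the paper's leaf types and values are not given here. Passing None for a variable sets its leaves to 1, which marginalizes it out. That is why a single bottom-up pass answers marginal and conditional queries.

```python
from itertools import product


def bernoulli_leaf(theta, value):
    # Leaf over one binary variable; value=None marginalizes it out (returns 1).
    if value is None:
        return 1.0
    return theta if value == 1 else 1.0 - theta


def spn_value(weights, leaf_params, v1, v2):
    """Evaluate Root(Sum) -> two Product nodes -> leaves over V1 and V2.
    weights: (w1, w2), nonnegative and summing to 1 (hypothetical values).
    leaf_params: [(theta_v1, theta_v2)] for each product node."""
    total = 0.0
    for w, (t1, t2) in zip(weights, leaf_params):
        # Product node: children have disjoint scopes {V1} and {V2} (decomposability).
        prod_val = bernoulli_leaf(t1, v1) * bernoulli_leaf(t2, v2)
        # Sum node: weighted mixture of children sharing scope {V1, V2} (completeness).
        total += w * prod_val
    return total


weights = (0.6, 0.4)
leaf_params = [(0.9, 0.2), (0.3, 0.7)]

# Full joint table over (V1, V2).
joint = {(a, c): spn_value(weights, leaf_params, a, c) for a, c in product((0, 1), repeat=2)}
print(joint, sum(joint.values()))

# Marginal P(V1=1) and conditional P(V1=1 | V2=1) from bottom-up passes.
p_v1 = spn_value(weights, leaf_params, 1, None)
p_v1_given_v2 = spn_value(weights, leaf_params, 1, 1) / spn_value(weights, leaf_params, None, 1)
print(p_v1, p_v1_given_v2)
```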

Theorems & Definitions (7)

  • Theorem 5.1: MC Variance Bound
  • Theorem 5.2: SPN Consistency
  • Proof of Theorem 5.2
  • Theorem 5.3: Aggregator Optimality
  • Proof of Theorem 5.3
  • Theorem 5.4: Contradiction Handling
  • Theorem 5.5: MC Convergence