Speeding up approximate MAP by applying domain knowledge about relevant variables

Johan Kwisthout, Andrew Schroeder

TL;DR

This work investigates whether leveraging domain knowledge about which intermediate variables are relevant can speed up approximate MAP inference in Bayesian networks, building on the Most Frugal Explanation (MFE) framework. It compares MFE variants (with on-the-fly relevance and with pre-computed relevance) against Exact MAP and Annealed MAP across four benchmarks, and also studies a hybrid MFE+A approach that embeds Annealed MAP within MFE. Results are inconclusive: while background relevance is conceptually beneficial, observed speedups are limited in practice due to implementation overheads and the efficiency of the MAP subroutine, though MFE+A can outperform some baselines on very large networks at the cost of accuracy. The findings highlight both the potential of relevance-based pruning and the practical challenges of realizing it with current MAP implementations, pointing to the need for stronger MAP solvers to fully exploit domain knowledge.

Abstract

The MAP problem in Bayesian networks is notoriously intractable, even when approximated. In an earlier paper we introduced the Most Frugal Explanation (MFE) heuristic for solving MAP, which partitions the set of intermediate variables (those that are neither observed nor MAP variables) into relevant variables, which are marginalized out, and irrelevant variables, which are assigned a value sampled from their domain. In this study we explore whether knowledge about which variables are relevant for a particular query (i.e., domain knowledge) speeds up computation sufficiently to beat both exact MAP and approximate MAP while still giving reasonably accurate results. Our results are inconclusive, but suggest that the answer probably depends on the specifics of the MAP query, most prominently the number of MAP variables.
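
The partitioning idea described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four-variable toy network, its probabilities, the names `joint`, `exact_map`, and `mfe`, uniform sampling of the irrelevant variable, and the majority vote over samples. It is not the authors' implementation, and the relevance labels are simply given by hand, standing in for domain knowledge.

```python
import random
from collections import Counter

# Hypothetical toy network over binary variables (all numbers are made up):
#   H: MAP (hypothesis) variable     E: observed evidence
#   R: relevant intermediate         I: irrelevant intermediate
P_H = {0: 0.6, 1: 0.4}
P_I = {0: 0.5, 1: 0.5}
P_R_GIVEN_H = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # indexed [h][r]
# P(E=1 | r, i): I barely changes the outcome, so we treat it as irrelevant.
P_E1_GIVEN_RI = {(0, 0): 0.10, (0, 1): 0.12, (1, 0): 0.80, (1, 1): 0.82}


def joint(h, r, i, e):
    """Joint probability P(h, r, i, e) of the toy network."""
    p_e1 = P_E1_GIVEN_RI[(r, i)]
    p_e = p_e1 if e == 1 else 1.0 - p_e1
    return P_H[h] * P_I[i] * P_R_GIVEN_H[h][r] * p_e


def exact_map(e):
    """Exact MAP: marginalize out every intermediate variable (R and I)."""
    scores = {
        h: sum(joint(h, r, i, e) for r in (0, 1) for i in (0, 1))
        for h in (0, 1)
    }
    return max(scores, key=scores.get)


def mfe(e, n_samples=25, seed=0):
    """MFE-style sketch: sample I, marginalize only R, then vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_samples):
        i = rng.choice((0, 1))  # assumption: uniform sample from I's domain
        scores = {h: sum(joint(h, r, i, e) for r in (0, 1)) for h in (0, 1)}
        votes[max(scores, key=scores.get)] += 1
    # Assumption: the explanation chosen most often across samples wins.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for evidence in (0, 1):
        print(evidence, exact_map(evidence), mfe(evidence))
```

In this toy case both procedures agree for either value of the evidence, because the sampled variable has almost no influence on the best hypothesis. Each MFE sample sums over fewer variables than exact MAP does, which is where a speedup could come from in principle, although the sketch says nothing about the running times reported in the paper.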

Paper Structure

This paper contains 15 sections, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: Graphical depiction of the main results for running time (RT) and error (Err) for the four approaches on four benchmark networks. The left panel shows, from left to right, the average running time in seconds for exact MAP, Annealed MAP, MFE with sampled relevance, and MFE with pre-computed relevance (note the log scale); the right panel shows the average errors.
  • Figure 2: Results for the Hailfinder network with $5$, $7$, and $10$ hypothesis nodes. The left panel shows, from left to right, the average running time in seconds for exact MAP, Annealed MAP, MFE with sampled relevance, and MFE with pre-computed relevance (note the log scale); the right panel shows the average errors. Note that for $10$ hypothesis nodes the inefficiency of the MAP computation dominates the running times.
  • Figure 3: Comparison between Hamming distance, ratio, and rank of the explanations. Note that for distance and rank lower is better, whereas for ratio a value closer to $1$ is better.
  • Figure 4: Ratio of relevant variables out of all intermediate variables.
  • Figure 5: Graphical depiction of the main results for running time (RT) and Hamming error (Err) for the four algorithms on four benchmark networks. The left panel shows, from left to right, the average running time in seconds for exact MAP, Annealed MAP, MFE+A, and MFE with pre-computed relevance (note the log scale); the right panel shows the average errors.
  • ...and 2 more figures