WANDER: An Explainable Decision-Support Framework for HPC

Ankur Lahiry, Banooqa Banday, Yugesh Bhattarai, Tanzima Z. Islam

TL;DR

WANDER addresses the challenge of configuring heterogeneous HPC systems by unifying predictive modeling, counterfactual reasoning, and explainability within a query-driven decision-support framework. Given a fixed predictive model $f: \mathcal{X} \rightarrow \mathcal{Y}$, it generates counterfactuals by optimizing a loss $\mathrm{cf\_loss}$ that balances target validity, input proximity, and diversity, so that the suggested configurations are both realistic and diverse. The system provides interpretable explanations, uncertainty diagnostics, and causally informed trust checks to support decisions across prescriptive, exploratory, and counterfactual queries. Empirical results across multiple HPC datasets demonstrate actionable, trustworthy configuration alternatives with quantified uncertainty and explicit causal reasoning, enabling safer and more efficient resource tuning in heterogeneous environments.
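The three-way balance in the counterfactual loss can be illustrated with a minimal sketch. It is not the paper's formulation: the squared-error validity term, the L1 distances, the weights `lam_prox` and `lam_div`, and the function name `cf_loss_sketch` are all assumptions made for illustration, loosely following the common pattern of penalizing distance from the target and from the original input while rewarding spread among candidates.

```python
import numpy as np

def cf_loss_sketch(candidates, x_orig, predict, target,
                   lam_prox=0.5, lam_div=1.0):
    """Hypothetical loss over a set of k candidate counterfactuals.

    Lower is better. The specific terms and weights are illustrative
    assumptions, not the loss used by WANDER.
    """
    candidates = np.asarray(candidates, dtype=float)
    x_orig = np.asarray(x_orig, dtype=float)

    # Validity: mean squared gap between predicted and target outcome.
    preds = np.array([predict(c) for c in candidates], dtype=float)
    validity = np.mean((preds - target) ** 2)

    # Proximity: mean L1 distance from the original configuration.
    proximity = np.mean(np.abs(candidates - x_orig).sum(axis=1))

    # Diversity: mean pairwise L1 distance among distinct candidates.
    k = len(candidates)
    if k > 1:
        pairwise = np.abs(candidates[:, None, :] - candidates[None, :, :]).sum(axis=2)
        diversity = pairwise.sum() / (k * (k - 1))
    else:
        diversity = 0.0

    # Diversity is subtracted because more spread should lower the loss.
    return validity + lam_prox * proximity - lam_div * diversity


# Toy usage: the "model" simply sums the two features.
toy_model = lambda x: float(np.sum(x))
loss = cf_loss_sketch([[2.0, 1.0], [1.0, 2.0]], [1.0, 1.0], toy_model, target=3.0)
```

In this toy case both candidates hit the target exactly, so the loss reduces to the proximity penalty minus the diversity reward. A real optimizer would search over `candidates` subject to feature constraints.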

Abstract

High-performance computing (HPC) systems expose many interdependent configuration knobs that impact runtime, resource usage, power, and variability. Existing predictive tools model these outcomes but do not support structured exploration, explanation, or guided reconfiguration. We present WANDER, a decision-support framework that synthesizes alternative configurations using counterfactual analysis aligned with user goals and constraints. We introduce a composite trade-off score that ranks suggestions based on prediction uncertainty, the consistency of feature-target relationships with causal models, and the similarity of feature distributions to historical data. To our knowledge, WANDER is the first such system to unify prediction, exploration, and explanation for HPC tuning under a common query interface. Across multiple datasets, WANDER generates interpretable, trustworthy, human-readable alternatives that guide users toward their performance objectives.
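One way to picture a composite trade-off score over the three signals named above is a weighted sum. The sketch below is only an assumption about its shape: the abstract does not give the formula, so the `Candidate` fields, the [0, 1] normalization, the equal default weights, and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    uncertainty: float         # assumed normalized to [0, 1]; lower is better
    causal_consistency: float  # assumed in [0, 1]; higher is better
    similarity: float          # assumed in [0, 1]; closeness to historical data

def trade_off_score(c, w_unc=1/3, w_causal=1/3, w_sim=1/3):
    """Hypothetical weighted sum; higher means a more trustworthy suggestion."""
    return (w_unc * (1.0 - c.uncertainty)
            + w_causal * c.causal_consistency
            + w_sim * c.similarity)

def rank_candidates(candidates, **weights):
    """Sort candidates from highest to lowest trade-off score."""
    return sorted(candidates,
                  key=lambda c: trade_off_score(c, **weights),
                  reverse=True)

# Toy usage with made-up values.
a = Candidate("config-a", uncertainty=0.1, causal_consistency=0.9, similarity=0.8)
b = Candidate("config-b", uncertainty=0.6, causal_consistency=0.5, similarity=0.4)
ranked = rank_candidates([b, a])
```

With equal weights, the candidate with lower uncertainty and stronger causal and distributional agreement ranks first. Changing the weights shifts how much each trust signal contributes.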

Paper Structure

This paper contains 31 sections, 2 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Overview of Wander. Users can interact through a config.json file or a Streamlit GUI to specify their dataset and query. Each query is mapped to a query template in Table \ref{tab:query-taxonomy} and then transformed into a counterfactual sample generation scenario. Generated samples are pruned using both constraint-checking rules and causal model conformity, and the surviving samples are evaluated with several methods. The evaluation output is summarized in natural language and presented as visualizations for users to interpret. Wander provides a query-driven, interactive decision-support environment for many stakeholders.
  • Figure 2: Results for Fugaku Q1: (a) Uncertainty Quantification, (b) UMAP projection with outlier detection, (c) Causal graph.
  • Figure 3: Results for PM100-Q2: (a) Uncertainty Quantification, (b) UMAP projection with outlier detection, (c) Causal graph.
  • Figure 4: Results for SC19: (a) Uncertainty Quantification, (b) UMAP projection with outlier detection, (c) Causal graph.