Routing-Aware Explanations for Mixture of Experts Graph Models in Malware Detection

Hossein Shokouhinejad, Roozbeh Razavi-Far, Griffin Higgins, Ali A. Ghorbani

TL;DR

The results indicate that making the router explicit and combining multi-statistic node encoding with expert-level diversity can improve the transparency of MoE decisions for malware analysis.

Abstract

Mixture-of-Experts (MoE) offers flexible graph reasoning by combining multiple views of a graph through a learned router. We investigate routing-aware explanations for MoE graph models in malware detection using control flow graphs (CFGs). Our architecture builds diversity at two levels. At the node level, each layer computes multiple neighborhood statistics and fuses them with an MLP, guided by a degree reweighting factor rho and a pooling choice lambda in {mean, std, max}, producing distinct node representations that capture complementary structural cues in CFGs. At the readout level, six experts, each tied to a specific (rho, lambda) view, output graph-level logits that the router weights into a final prediction. Post-hoc explanations are generated with edge-level attributions per expert and aggregated using the router gates so the rationale reflects both what each expert highlights and how strongly it is selected. Evaluated against single-expert GNN baselines such as GCN, GIN, and GAT on the same CFG dataset, the proposed MoE achieves strong detection accuracy while yielding stable, faithful attributions under sparsity-based perturbations. The results indicate that making the router explicit and combining multi-statistic node encoding with expert-level diversity can improve the transparency of MoE decisions for malware analysis.
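The gate-weighted aggregation described in the abstract can be illustrated with a minimal sketch. It assumes that the router applies a softmax over the Top-k expert logits and that each expert's explanation is a vector of per-edge scores; the function names, the Top-k softmax form, and the toy numbers are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence


def top_k_gates(router_logits: Sequence[float], k: int = 2) -> List[float]:
    """Softmax restricted to the k largest router logits (assumed gating form).

    Experts outside the top k receive a gate of exactly zero.
    """
    if not 1 <= k <= len(router_logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k experts with the highest router scores.
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(router_logits[i] for i in top)
    exp_scores = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exp_scores.values())
    return [exp_scores.get(i, 0.0) / total for i in range(len(router_logits))]


def aggregate_edge_attributions(
    gates: Sequence[float],
    expert_edge_scores: Sequence[Sequence[float]],
) -> List[float]:
    """Combine per-expert edge attributions into one gate-weighted score per edge."""
    if len(gates) != len(expert_edge_scores):
        raise ValueError("need one gate per expert")
    num_edges = len(expert_edge_scores[0])
    combined = [0.0] * num_edges
    for gate, scores in zip(gates, expert_edge_scores):
        if len(scores) != num_edges:
            raise ValueError("every expert must score the same set of edges")
        for edge, score in enumerate(scores):
            combined[edge] += gate * score
    return combined


# Toy example: six experts scoring the four edges of a small CFG (made-up values).
router_logits = [2.0, 0.5, 2.0, -1.0, 0.0, 0.1]
edge_scores = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.2, 0.2, 0.2, 0.2],
    [0.5, 0.5, 0.0, 0.0],
]
gates = top_k_gates(router_logits, k=2)
explanation = aggregate_edge_attributions(gates, edge_scores)
```

In this sketch an edge ranks highly only if an expert that the router actually selects also highlights it, which is the sense in which the aggregated rationale reflects both expert attributions and routing strength.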

Paper Structure

This paper contains 17 sections, 28 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: From CFG to prediction and explanation: diverse node representation, six experts, Top-k routing, and gate-weighted explanation.
  • Figure 2: The two-step node feature embedding process, including rule-based instruction encoding and autoencoder-based dimensionality reduction.
  • Figure 3: Expert co-selection analysis in the Top-2 MoE configuration, comparing the diversity of expert activation with and without the load balancing term.
  • Figure 4: Distribution of gating weights across experts in the Top-2 MoE configuration with load balancing.
  • Figure 5: Average gate weights assigned to each expert in the temperature softmax configuration with 95% confidence intervals.
  • ...and 3 more figures