SuperMAN: Interpretable and Expressive Networks over Temporally Sparse Heterogeneous Data

Maya Bechler-Speicher, Andrea Zerio, Maor Huri, Marie Vibeke Vestergaard, Ran Gilad-Bachrach, Tine Jess, Samir Bhatt, Aleksejs Sazonovs

TL;DR

SuperMAN introduces a novel framework for learning from sets of sparse, irregular temporal signals by representing each signal type as an implicit graph and aggregating across a graph set with signal-grouping. Its ExtGNAN component enables multivariate processing within signal groups, while the additive structure preserves interpretability at node, graph, and subset levels; grouping priors can increase expressivity when domain knowledge is available. The method achieves state-of-the-art results in high-stakes medical tasks (Crohn's onset, ICU length of stay) and fake-news detection, and its interpretability analyses yield clinically meaningful insights such as phase-transition detection and system-level biomarker contributions. Theoretical results establish that SuperMAN is strictly more expressive than GNAN and that grouping increases expressivity, with empirical demonstrations across domains and ablations confirming the value of its components and interpretability commitments.

Abstract

Real-world temporal data often consists of multiple signal types recorded at irregular, asynchronous intervals. For instance, in the medical domain, different types of blood tests can be measured at different times and frequencies, resulting in fragmented and unevenly scattered temporal data. Similar issues of irregular sampling occur in other domains, such as the monitoring of large systems using event log files. Effectively learning from such data requires handling sets of temporally sparse, heterogeneous signals. In this work, we propose Super Mixing Additive Networks (SuperMAN), a novel and interpretable-by-design framework for learning directly from such heterogeneous signals by modeling them as sets of implicit graphs. SuperMAN provides diverse interpretability capabilities, including node-level, graph-level, and subset-level importance, and enables practitioners to trade finer-grained interpretability for greater expressivity when domain priors are available. SuperMAN achieves state-of-the-art performance in real-world high-stakes tasks, including predicting Crohn's disease onset and hospital length of stay from routine blood test measurements, and detecting fake news. Furthermore, we demonstrate how SuperMAN's interpretability properties help reveal phase transitions in disease development and provide crucial insights in the healthcare domain.

Paper Structure

This paper contains 55 sections, 3 theorems, 19 equations, 5 figures, 8 tables.

Key Result

Theorem 3.1

SuperMAN is strictly more expressive than GNAN.

Figures (5)

  • Figure 1: In this example, the input is a set of three graphs, $G_1, G_2, G_3$, grouped into two subsets $S_1$ and $S_2$. Within each subset, the same ExtGNAN instance is applied to all graphs to produce their graph-level representations. For subsets containing multiple graphs, a DeepSets module aggregates these graph representations into a single subset representation. For subsets of size one, the subset representation is simply the graph representation itself. The final set representation is then obtained by summing the subset representations, and the final label prediction is produced by summing the entries of this set representation.
  • Figure 2: Node-level importances for two individuals from (a) the P12 ICU LoS and (b) CD onset datasets. Node size indicates the exact node (measurement) contribution to the prediction.
  • Figure 3: Subset-level contribution curves for Crohn's Disease prediction. Each curve shows how SuperMAN's output changes as increasing noise is added to the latent representation of a biomarker group. (a) uses individual biomarkers; (b) uses physiologically coherent groups.
  • Figure 4: Node importance for fake-news spread graphs over the GOS dataset. Node size corresponds to the importance learned by SuperMAN, computed with the paper's node-importance equation. All graphs with a single node are grouped into one subset, so their importance is reported at the subset level rather than the node level (green node).
  • Figure 5: Q-Q calibration plot (reliability diagram) for SuperMAN on the CD task.
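
The aggregation described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `graph_repr` is a hypothetical stand-in for ExtGNAN (whose internals are not given here), and the one-layer `deepsets` maps, the width `D`, the feature size `F`, and all random weights are invented for the example. Only the control flow (shared encoder per subset, DeepSets for multi-graph subsets, pass-through for singletons, sum over subsets, sum over entries) follows the caption.

```python
import numpy as np

rng = np.random.default_rng(0)
F, D = 3, 4  # assumed node-feature size and representation width


def graph_repr(graph, W):
    """Hypothetical stand-in for ExtGNAN: sum node features, then a linear map."""
    return graph.sum(axis=0) @ W


def deepsets(reps, W_phi, W_rho):
    """DeepSets-style pooling rho(sum_i phi(x_i)) with assumed one-layer tanh maps."""
    pooled = sum(np.tanh(r @ W_phi) for r in reps)
    return np.tanh(pooled @ W_rho)


def set_prediction(graphs, subsets, params):
    """Return the scalar prediction and the per-subset representations."""
    contributions = {}
    for name, idxs in subsets.items():
        p = params[name]
        # The same encoder weights are shared by all graphs in a subset.
        reps = [graph_repr(graphs[i], p["W"]) for i in idxs]
        if len(reps) == 1:
            # Subset of size one: the subset representation is the graph's own.
            contributions[name] = reps[0]
        else:
            contributions[name] = deepsets(reps, p["W_phi"], p["W_rho"])
    set_repr = sum(contributions.values())  # sum over subsets
    return float(set_repr.sum()), contributions  # sum over entries


# Three toy graphs G1, G2, G3 (node-feature matrices) grouped as in Figure 1.
graphs = [rng.normal(size=(n, F)) for n in (3, 5, 2)]
subsets = {"S1": [0, 1], "S2": [2]}
params = {
    "S1": {
        "W": rng.normal(size=(F, D)),
        "W_phi": rng.normal(size=(D, D)),
        "W_rho": rng.normal(size=(D, D)),
    },
    "S2": {"W": rng.normal(size=(F, D))},
}

prediction, contributions = set_prediction(graphs, subsets, params)
print(prediction, {k: float(v.sum()) for k, v in contributions.items()})
```

Because the prediction is a plain sum of subset representations, each subset's share of the output can be read off directly from `contributions`, which is the property the subset-level interpretability relies on. In this sketch, reordering the graphs inside a subset does not change the result, since the pooling is a sum.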

Theorems & Definitions (4)

  • Theorem 3.1
  • Theorem 3.2
  • Theorem B.1
  • proof