Discovering interpretable models of scientific image data with deep learning

Christopher J. Soelistyo; Alan R. Lowe

TL;DR

This work tackles the problem of extracting interpretable, domain-appropriate scientific models from raw image data by combining disentangled representation learning, sparse network training, and symbolic regression. The authors test these methods on a bioimaging problem, classifying chromatin morphology in live-cell microscopy, and show that highly parsimonious models can approach the accuracy of black-box benchmarks while remaining interpretable and offering domain insight. Key findings include a semantic latent space that aligns with biological factors, sparse models that show which latent features drive decisions, and symbolic expressions that reveal explicit decision boundaries; adversarial perturbations are used to assess whether the models are domain-appropriate. The results suggest that an approximate Rashomon set of models exists in this domain, enabling a practical, interpretable discovery system with potential broad applicability in scientific contexts.

Abstract

How can we find interpretable, domain-appropriate models of natural phenomena given some complex, raw data such as images? Can we use such models to derive scientific insight from the data? In this paper, we propose some methods for achieving this. In particular, we implement disentangled representation learning, sparse deep neural network training and symbolic regression, and assess their usefulness in forming interpretable models of complex image data. We demonstrate their relevance to the field of bioimaging using a well-studied test problem of classifying cell states in microscopy data. We find that such methods can produce highly parsimonious models that achieve $\sim98\%$ of the accuracy of black-box benchmark models, with a tiny fraction of the complexity. We explore the utility of such interpretable models in producing scientific explanations of the underlying biological phenomenon.

Paper Structure

This paper contains 39 sections, 12 equations, 27 figures, 3 tables, 1 algorithm.

Introduction
Background
The promise of deep learning in scientific discovery
The dangers of deep learning in scientific discovery
The third way: an ideal discovery system
Bioimaging as an ideal test domain
Prior work
Goal and strategy
The test problem: classifying chromatin morphology in live-cell microscopy data
The strategy
Methods
Total Correlation VAE
Sparsity: RigL
Symbolic regression
Adversarial attacks
...and 24 more sections

Figures (27)

Figure 1: Fundamental concepts of the study. (a) "Rashomon set" concept. (b) The "sweet spot" of useful scientific models, at the intersection of the performant, interpretable and domain-appropriate model subspaces within the overall model space.
Figure 2: Example images for cells in interphase (top) and metaphase (bottom).
Figure 3: Example reconstructions produced by the $\beta$-TCVAE.
Figure 4: Latent variables that encode central cell morphology.
Figure 5: Topology of the highest-performing Scheme 3 model. Blue connections are positively weighted, red connections are negatively weighted. The thickness of the connection line is proportional to its weight magnitude. Bias values are written above their respective neuron. The network flow is up to down, so the top layer is the input layer and the bottom layer is the output layer.
...and 22 more figures
