Embedding interpretable $\ell_1$-regression into neural networks for uncovering temporal structure in cell imaging

Fabian Kabus; Maren Hackenberg; Julia Hindel; Thibault Cholvin; Antje Kilias; Thomas Brox; Abhinav Valada; Marlene Bartos; Harald Binder

TL;DR

This work embeds a vector autoregressive (VAR) model, as an interpretable regression technique, into a convolutional autoencoder that provides dimension reduction for tractable temporal modeling. Contribution maps visualize which spatial regions drive the learned dynamics.

Abstract

While artificial neural networks excel in unsupervised learning of non-sparse structure, classical statistical regression techniques offer better interpretability, in particular when sparseness is enforced by $\ell_1$ regularization, enabling identification of which factors drive observed dynamics. We investigate how these two types of approaches can be optimally combined, taking as an example two-photon calcium imaging data from which sparse autoregressive dynamics are to be extracted. We propose embedding a vector autoregressive (VAR) model as an interpretable regression technique into a convolutional autoencoder, which provides dimension reduction for tractable temporal modeling. A skip connection separately addresses non-sparse static spatial information, selectively channeling sparse structure into the $\ell_1$-regularized VAR. $\ell_1$-estimation of regression parameters is enabled by differentiating through the piecewise linear solution path. This is contrasted with approaches where the autoencoder does not adapt to the VAR model. Having an embedded statistical model also enables a testing approach for comparing temporal sequences from the same observational unit. Additionally, contribution maps visualize which spatial regions drive the learned dynamics.
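
To make the $\ell_1$-regularized VAR step concrete, the following is a minimal NumPy sketch of fitting a sparse VAR($p$) model to a latent time series. It is an illustration under stated assumptions, not the authors' implementation: the abstract describes differentiating through the piecewise linear solution path, whereas this sketch swaps in plain proximal gradient descent (ISTA) for simplicity. The function names `build_lagged`, `soft_threshold`, and `fit_sparse_var`, and the objective scaling $\frac{1}{2n}\lVert Y - XB\rVert_F^2 + \lambda\lVert B\rVert_1$, are hypothetical choices made for this example.

```python
import numpy as np

def build_lagged(z, p):
    """Build the VAR(p) regression problem from a series z of shape (T, d).

    Returns X of shape (T-p, p*d), whose k-th column block holds z_{t-k},
    and the targets Y = z_t of shape (T-p, d).
    """
    T, _ = z.shape
    X = np.hstack([z[p - k: T - k] for k in range(1, p + 1)])
    Y = z[p:]
    return X, Y

def soft_threshold(a, tau):
    """Proximal operator of tau * ||.||_1 (elementwise shrinkage)."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)

def fit_sparse_var(z, p=1, lam=0.1, n_iter=500):
    """Sketch: minimise (1/2n)||Y - XB||_F^2 + lam * ||B||_1 with ISTA.

    Assumption: ISTA stands in for the solution-path approach named in
    the abstract. Row block k of B (rows k*d to (k+1)*d) is A_{k+1}^T.
    """
    X, Y = build_lagged(z, p)
    n = X.shape[0]
    # Lipschitz constant of the gradient of the smooth least-squares term.
    lipschitz = np.linalg.norm(X, 2) ** 2 / n
    B = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        B = soft_threshold(B - grad / lipschitz, lam / lipschitz)
    return B
```

Larger values of `lam` drive more entries of `B` exactly to zero, which is what makes the remaining nonzero coefficients readable as the factors driving the dynamics.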

Paper Structure (13 sections, 8 equations, 4 figures, 2 tables)

Introduction
Methods
Channeling temporal structure into a VAR model
Differentiating through least angle regression for the VAR model
Statistical testing of group differences in VAR coefficients
Localizing dynamical differences via contribution maps
Results
Dataset and preparation
The skip connection improves signal-to-noise ratio in the latent space
Coefficients distinguish experimental conditions
Interpreting the sparse coefficients
End-to-end training yields a more predictable latent space
Discussion

Figures (4)

Figure 1: High-level overview of the sparse spatiotemporal dimension reduction in the end-to-end model. The frames $\bm{x}_t^{(i)}$ for all time series are first aggregated over time to form a mean frame $\bar{\bm{x}}$, which captures the static structure. The dynamic component, $\bm{x}_t^{(i)} - \bar{\bm{x}}$, is encoded into a latent representation $\bm{z}_t^{(i)}$. The latent representation $\operatorname{vec}(\bm{z}_t^{(i)})$ is modeled using a sparse vector autoregressive (VAR) model of order $p$, which forecasts $\hat{\bm{z}}_t^{(i)}$. The coefficient matrices $\bm{A}_1^{(i)}, \ldots, \bm{A}_p^{(i)}$ are fit using $\ell_1$-regression from scratch in each forward pass and they contain the learned spatiotemporal relationships. Finally, the decoder reconstructs the frame $\hat{\bm{x}}_t^{(i)}$ from $\hat{\bm{z}}_t^{(i)}$ and the static mean frame $\bar{\bm{x}}$, which is reintroduced via a skip connection.
Figure 2: Effect of the skip connection on the latent representation and reconstruction from one run at $t=263$: Top row (a--c): model without skip connection. Bottom row (d--f): model with skip connection. (a, d) Latent representation $\bm{z}_t^{(i)}$. (b, e) Decoder output $f_{\text{dec}}(\bm{z}_t^{(i)})$, brightness enhanced for visual clarity. (c) Ground truth input $\bm{x}_t^{(i)}$, contrast enhanced. (f) Reconstruction $\bm{\hat{x}}_t^{(i)}$ with skip connection, contrast enhanced.
Figure 3: Assessing the effect of end-to-end learning via automatic differentiation (AD) with visual contribution maps, $\bm{\Omega}$, which illustrate the spatial origins of estimated dynamics, averaged across all runs of an exemplary mouse: (a) Familiar condition without end-to-end training. (b) Familiar condition with end-to-end training. (c) Difference map ($\bm{\Omega}^F - \bm{\Omega}^N$) without end-to-end training. (d) Difference map ($\bm{\Omega}^F - \bm{\Omega}^N$) with end-to-end training.
Figure 4: Effect of $\ell_1$ regularization parameter $\lambda$ on time series forecast and reconstruction. (a) Selected dimensions of $\bm{z}_t$ in black (reference) and their forecasts $\hat{\bm{z}}_t$ for varying $\lambda$ (in colors) over time. (b--d) Reconstructions at $t=263$: (b) Ground truth $\bm{x}_t$. (c) Reconstruction $\hat{\bm{x}}_t$ for $\lambda = 0.003$. (d) Reconstruction $\hat{\bm{x}}_t$ for $\lambda = 0.032$.
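
The data flow described in the Figure 1 caption can be sketched in a few lines of NumPy. This is a hedged illustration only: `encode` and `decode` are hypothetical stand-ins for the convolutional encoder and decoder, the skip connection is assumed to be additive (the caption only says the mean frame is "reintroduced"), and the mean frame is computed over a single series here rather than over all time series.

```python
import numpy as np

def forward_sketch(frames, encode, decode, A_list):
    """Sketch of the pipeline in the Figure 1 caption for one series.

    frames: array of shape (T, H, W).
    encode, decode: hypothetical callables mapping a frame to a latent
        vector of length d and back (stand-ins for the autoencoder).
    A_list: list of p coefficient matrices A_1..A_p, each of shape (d, d).
    """
    p = len(A_list)
    # Static structure: mean frame over time.
    x_bar = frames.mean(axis=0)
    # Encode only the dynamic component x_t - x_bar.
    z = np.stack([encode(f - x_bar) for f in frames])  # shape (T, d)
    T = z.shape[0]
    # VAR(p) forecast: z_hat_t = sum_k A_k z_{t-k}, for t = p..T-1.
    z_hat = np.zeros_like(z[p:])
    for k, A in enumerate(A_list, start=1):
        z_hat += z[p - k: T - k] @ A.T
    # Assumed additive skip connection: decoder output plus mean frame.
    x_hat = np.stack([decode(zh) + x_bar for zh in z_hat])
    return x_bar, z, z_hat, x_hat
```

With an identity "autoencoder" (`encode` flattens, `decode` reshapes) and a single identity coefficient matrix, the forecast for frame $t$ is simply frame $t-1$, which is a convenient sanity check of the indexing.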
