Embedding interpretable $\ell_1$-regression into neural networks for uncovering temporal structure in cell imaging

Fabian Kabus, Maren Hackenberg, Julia Hindel, Thibault Cholvin, Antje Kilias, Thomas Brox, Abhinav Valada, Marlene Bartos, Harald Binder

TL;DR

This work embeds a vector autoregressive (VAR) model, as an interpretable regression technique, into a convolutional autoencoder that provides dimension reduction for tractable temporal modeling. Contribution maps then visualize which spatial regions drive the learned dynamics.

Abstract

While artificial neural networks excel in unsupervised learning of non-sparse structure, classical statistical regression techniques offer better interpretability, in particular when sparseness is enforced by $\ell_1$ regularization, which enables identification of the factors that drive observed dynamics. We investigate how these two types of approaches can be optimally combined, using two-photon calcium imaging data, where sparse autoregressive dynamics are to be extracted, as an example. We propose embedding a vector autoregressive (VAR) model as an interpretable regression technique into a convolutional autoencoder, which provides dimension reduction for tractable temporal modeling. A skip connection separately addresses non-sparse static spatial information, selectively channeling sparse structure into the $\ell_1$-regularized VAR. $\ell_1$-estimation of regression parameters is enabled by differentiating through the piecewise linear solution path. This is contrasted with approaches where the autoencoder does not adapt to the VAR model. Having an embedded statistical model also enables a testing approach for comparing temporal sequences from the same observational unit. Additionally, contribution maps visualize which spatial regions drive the learned dynamics.
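
For orientation, an $\ell_1$-regularized VAR($p$) fit on a latent time series is commonly written as below. The symbols $\bm{z}_t$, $\bm{A}_k$, $p$ and $\lambda$ follow the figure captions of this paper; the series length $T$, the $1/(2(T-p))$ normalisation and the absence of an intercept are assumptions made here, so the paper's exact objective may differ.

$$
\hat{\bm{A}}_1, \ldots, \hat{\bm{A}}_p
= \arg\min_{\bm{A}_1, \ldots, \bm{A}_p}
\frac{1}{2(T-p)} \sum_{t=p+1}^{T}
\Big\| \operatorname{vec}(\bm{z}_t) - \sum_{k=1}^{p} \bm{A}_k \operatorname{vec}(\bm{z}_{t-k}) \Big\|_2^2
+ \lambda \sum_{k=1}^{p} \| \bm{A}_k \|_1
$$

Larger values of $\lambda$ set more entries of the $\bm{A}_k$ exactly to zero, which is what makes the fitted dynamics sparse and readable.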

Paper Structure

This paper contains 13 sections, 8 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: High-level overview of the sparse spatiotemporal dimension reduction in the end-to-end model. The frames $\bm{x}_t^{(i)}$ for all time series are first aggregated over time to form a mean frame $\bar{\bm{x}}$, which captures the static structure. The dynamic component, $\bm{x}_t^{(i)} - \bar{\bm{x}}$, is encoded into a latent representation $\bm{z}_t^{(i)}$. The latent representation $\operatorname{vec}(\bm{z}_t^{(i)})$ is modeled using a sparse vector autoregressive (VAR) model of order $p$, which forecasts $\hat{\bm{z}}_t^{(i)}$. The coefficient matrices $\bm{A}_1^{(i)}, \ldots, \bm{A}_p^{(i)}$ are fit using $\ell_1$-regression from scratch in each forward pass and they contain the learned spatiotemporal relationships. Finally, the decoder reconstructs the frame $\hat{\bm{x}}_t^{(i)}$ from $\hat{\bm{z}}_t^{(i)}$ and the static mean frame $\bar{\bm{x}}$, which is reintroduced via a skip connection.
  • Figure 2: Effect of the skip connection on the latent representation and reconstruction from one run at $t=263$: Top row (a--c): model without skip connection. Bottom row (d--f): model with skip connection. (a, d) Latent representation $\bm{z}_t^{(i)}$. (b, e) Decoder output $f_{\text{dec}}(\bm{z}_t^{(i)})$, brightness enhanced for visual clarity. (c) Ground truth input $\bm{x}_t^{(i)}$, contrast enhanced. (f) Reconstruction $\bm{\hat{x}}_t^{(i)}$ with skip connection, contrast enhanced.
  • Figure 3: Assessing the effect of end-to-end learning via automatic differentiation (AD) with visual contribution maps, $\bm{\Omega}$, which illustrate the spatial origins of estimated dynamics, averaged across all runs of an exemplary mouse: (a) Familiar condition without end-to-end training. (b) Familiar condition with end-to-end training. (c) Difference map ($\bm{\Omega}^F - \bm{\Omega}^N$) without end-to-end training. (d) Difference map ($\bm{\Omega}^F - \bm{\Omega}^N$) with end-to-end training.
  • Figure 4: Effect of $\ell_1$ regularization parameter $\lambda$ on time series forecast and reconstruction. (a) Selected dimensions of $\bm{z}_t$ in black (reference) and their forecasts $\hat{\bm{z}}_t$ for varying $\lambda$ (in colors) over time. (b--d) Reconstructions at $t=263$: (b) Ground truth $\bm{x}_t$. (c) Reconstruction $\hat{\bm{x}}_t$ for $\lambda = 0.003$. (d) Reconstruction $\hat{\bm{x}}_t$ for $\lambda = 0.032$.
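
The data flow described in the Figure 1 caption (subtract the mean frame, fit a sparse VAR on the dynamic part, forecast, add the mean back) can be sketched in NumPy. This is a minimal illustration under several assumptions, not the authors' implementation: the encoder and decoder are left out (the "latent" series is used directly), the data are synthetic, and the $\ell_1$ problem is solved with plain proximal gradient descent (ISTA) rather than by differentiating through the piecewise linear solution path. All function and variable names are hypothetical.

```python
import numpy as np

def soft_threshold(a, tau):
    """Elementwise soft-thresholding, the proximal operator of tau * ||.||_1."""
    return np.sign(a) * np.maximum(np.abs(a) - tau, 0.0)

def lagged_design(z, p):
    """Build the VAR(p) regression problem from a series z of shape (T, d).

    Returns X of shape (T - p, p * d), whose rows are [z_{t-1}, ..., z_{t-p}],
    and the targets Y = z_t of shape (T - p, d).
    """
    T = z.shape[0]
    X = np.hstack([z[p - k : T - k] for k in range(1, p + 1)])
    Y = z[p:]
    return X, Y

def fit_sparse_var(z, p, lam, n_iter=500):
    """Minimise (1 / (2n)) * ||Y - X B||_F^2 + lam * ||B||_1 with ISTA.

    B has shape (p * d, d); its k-th block of d rows is A_k transposed.
    """
    X, Y = lagged_design(z, p)
    n = X.shape[0]
    # Step size 1 / L, where L = ||X||_2^2 / n bounds the gradient's Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    B = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        B = soft_threshold(B - step * grad, step * lam)
    return B

# Synthetic stand-in data (assumption): a sparse VAR(1) process plus a static offset.
rng = np.random.default_rng(0)
T, d, p, lam = 200, 4, 1, 0.002
A_true = np.diag([0.8, 0.5, 0.0, 0.0])
z = np.zeros((T, d))
for t in range(1, T):
    z[t] = A_true @ z[t - 1] + 0.1 * rng.normal(size=d)
x = z + rng.normal(size=d)          # "frames" = dynamics + static structure

x_bar = x.mean(axis=0)              # static mean frame
dyn = x - x_bar                     # dynamic component passed to the VAR
B = fit_sparse_var(dyn, p, lam)     # sparse coefficient estimate
X, Y = lagged_design(dyn, p)
forecast = X @ B + x_bar            # mean re-added, mimicking the skip connection
```

Refitting `B` from scratch on every call mirrors the caption's statement that the coefficient matrices are fit anew in each forward pass. In an end-to-end setting the solver would also need to be differentiable with respect to its inputs, which this sketch does not attempt.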