Convolutions and More as Einsum: A Tensor Network Perspective with Advances for Second-Order Methods

Felix Dangel

TL;DR

This work simplifies convolutions by viewing them as tensor networks that allow reasoning about the underlying tensor multiplications by drawing diagrams, manipulating them to perform function transformations like differentiation, and efficiently evaluating them with einsum.

Abstract

Despite their simple intuition, convolutions are more tedious to analyze than dense layers, which complicates the transfer of theoretical and algorithmic ideas to convolutions. We simplify convolutions by viewing them as tensor networks (TNs) that allow reasoning about the underlying tensor multiplications by drawing diagrams, manipulating them to perform function transformations like differentiation, and efficiently evaluating them with einsum. To demonstrate their simplicity and expressiveness, we derive diagrams of various autodiff operations and popular curvature approximations with full hyper-parameter support, batching, channel groups, and generalization to any convolution dimension. Further, we provide convolution-specific transformations based on the connectivity pattern that allow diagrams to be simplified before evaluation. Finally, we probe performance. Our TN implementation accelerates a recently proposed KFAC variant by up to 4.5x while removing the standard implementation's memory overhead, and enables new hardware-efficient tensor dropout for approximate backpropagation.
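As a rough illustration of the einsum view described in the abstract, the following minimal sketch writes a 1d convolution (no batching, no channel groups) as a contraction of the input, an index pattern tensor, and the kernel. The function names, the index order `Pi[i, o, k]`, and the use of NumPy are assumptions made for this example and are not taken from the paper's implementation.

```python
import numpy as np


def index_pattern(input_size, kernel_size, stride=1, padding=0, dilation=1):
    """Binary tensor Pi[i, o, k]: 1 iff input position i meets kernel entry k at output o."""
    output_size = (input_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
    pattern = np.zeros((input_size, output_size, kernel_size))
    for o in range(output_size):
        for k in range(kernel_size):
            i = o * stride + k * dilation - padding
            if 0 <= i < input_size:  # positions that fall into the zero-padding are dropped
                pattern[i, o, k] = 1.0
    return pattern


def conv1d_einsum(x, w, stride=1, padding=0, dilation=1):
    """Convolution as one contraction. x: (C_in, I), w: (C_out, C_in, K) -> (C_out, O)."""
    pattern = index_pattern(x.shape[1], w.shape[2], stride, padding, dilation)
    return np.einsum("ci,iok,dck->do", x, pattern, w)


def conv1d_loops(x, w, stride=1, padding=0, dilation=1):
    """Reference implementation with explicit loops over a zero-padded input."""
    c_out, c_in, kernel_size = w.shape
    x_pad = np.pad(x, ((0, 0), (padding, padding)))
    output_size = (x.shape[1] + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
    y = np.zeros((c_out, output_size))
    for d in range(c_out):
        for o in range(output_size):
            for c in range(c_in):
                for k in range(kernel_size):
                    y[d, o] += x_pad[c, o * stride + k * dilation] * w[d, c, k]
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 10))
w = rng.standard_normal((4, 3, 3))
y_einsum = conv1d_einsum(x, w, stride=2, padding=1, dilation=2)
y_loops = conv1d_loops(x, w, stride=2, padding=1, dilation=2)
```

In this sketch all hyper-parameters (stride, padding, dilation) enter only through the index pattern, so the einsum expression itself stays the same when they change. The dense pattern tensor is used for readability; it is not meant as an efficient representation.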

Paper Structure (63 sections, 36 equations, 23 figures, 10 tables, 1 algorithm)

Figures (23)

  • Figure 1: Many convolution-related routines can be expressed as TNs and evaluated with einsum. We illustrate this for the input-based factor of KFAC for convolutions (Grosse & Martens, 2016), whose standard implementation (top) requires unfolding the input (high memory). The TN (middle) enables internal optimizations inside einsum (e.g. with contraction path optimizers like opt_einsum (Smith & Gray, 2018)). (Bottom) In many cases, the TN further simplifies due to structures in the index pattern, which reduces cost.
  • Figure 2: TNs of (a) 2d convolution and (b, c) connections to its matrix multiplication view. The connectivity along each dimension is explicit via an index pattern tensor $\bm{\mathsf{\Pi}}$.
  • Figure 3: TN differentiation as graphical manipulation. (a) Differentiating the convolution w.r.t. $\bm{\mathsfit{W}}$ amounts to cutting it out of the diagram, which yields the weight Jacobian. (b) Same procedure applied to the Jacobian w.r.t. $\bm{\mathsfit{X}}$. (c) VJP for the weight and (d) input Jacobian (transpose convolution). Jacobians are shaded, only their contraction with $\bm{\mathsfit{V}}^{(\bm{\mathsfit{Y}})}$ is highlighted.
  • Figure 4: TNs of input-based Kronecker factors for KFAC approximations of the Fisher/GGN (no batching, no groups). The unfolded input is shaded, only additional contractions are highlighted. (a) $\bm{\Omega}$ (KFC/KFAC-expand) from Grosse & Martens (2016), (b) $\hat{\bm{\Omega}}$ (KFAC-reduce) from Eschenhagen et al. (2023) (vectors of ones effectively amount to sums).
  • Figure 5: TN illustrations of index pattern simplifications and transformations. See the appendix for the math formulation.
  • ...and 18 more figures
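
The captions of Figures 1 and 4 contrast an unfold-based evaluation of the input-based KFAC factor with a single tensor network contraction. The sketch below illustrates that contrast for a 1d convolution without batching or groups. All names are hypothetical, normalization constants are omitted, and the "unfolded input" is itself built with einsum for brevity, whereas a standard implementation would materialize it with an im2col-style routine.

```python
import numpy as np


def index_pattern(input_size, kernel_size, stride=1, padding=0):
    """Binary tensor Pi[i, o, k]: 1 iff input position i meets kernel entry k at output o."""
    output_size = (input_size + 2 * padding - kernel_size) // stride + 1
    pattern = np.zeros((input_size, output_size, kernel_size))
    for o in range(output_size):
        for k in range(kernel_size):
            i = o * stride + k - padding
            if 0 <= i < input_size:
                pattern[i, o, k] = 1.0
    return pattern


def unfold(x, pattern):
    """Explicit unfolded input of shape (C_in * K, O); this is the memory-heavy object."""
    c_in = x.shape[0]
    _, output_size, kernel_size = pattern.shape
    return np.einsum("ci,iok->cko", x, pattern).reshape(c_in * kernel_size, output_size)


def kfac_expand_unfold(x, pattern):
    """KFAC-expand style factor via the unfolded input (unnormalized)."""
    unfolded = unfold(x, pattern)
    return unfolded @ unfolded.T


def kfac_expand_einsum(x, pattern):
    """Same factor as one contraction; einsum may choose its own contraction order."""
    c_in, kernel_size = x.shape[0], pattern.shape[2]
    omega = np.einsum("ci,iok,dj,jol->ckdl", x, pattern, x, pattern, optimize=True)
    return omega.reshape(c_in * kernel_size, c_in * kernel_size)


def kfac_reduce_einsum(x, pattern):
    """KFAC-reduce style factor: sum over output positions first (unnormalized)."""
    summed = np.einsum("ci,iok->ck", x, pattern).reshape(-1)
    return np.outer(summed, summed)


rng = np.random.default_rng(0)
x = rng.standard_normal((3, 12))
pattern = index_pattern(input_size=12, kernel_size=3, stride=2, padding=1)
omega_unfold = kfac_expand_unfold(x, pattern)
omega_einsum = kfac_expand_einsum(x, pattern)
omega_reduce = kfac_reduce_einsum(x, pattern)
```

Passing `optimize=True` lets NumPy search for a cheaper pairwise contraction order. Libraries such as opt_einsum offer more elaborate path optimizers, which is the kind of internal optimization the Figure 1 caption refers to.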