Table of Contents
Fetching ...

Optimization Methods for Joint Eigendecomposition

Erik Troedsson, Marcus Carlsson, Herwig Wendt

TL;DR

The paper tackles joint diagonalization of a matrix collection by minimizing a nonconvex energy $f_{\mathcal{A}}(U)$ that measures off-diagonal power after a similarity transform. It derives coordinate-free gradient and Hessian expressions on the complex matrix space and exploits a forward-Hessian operator together with a multiplicative basis-change $U_{m+1}=U_m(I+\lambda_m S_m)$ to design efficient first- and second-order JD methods. The authors implement gradient descent, conjugate gradient, and (Quasi-)Newton schemes, complemented by a Hessian-informed step-size rule that avoids singularities and accelerates convergence. Numerical experiments show superior performance over state-of-the-art JD methods, particularly in high-noise or large-scale settings, and a 3D harmonic retrieval application demonstrates practical impact; code will be released to facilitate adoption.

Abstract

Joint diagonalization, the process of finding a shared set of approximate eigenvectors for a collection of matrices, arises in diverse applications such as multidimensional harmonic analysis or quantum information theory. This task is typically framed as an optimization problem: minimizing a non-convex function that quantifies off-diagonal matrix elements across possible bases. In this work, we introduce a suite of efficient algorithms designed to locate local minimizers of this functional. Our methods leverage the Hessian's structure to bypass direct computation of second-order derivatives, evaluating it as either an operator or bilinear form - a strategy that remains computationally feasible even for large-scale applications. Additionally, we demonstrate that this Hessian-based information enables precise estimation of parameters, such as step-size, in first-order optimization techniques like Gradient Descent and Conjugate Gradient, and the design of second-order methods such as (Quasi-)Newton. The resulting algorithms for joint diagonalization outperform existing techniques, and we provide comprehensive numerical evidence of their superior performance.

Optimization Methods for Joint Eigendecomposition

TL;DR

The paper tackles joint diagonalization of a matrix collection by minimizing a nonconvex energy that measures off-diagonal power after a similarity transform. It derives coordinate-free gradient and Hessian expressions on the complex matrix space and exploits a forward-Hessian operator together with a multiplicative basis-change to design efficient first- and second-order JD methods. The authors implement gradient descent, conjugate gradient, and (Quasi-)Newton schemes, complemented by a Hessian-informed step-size rule that avoids singularities and accelerates convergence. Numerical experiments show superior performance over state-of-the-art JD methods, particularly in high-noise or large-scale settings, and a 3D harmonic retrieval application demonstrates practical impact; code will be released to facilitate adoption.

Abstract

Joint diagonalization, the process of finding a shared set of approximate eigenvectors for a collection of matrices, arises in diverse applications such as multidimensional harmonic analysis or quantum information theory. This task is typically framed as an optimization problem: minimizing a non-convex function that quantifies off-diagonal matrix elements across possible bases. In this work, we introduce a suite of efficient algorithms designed to locate local minimizers of this functional. Our methods leverage the Hessian's structure to bypass direct computation of second-order derivatives, evaluating it as either an operator or bilinear form - a strategy that remains computationally feasible even for large-scale applications. Additionally, we demonstrate that this Hessian-based information enables precise estimation of parameters, such as step-size, in first-order optimization techniques like Gradient Descent and Conjugate Gradient, and the design of second-order methods such as (Quasi-)Newton. The resulting algorithms for joint diagonalization outperform existing techniques, and we provide comprehensive numerical evidence of their superior performance.

Paper Structure

This paper contains 22 sections, 4 theorems, 33 equations, 10 figures, 1 table.

Key Result

Theorem 2.1

The gradient and bilinear Hessian of functional2 are where $D_k\triangleq U^{-1}A_kU$.

Figures (10)

  • Figure 1: Top: The objective function $f_\mathcal{A}$\ref{['functional2']} for a matrix tuple $\mathcal{A}$, constructed as described in Section \ref{['sec:MC']} with SNR=$30$dB, with dimensions $n=10,K=5$ (left) and $n=10,K=10$ (right) plotted along the line $U+x S_1/\|S_1\| + y O_1$, where $S_1$ is the gradient \ref{['equ:grad:S1']} and $O_1$ is a random orthogonal direction with norm $1$. The center plots show the functional $f_{U^{-1}\mathcal{A}U}$ plotted along the line $I+x S_2/\|S_2\|+ y O_2$, where $S_2$ is the gradient \ref{['equ:grad:S2']} and $O_2$ is a random orthogonal direction with norm $1$. In both cases $U=\hbox{eig}(\sum_{k=1}^K A_k)$, which is a natural choice for initialization for the JD problem (indicated by the red circle in the graphs).
  • Figure 2: Slices of the functionals $f_\mathcal{A}$ and $f_{U^{-1}\mathcal{A}U}$, as simulated and depicted in Fig. \ref{['fig:objfun']}, along the descent directions $S_1$ and $S_2$ respectively. The dashed vertical lines indicate values of $\lambda$ for which $U+\lambda S_1$ (black) and $I+\lambda S_2$ (blue) are non-invertible, respectively; the bold solid lines at the bottom indicate the corresponding prediction given by Theorem \ref{['t2']}.
  • Figure 3: Step size selection rule. Objective function (black solid) and approximation \ref{['equ:approxStep']} (blue dashed) in the gradient direction for the additive (left) and multiplicative (right) update corresponding to the left column of Fig. \ref{['fig:objfun:sing']}.
  • Figure 4: Additive vs. multiplicative updates. Median of the objective function values versus iterations with additive \ref{['standardupdate']} and multiplicative \ref{['multupdate']} updates for gradient descent and conjugate gradient descent.
  • Figure 5: Step size selection performance. Average (over 100 realizations) objective function decrease with step size \ref{['equ:finalstep']} (top left), step size \ref{['equ:finalstep']} at each iteration divided by step size obtained with line search (bottom left); histogram of step size \ref{['equ:finalstep']} divided by step size obtained with line search (center), histogram of relative cost decrease (right) when step size \ref{['equ:finalstep']} or line search are used (initialization at $U_{init}=I$, SNR=30dB, $n=10,K=5$)
  • ...and 5 more figures

Theorems & Definitions (4)

  • Theorem 2.1
  • Theorem 2.2
  • Theorem 3.1
  • Theorem 3.2