Graph Unfolding and Sampling for Transitory Video Summarization via Gershgorin Disc Alignment

Sadid Sahami, Gene Cheung, Chia-Wen Lin

TL;DR

This work summarizes a transitory user-generated video (UGV) into several keyframes in linear time, using a new fast graph sampling algorithm based on Gershgorin disc alignment (GDA) that iteratively aligns the left-ends of the Gershgorin discs of all graph nodes (frames).

Abstract

User-generated videos (UGVs) uploaded from mobile phones to social media sites like YouTube and TikTok are short and non-repetitive. We summarize a transitory UGV into several keyframes in linear time via fast graph sampling based on Gershgorin disc alignment (GDA). Specifically, we first model a sequence of $N$ frames in a UGV as an $M$-hop path graph $\mathcal{G}^o$ for $M \ll N$, where the similarity between two frames within $M$ time instants is encoded as a positive edge based on feature similarity. Towards efficient sampling, we then "unfold" $\mathcal{G}^o$ to a $1$-hop path graph $\mathcal{G}$, specified by a generalized graph Laplacian matrix $\mathcal{L}$, via one of two graph unfolding procedures with provable performance bounds. We show that maximizing the smallest eigenvalue $\lambda_{\min}(\mathbf{B})$ of a coefficient matrix $\mathbf{B} = \text{diag}\left(\mathbf{h}\right) + \mu\mathcal{L}$, where $\mathbf{h}$ is the binary keyframe selection vector, is equivalent to minimizing a worst-case signal reconstruction error. We maximize instead the Gershgorin circle theorem (GCT) lower bound $\lambda^-_{\min}(\mathbf{B})$ by choosing $\mathbf{h}$ via a new fast graph sampling algorithm that iteratively aligns left-ends of Gershgorin discs for all graph nodes (frames). Extensive experiments on multiple short video datasets show that our algorithm achieves comparable or better video summarization performance compared to state-of-the-art methods, at a substantially reduced complexity.
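To make the objective in the abstract concrete, the following minimal sketch builds $\mathbf{B} = \text{diag}(\mathbf{h}) + \mu\mathcal{L}$ for a small $1$-hop path graph and compares $\lambda_{\min}(\mathbf{B})$ with the unscaled GCT lower bound for a few keyframe selections. Everything specific here is an assumption made for illustration and is not taken from the paper: the graph has 6 frames, all edge weights are $0.5$, $\mu = 1$, a plain combinatorial Laplacian stands in for the generalized Laplacian, and the helper names (`path_laplacian`, `coefficient_matrix`, `gershgorin_lower_bound`) are hypothetical.

```python
import numpy as np

def path_laplacian(weights):
    """Combinatorial Laplacian of a 1-hop path graph with the given edge weights."""
    n = len(weights) + 1
    W = np.zeros((n, n))
    for i, w in enumerate(weights):
        W[i, i + 1] = W[i + 1, i] = w
    return np.diag(W.sum(axis=1)) - W

def coefficient_matrix(h, L, mu):
    """B = diag(h) + mu * L, with h a binary keyframe selection vector."""
    return np.diag(np.asarray(h, dtype=float)) + mu * L

def gershgorin_lower_bound(B):
    """Smallest disc left-end: min_i (b_ii - sum_{j != i} |b_ij|)."""
    radii = np.abs(B).sum(axis=1) - np.abs(np.diag(B))
    return float(np.min(np.diag(B) - radii))

# Assumed toy setup: 6 frames, uniform edge weight 0.5, mu = 1.
L = path_laplacian([0.5] * 5)
mu = 1.0

# Three hypothetical ways of picking two keyframes.
candidates = {
    "adjacent": [1, 1, 0, 0, 0, 0],
    "ends":     [1, 0, 0, 0, 0, 1],
    "spread":   [0, 1, 0, 0, 1, 0],
}

results = {}
for name, h in candidates.items():
    B = coefficient_matrix(h, L, mu)
    lam_min = float(np.linalg.eigvalsh(B)[0])  # eigvalsh returns ascending order
    results[name] = (lam_min, gershgorin_lower_bound(B))

for name, (lam_min, bound) in results.items():
    print(f"{name:9s} lambda_min = {lam_min:.4f}   GCT bound = {bound:.4f}")
```

In this toy setup the spread-out selection yields a larger $\lambda_{\min}(\mathbf{B})$ than the adjacent one, which matches the intuition that well-separated keyframes reduce the worst-case reconstruction error. The unscaled GCT bound is $0$ for every selection that leaves at least one frame unsampled, because an unsampled node's disc has center $\mu d_i$ and radius $\mu d_i$; this is the gap that disc scaling and alignment are meant to close.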

Paper Structure

This paper contains 29 sections, 7 theorems, 38 equations, 5 figures, 4 tables, and 1 algorithm.

Key Result

Theorem 1

Given a graph ${\mathcal{G}}^o$ specified by generalized graph Laplacian ${\mathbf L}^o$, to replace an edge $(i,j)$ of weight $w^o_{i,j} > 0$ connecting nodes $i$ and $j$ in ${\mathcal{G}}^o$, a procedure that adds edges $(i,k)$ and $(k,j)$ to/from intermediate node $k$ in graph ${\mathcal{G}}$, ea

Figures (5)

  • Figure 1: Illustration of the $M$-EPG graph (a) and its unfolded versions: (b) Graph Unfolding 1 corresponding to $\beta=2$, and (c) Graph Unfolding 2 corresponding to $\beta=0$
  • Figure 2: Graphical representation of Theorem 1: Replacing a (dashed red) edge $(i,j)$ with weight $w^o_{i,j}$ in the original graph ${\mathcal{G}}^o$ with (solid green) edges $(i,j-1)$ and $(j-1,j)$ and self-loops at nodes $i$, $j-1$ and $j$ of different weights, where $j-1$ is an intermediate node.
  • Figure 3: Illustration of the shifting and scaling operations in Gershgorin disc alignment. (Initial): The example shows a simple path graph with $4$ nodes, $v_1,v_2,\dots,v_4$, connected by edges with weights $w_{i,i+1}=0.5$ for $i=1,2,3$. The matrix represented here is $\mathbf{B}=\text{diag}\left(\mathbf{h}\right) + \mu\mathbf{L}$, where $\mathbf{L}$ denotes the Laplacian matrix. Initially, all disc left-ends are located at $0$. (Shift): Upon sampling node $v_3$, its corresponding disc, $\mathcal{D}_3$, is shifted right by $1$. (Scale): Applying scalar $s_3=1.9$ moves $\mathcal{D}_3$'s left-end to $T=0.1$, while the reciprocal scalar $1/s_3$ shrinks the radii of $\mathcal{D}_2$ and $\mathcal{D}_4$.
  • Figure 4: Illustration of the upstream and downstream procedures in an SPG graph. The algorithm iteratively computes the scalars $s^u_i$ and $s^l_i$ until $s^u_i < s^l_i$, identifying the sampled node ($k=3$). Subsequently, the downstream procedure computes the coverage from sampled node $k$ onwards. Covered nodes are then removed from the graph, allowing the sampling algorithm to proceed with the remaining SPG.
  • Figure 5: MSE reconstruction performance of our graph sampling algorithm for different graph signals defined on $2$-EPG graphs using GLR-based signal reconstruction, in comparison to GDAS [bai2020fast] and Ed-free [sakiyama2019edfree]. (a)-(b) show results for $\lfloor \frac{N}{20} \rfloor$-BL graph signals with two different noise levels: noise-free and $20$ dB, respectively. (c)-(d) show results for GMRF graph signals with two different noise levels: noise-free and $20$ dB, respectively.
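
The shift and scale steps described in the Figure 3 caption can be sketched numerically as below. This is an illustration under stated assumptions and not the authors' implementation: the caption does not give $\mu$, so $\mu = 1$ is assumed here because it is consistent with the caption's numbers ($s_3 = 1.9$ placing the left-end of $\mathcal{D}_3$ at $T = 0.1$); a combinatorial Laplacian is assumed for $\mathbf{L}$; and the scaling is modelled as the similarity transform $\mathbf{S}\mathbf{B}\mathbf{S}^{-1}$ with $\mathbf{S} = \text{diag}(s_1,\dots,s_4)$. The helper `disc_left_ends` is a hypothetical name.

```python
import numpy as np

def disc_left_ends(M):
    """Left-end of each Gershgorin disc: m_ii - sum_{j != i} |m_ij|."""
    radii = np.abs(M).sum(axis=1) - np.abs(np.diag(M))
    return np.diag(M) - radii

# Path graph from the caption: 4 nodes, edge weights 0.5.
n, w = 4, 0.5
mu = 1.0  # ASSUMPTION: not stated in the caption; consistent with T = 0.1
W = np.zeros((n, n))
for i in range(n - 1):
    W[i, i + 1] = W[i + 1, i] = w
L = np.diag(W.sum(axis=1)) - W

# (Initial) no node sampled: every disc left-end sits at 0.
h = np.zeros(n)
initial = disc_left_ends(np.diag(h) + mu * L)

# (Shift) sampling v_3 (index 2) moves disc D_3 right by 1.
h[2] = 1.0
B = np.diag(h) + mu * L
shifted = disc_left_ends(B)

# (Scale) similarity transform S B S^{-1} with s_3 = 1.9:
# row 3 is multiplied by s_3 (radius of D_3 grows),
# column 3 is divided by s_3 (radii of D_2 and D_4 shrink).
s = np.ones(n)
s[2] = 1.9
C = np.diag(s) @ B @ np.diag(1.0 / s)
scaled = disc_left_ends(C)

print("initial:", np.round(initial, 4))
print("shifted:", np.round(shifted, 4))
print("scaled: ", np.round(scaled, 4))
```

Because $\mathbf{S}\mathbf{B}\mathbf{S}^{-1}$ is a similarity transform, it has the same eigenvalues as $\mathbf{B}$, so moving disc left-ends this way can tighten the GCT lower bound without changing $\lambda_{\min}(\mathbf{B})$ itself. In this sketch the left-ends of $\mathcal{D}_2$ and $\mathcal{D}_4$ move above $0$ after scaling, while $\mathcal{D}_1$ stays at $0$ since only $s_3$ is changed.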

Theorems & Definitions (12)

  • Theorem 1
  • proof
  • Corollary 1
  • Corollary 2
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Corollary 3
  • proof
  • ...and 2 more