Conditional Deep Canonical Time Warping

Afek Steinberg, Ran Eisenberg, Ofir Lindenbaum

TL;DR

This work tackles temporal alignment of high-dimensional, sparse sequences by introducing Conditional Deep Canonical Time Warping (CDCTW), which combines context-aware feature selection via conditional stochastic gates with deep canonical projections and DTW-based alignment. By conditioning feature gates on contextual temporal information through hypernetworks, CDCTW dynamically selects relevant features and learns a maximally correlated embedding to drive accurate warping. The approach extends CCA and its nonlinear variants with unsupervised sparsity through an $\ell_0$-style objective, achieving state-of-the-art alignment on synthetic and real multimodal datasets. Empirical results show that CDCTW is robust to noise and high dimensionality, offering practical benefits for precise temporal alignment in computer vision and related domains.

Abstract

Temporal alignment of sequences is a fundamental challenge in many applications, such as computer vision and bioinformatics, where local time shifting needs to be accounted for. Misalignment can lead to poor model generalization, especially in high-dimensional sequences. Existing methods often struggle with optimization when dealing with high-dimensional sparse data and converge to poor alignments. Feature selection is frequently used to enhance model performance on sparse data. However, a fixed set of selected features generally does not work for dynamically changing sequences; the selection needs to change with the state of the sequence. Selecting features based on contextual input should therefore yield better alignment. Our proposed method, Conditional Deep Canonical Time Warping (CDCTW), is designed for temporal alignment of sparse temporal data and addresses these challenges. CDCTW improves alignment accuracy for high-dimensional, time-dependent views by performing dynamic time warping on data embedded in a maximally correlated subspace, and it handles sparsity with a novel feature selection method. We validate the effectiveness of CDCTW through extensive experiments on various datasets, demonstrating superior performance over previous techniques.
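
To make the alignment step concrete, the following is a minimal sketch of classic dynamic time warping on two short sequences. It is not the authors' implementation: the function name `dtw`, the squared Euclidean frame distance, and the toy inputs are assumptions chosen for illustration, and in CDCTW the inputs would be the learned embeddings rather than raw frames.

```python
import math

def dtw(x, y):
    """Return (total_cost, warping_path) aligning sequences x and y.

    x and y are lists of feature vectors (lists of floats) of equal dimension.
    Illustrative helper; the name and distance choice are assumptions.
    """
    n, m = len(x), len(y)

    def dist(a, b):
        # Squared Euclidean distance between two frames.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # cost[i][j] = minimal accumulated cost of aligning x[:i] with y[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
            cost[i][j] = dist(x[i - 1], y[j - 1]) + step

    # Backtrack from the last cell to recover the warping path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),  # diagonal move
            (cost[i - 1][j], i - 1, j),          # advance x only
            (cost[i][j - 1], i, j - 1),          # advance y only
        ]
        _, i, j = min(candidates)
    path.reverse()
    return cost[n][m], path

# Toy example: y repeats its first frame, i.e. a local time shift.
x = [[0.0], [1.0], [2.0]]
y = [[0.0], [0.0], [1.0], [2.0]]
total, path = dtw(x, y)
```

Here the path maps the first frame of `x` to both leading frames of `y`, which is the kind of local time shift the abstract refers to. In high-dimensional sparse data the frame distance is dominated by irrelevant coordinates, which is the motivation the abstract gives for selecting features before warping.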

Paper Structure

This paper contains 11 sections, 9 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: The proposed CDCTW architecture: Inputs $\mathcal{T}_x$ and $\mathcal{T}_y$ are fed into the conditional networks $\phi$ and $\psi$, respectively. The gate values $\boldsymbol{z}_x$ and $\boldsymbol{z}_y$ are then used to modify the inputs $\boldsymbol{X}$ and $\boldsymbol{Y}$, as described in the method section. The modified inputs are fed into $\mathbf{f}$ and $\mathbf{g}$, which produce embeddings suitable for dynamic time warping.
  • Figure 2: Moving MNIST: (a) A sample MNIST digit on a black background with added noise; (b) The corresponding learned gates.
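
The gating stage described in the Figure 1 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's architecture: the one-layer linear map standing in for a conditional network, the Gaussian-relaxed gate $z = \mathrm{clip}(\mu + \epsilon, 0, 1)$ commonly used for stochastic gates, the value of `sigma`, and all function names are hypothetical.

```python
import math
import random

def gate_means(context, weights, bias):
    """Hypothetical one-layer conditional network: context -> per-feature gate means."""
    return [
        sum(w * c for w, c in zip(row, context)) + b
        for row, b in zip(weights, bias)
    ]

def sample_gates(mu, sigma=0.5, rng=None):
    """Relaxed gates clip(mu + noise, 0, 1); noise is omitted when rng is None."""
    gates = []
    for m in mu:
        eps = rng.gauss(0.0, sigma) if rng is not None else 0.0
        gates.append(min(1.0, max(0.0, m + eps)))
    return gates

def expected_open_gates(mu, sigma=0.5):
    """Sum of P(mu + noise > 0), a smooth surrogate for the number of open gates."""
    return sum(0.5 * (1.0 + math.erf(m / (sigma * math.sqrt(2.0)))) for m in mu)

def apply_gates(frame, gates):
    """Mask one input frame element-wise with its gate values."""
    return [x * z for x, z in zip(frame, gates)]

# Toy example with made-up parameters: 2-d context, 3 input features.
context = [1.0, 0.0]
weights = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0]]
bias = [0.0, 0.0, 0.5]

mu = gate_means(context, weights, bias)          # gate means depend on the context
gates = sample_gates(mu)                          # deterministic gates (no noise)
masked = apply_gates([3.0, 4.0, 2.0], gates)      # gated frame passed on to the embedding
penalty = expected_open_gates(mu)                 # sparsity term added to a training loss
noisy = sample_gates(mu, rng=random.Random(0))    # training-time stochastic gates
```

Because the gate means are a function of the context, a different context vector opens a different subset of features, which is the behaviour the learned gates in Figure 2 are meant to show. The masked frames would then be passed to the embedding networks and aligned.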