Conditional Deep Canonical Time Warping
Afek Steinberg, Ran Eisenberg, Ofir Lindenbaum
TL;DR
This work tackles temporal alignment of high-dimensional, sparse sequences by introducing Conditional Deep Canonical Time Warping (CDCTW), which combines context-aware feature selection via conditional stochastic gates with deep canonical projections and DTW-based alignment. By conditioning feature gates on contextual temporal information through hypernetworks, CDCTW dynamically selects relevant features and learns a maximally correlated embedding to drive accurate warping. The approach extends CCA and its nonlinear variants with unsupervised sparsity through an $\ell_0$-style objective, achieving state-of-the-art alignment on synthetic and real multimodal datasets. Empirical results show that CDCTW is robust to noise and high dimensionality, offering practical benefits for precise temporal alignment in computer vision and related domains.
Abstract
Temporal alignment of sequences is a fundamental challenge in many applications, such as computer vision and bioinformatics, where local time shifts need to be accounted for. Misalignment can lead to poor model generalization, especially in high-dimensional sequences. Existing methods often struggle with optimization on high-dimensional sparse data and converge to poor alignments. Feature selection is frequently used to improve model performance on sparse data. However, a fixed set of selected features generally does not work for dynamically changing sequences, because the relevant features depend on the state of the sequence. Adapting the selected features to contextual input should therefore yield better alignment. Our proposed method, Conditional Deep Canonical Time Warping (CDCTW), is designed for temporal alignment of sparse temporal data and addresses these challenges. CDCTW improves alignment accuracy for high-dimensional, time-dependent views by performing dynamic time warping on data embedded in a maximally correlated subspace, and it handles sparsity with a novel feature selection method. We validate the effectiveness of CDCTW through extensive experiments on various datasets, demonstrating superior performance over previous techniques.
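For readers unfamiliar with the dynamic time warping step mentioned above, the following is a minimal sketch of classic DTW between two sequences that are assumed to be already embedded in a shared space. It is not the CDCTW implementation: the function name `dtw_path`, the squared Euclidean frame distance, and the step pattern (match, insertion, deletion) are illustrative assumptions, and the learned gates and canonical projections are omitted.

```python
import numpy as np


def dtw_path(x, y):
    """Classic DTW (illustrative sketch, not the paper's code).

    x: array of shape (n, d) or (n,), y: array of shape (m, d) or (m,).
    Returns (total_cost, path), where path is a list of (i, j) index pairs.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x = x.reshape(len(x), -1)
    y = y.reshape(len(y), -1)
    n, m = len(x), len(y)

    # Pairwise squared Euclidean distances between frames (assumed metric).
    dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)

    # acc[i, j] = minimal cumulative cost of aligning x[:i] with y[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j - 1], acc[i - 1, j], acc[i, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev

    # Backtrack from the last cell to the first to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (acc[i - 1, j - 1], i - 1, j - 1),
            (acc[i - 1, j], i - 1, j),
            (acc[i, j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    return acc[n, m], path[::-1]


# Toy usage: the second sequence repeats its first frame once.
cost, path = dtw_path([0.0, 1.0, 2.0], [0.0, 0.0, 1.0, 2.0])
```

In a pipeline like the one the abstract describes, `x` and `y` would be the gated, projected views rather than raw inputs; the quadratic-time recursion here is the textbook version and makes no claim about how the authors compute or differentiate through the alignment.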
