Flow Intelligence: Robust Feature Matching via Temporal Signature Correlation

Jie Wang; Chen Ye Gan; Caoqi Wei; Jiangtao Wen; Yuxing Han

TL;DR

This work addresses robust feature matching across video streams, including cross-modal settings where appearance-based methods falter. It introduces Flow Intelligence, a temporal-signature framework that replaces spatial keypoints with motion-state sequences derived from per-frame block differences. The framework uses a coarse-to-fine, patch-based architecture with quadtree optimization and propagation, and it requires no training data. It has three stages: Motion State Sequence Construction, Motion Sequence Matching with a binary-sequence distance, and Optimization/Propagation, which together produce dense cross-view correspondences. On multi-view and cross-modal datasets, the method achieves competitive or superior density and speed compared with learning-based baselines, and patch-level accuracy remains high even under challenging conditions. The results show strong robustness to viewpoint changes, lighting variation, and modality gaps, suggesting practical value for real-time, multimodal video understanding without large training corpora.

Abstract

Feature matching across video streams remains a cornerstone challenge in computer vision. Robust multimodal matching has attracted growing interest in robotics, surveillance, remote sensing, and medical imaging. Traditional methods rely on detecting and matching spatial features, and they break down when faced with noisy, misaligned, or cross-modal data. Recent deep learning methods have improved robustness through learned representations, but they remain constrained by their dependence on extensive training data and by their computational demands. We present Flow Intelligence, a paradigm-shifting approach that moves beyond spatial features by focusing exclusively on temporal motion patterns. Instead of detecting traditional keypoints, our method extracts motion signatures from pixel blocks across consecutive frames and correlates these temporal signatures between videos. These motion-based descriptors achieve natural invariance to translation, rotation, and scale variations while remaining robust across different imaging modalities. The approach requires no pretraining data, eliminates the need for spatial feature detection, enables cross-modal matching using only temporal motion, and outperforms existing methods in challenging scenarios where traditional approaches fail. By leveraging motion rather than appearance, Flow Intelligence enables robust, real-time video feature matching in diverse environments.
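To make the idea of block-level temporal signatures concrete, the following is a minimal sketch, not the authors' implementation. It assumes grayscale videos stored as NumPy arrays, a fixed block size, a hand-picked threshold on the mean absolute frame difference, and a normalized Hamming distance as the binary-sequence distance; the function names `motion_state_sequences` and `match_blocks`, the parameters `block` and `thresh`, and the brute-force search are all illustrative assumptions. The coarse-to-fine quadtree optimization and propagation stages are omitted.

```python
import numpy as np


def motion_state_sequences(frames, block=8, thresh=10.0):
    """Binary motion-state sequence for every pixel block (illustrative).

    frames: array of shape (T, H, W), grayscale.
    Returns a boolean array of shape (H // block, W // block, T - 1), where
    entry [i, j, t] is True if block (i, j) changed between frames t and t + 1.
    """
    frames = np.asarray(frames, dtype=np.float32)
    t, h, w = frames.shape
    gh, gw = h // block, w // block
    # Crop so the frame divides evenly into blocks.
    frames = frames[:, : gh * block, : gw * block]
    # Absolute difference between consecutive frames: shape (T - 1, H, W).
    diff = np.abs(np.diff(frames, axis=0))
    # Mean absolute difference inside each block: shape (T - 1, gh, gw).
    per_block = diff.reshape(t - 1, gh, block, gw, block).mean(axis=(2, 4))
    # Threshold into a binary "moving / static" state (assumed rule).
    states = per_block > thresh
    return np.transpose(states, (1, 2, 0))


def match_blocks(seq_a, seq_b):
    """Brute-force nearest neighbour under normalized Hamming distance.

    Returns, for each block of video A (flattened row-major), the index of the
    closest block in video B and the corresponding distance in [0, 1].
    """
    a = seq_a.reshape(-1, seq_a.shape[-1])
    b = seq_b.reshape(-1, seq_b.shape[-1])
    # XOR marks time steps where the two binary sequences disagree.
    dist = (a[:, None, :] ^ b[None, :, :]).mean(axis=2)
    best = dist.argmin(axis=1)
    return best, dist[np.arange(len(a)), best]
```

Because only the timing of change is compared, a block that flickers with the same rhythm in two videos can match even when its brightness or contrast differs between them, which is the intuition behind the cross-modal claim. Blocks that never move produce all-zero sequences and are mutually indistinguishable in this sketch, so a practical system would need to filter or disambiguate them.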
