Self-Supervised Partial Cycle-Consistency for Multi-View Matching

Fedor Taggenbrock, Gertjan Burghouts, Ronald Poppe

TL;DR

This work tackles cross-camera object matching under partial view overlap by extending cycle-consistency to a partial setting and introducing a pseudo-mask to guide training. It develops trainable cycle variations and a time-divergent scene sampling strategy to enrich self-supervised learning signals, and derives a masked partial cycle-consistency loss based on pseudo-labels. On the challenging DIVOTrack dataset, the proposed combination yields a $4.3$ percentage-point improvement in F1 over the previous self-supervised state-of-the-art, with strong robustness to reduced overlap and difficult scenes. Overall, the approach enables more robust, scalable self-supervised learning of view-invariant features for large-scale multi-camera scene understanding.

Abstract

Matching objects across partially overlapping camera views is crucial in multi-camera systems and requires a view-invariant feature extraction network. Training such a network with cycle-consistency circumvents the need for labor-intensive labeling. In this paper, we extend the mathematical formulation of cycle-consistency to handle partial overlap. We then introduce a pseudo-mask that directs the training loss to take partial overlap into account. We additionally present several new cycle variants that complement each other, as well as a time-divergent scene sampling scheme that improves the data input for this self-supervised setting. Cross-camera matching experiments on the challenging DIVOTrack dataset show the merits of our approach. Compared to the self-supervised state-of-the-art, we achieve a 4.3 percentage point higher F1 score with our combined contributions. Our improvements are robust to reduced overlap in the training data, with substantial gains in challenging scenes where only a few matches must be made among many people. Self-supervised feature networks trained with our method are effective at matching objects in a range of multi-camera settings, providing opportunities for complex tasks like large-scale multi-camera scene understanding.

Paper Structure (18 sections, 1 theorem, 17 equations, 4 figures, 5 tables)

Key Result

Proposition 1

If a multi-view matching $\{ P_{ij} \}_{\forall i,j}$ is partially cycle-consistent, it holds that $P_{ij} P_{ji} = I_{iji}$ and $P_{ij} P_{jk} P_{ki} = I_{ijki}$, where $I_{iji} \subseteq I_{n_i \times n_i}$ is the identity map from view $i$ back to itself, filtering out matches that are not seen in view $V_j$, and where $I_{ijki} \subseteq I_{n_i \times n_i}$ is the identity map from view $i$ back to itself, filtering out all matches that are not seen
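
Proposition 1 describes $I_{iji}$ and $I_{ijki}$ as identity maps on view $i$ with unseen matches filtered out. The toy check below is a minimal sketch of that idea, not the paper's code. It assumes that these masked identities arise as products of partial permutation matrices along the cycle and that $P_{ji} = P_{ij}^\top$. The three-view scene, the person labels, and the helper `matmul` are all hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


# Hypothetical toy scene (an assumption for illustration only):
#   view i sees persons (a, a'), view j sees (b, b'), view k sees only (c).
#   a, b and c are the same person; a' and b' are the same person,
#   who is not visible in view k.
P_ij = [[1, 0],
        [0, 1]]   # a -> b, a' -> b'
P_jk = [[1],
        [0]]      # b -> c, b' has no match in view k
P_ki = [[1, 0]]   # c -> a

# Pairwise cycle i -> j -> i, assuming P_ji is the transpose of P_ij.
I_iji = matmul(P_ij, transpose(P_ij))

# Triple cycle i -> j -> k -> i.
I_ijki = matmul(matmul(P_ij, P_jk), P_ki)

print(I_iji)    # both persons in view i are seen in view j
print(I_ijki)   # only person a completes the cycle through view k
```

In this toy scene both diagonal entries of `I_iji` are 1, while `I_ijki` keeps only the entry for the person who is visible in all three views. This matches the reading of Figure 2, where $I_{ijki}[a,a]=1$ and the cycle for $a'$ is absent.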

Figures (4)

  • Figure 1: Overview of our self-supervised cycle-consistency training method. Trainable cycle variations (bottom left) are constructed from sampled batches (top left). Cycle matrices represent chains of matches starting and ending in the same view. Under partial overlap, only some of the diagonal elements correspond to cycles that exist, so we construct a pseudo-mask of the identity matrix (top right) to determine which specific cycles should be trained. This pseudo-mask is then used to provide a weighted loss signal with more emphasis on the positively predicted cycles (bottom right).
  • Figure 2: Partial cycle-consistency and an interpretation of the equation defining $I_{ijki}$. $I_{ijki}[a,a]=1$ because $a$ is matched to $b$, which is matched to $c$, which is then matched back to $a$. The same does not hold for $a'$, so this cycle is absent.
  • Figure 3: Qualitative example during training. Each of the blue swirls, representing the cycle equations, constructs a cycle matrix with various cycle-inconsistencies. Partial overlap requires that only some of the diagonal elements are trained as cycles. The pseudo-mask correctly finds the existing cycles, except for a heavily occluded one. A strong learning signal is obtained from one of the diagonals of the dark blue cycle.
  • Figure 4: Qualitative example during matching inference for a difficult frame in the test set. Our model matches with significantly fewer false positives. The matches found with our method are based on subtle clothing details and are correct despite large view-angle differences and occlusion, a clear improvement over the previous state-of-the-art.
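
The pseudo-mask idea from the Figure 1 caption, where a mask over the identity selects which diagonal cycles are trained and positives are weighted more heavily, can be illustrated roughly as follows. This is a hedged sketch and not the paper's loss: the thresholding rule in `pseudo_mask`, the weights `w_pos` and `w_neg`, and the log-loss form in `masked_cycle_loss` are all assumptions chosen for illustration.

```python
import math


def pseudo_mask(cycle, threshold=0.5):
    """Hypothetical rule: mark diagonal entry a as an existing cycle when
    row a's largest entry lies on the diagonal and exceeds `threshold`."""
    mask = []
    for a, row in enumerate(cycle):
        best = max(range(len(row)), key=row.__getitem__)
        mask.append(1 if best == a and row[a] > threshold else 0)
    return mask


def masked_cycle_loss(cycle, mask, w_pos=1.0, w_neg=0.25, eps=1e-9):
    """Weighted log-loss on the diagonal of a soft cycle matrix.
    Masked-in entries are pushed towards 1 with weight w_pos,
    masked-out entries towards 0 with the smaller weight w_neg."""
    total = 0.0
    for a, m in enumerate(mask):
        p = cycle[a][a]
        if m:
            total += -w_pos * math.log(p + eps)
        else:
            total += -w_neg * math.log(1.0 - p + eps)
    return total / len(mask)


# Hypothetical soft cycle matrix for three detections in one view.
# Each row holds the soft scores of ending the cycle at each detection.
cycle = [[0.9, 0.1, 0.0],
         [0.2, 0.7, 0.1],
         [0.5, 0.3, 0.2]]

mask = pseudo_mask(cycle)
loss = masked_cycle_loss(cycle, mask)
print(mask, loss)
```

With these made-up scores, the first two detections are treated as existing cycles and the third is masked out, so its diagonal entry is pushed down rather than up. A larger `w_pos` relative to `w_neg` reflects the caption's emphasis on positively predicted cycles.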

Theorems & Definitions (1)

  • Proposition 1: Explicit partial cycle-consistency