Match me if you can: Semi-Supervised Semantic Correspondence Learning with Unpaired Images

Jiwon Kim; Byeongho Heo; Sangdoo Yun; Seungryong Kim; Dongyoon Han

TL;DR

A simple machine annotator reliably enriches paired keypoints from unlabeled images via machine supervision, requiring neither extra labeled keypoints nor trainable modules. The resulting models surpass current state-of-the-art models on semantic correspondence learning benchmarks and show further robustness on corruption benchmarks.

Abstract

Semantic correspondence methods have advanced to obtain high-quality correspondences by employing complicated networks that aim to maximize model capacity. However, despite these performance improvements, they may remain constrained by the scarcity of training keypoint pairs, a consequence of the limited number of training images and the sparsity of keypoints. This paper builds on the hypothesis that learning semantic correspondences is inherently data-hungry and shows that models can be trained further by employing densified training pairs. We demonstrate that a simple machine annotator reliably enriches paired keypoints from unlabeled images via machine supervision, requiring neither extra labeled keypoints nor trainable modules. Consequently, our models surpass current state-of-the-art models on semantic correspondence learning benchmarks such as SPair-71k, PF-PASCAL, and PF-WILLOW and show further robustness on corruption benchmarks. Our code is available at https://github.com/naver-ai/matchme.

Paper Structure (17 sections, 8 equations, 11 figures, 9 tables)

Introduction
Background
Task Definition
Motivation
Method
Mining Untapped Annotation Gems
Iterative Labeling and Training
Experiments
Experimental Setups
Comparison on Benchmarks
Analyzing Our Method
Related Work
Conclusion
Robustness Evaluation Benchmark
Further Analyses
...and 2 more sections

Figures (11)

Figure 1: Untapped annotation gems. Semantic correspondence learning usually suffers from data hunger, so the few sparsely paired keypoints (yellow lines) in labeled data inherently limit performance. (a) Labeled images in the SPair-71k benchmark (Min et al., 2019) contain sparse, manually annotated keypoint pairs. (b) Unlabeled images can serve as hidden supplementary sources for increasing the density of pairs. (c) Newly expanded image pairs provide abundant densified points that alleviate the underlying data hunger: a wealth of novel machine-annotated keypoint pairs (blue-toned lines) is generated simply by incorporating new unlabeled images.
Figure 2: Schematic illustration of our method. Unlabeled images are iteratively labeled by a progressively evolving machine annotator, and incrementally increasing noise is injected to make training more challenging. By learning from increasingly challenging images, the model's generalization ability continues to improve.
Figure 3: Qualitative results on SPair-71k in comparison with competing SOTA methods. Point-to-point matches are drawn by linking keypoint pairs with line segments. Green and red lines denote correct and incorrect predictions with respect to the ground-truth pairs, respectively. We observe that our method outperforms its counterparts significantly across all the sample image pairs.
Figure 4: PCK at each iteration in iterative training. We report PCK values at each iteration to show the effectiveness of our training framework. We use an identical architecture for the teacher and student and set the iterative training interval to 50 epochs for simplicity. The left and right figures show the results of MatchMe trained upon CATs and CATs++ backbones, respectively. The continued gains across iterations indicate that the baseline models were, in fact, undertrained and possess the capacity for further training, highlighting the data-hungry problem.
Figure A: Visualization of corrupted images in SPair-C. The corrupted images of one sample consist of 15 types of algorithmically generated corruptions from the noise, blur, weather, and digital categories. Each type of corruption has five levels of severity, resulting in 75 distinct corruptions.
...and 6 more figures
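
The Figure 4 caption reports PCK without defining it on this page. As a rough illustration only, the sketch below uses the common definition of PCK (percentage of correct keypoints): a predicted keypoint counts as correct when it lies within alpha * max(h, w) of its ground-truth location. The function name `pck`, the default `alpha=0.1`, and the choice of normalizing by a bounding-box size are assumptions made for this example; whether the threshold is taken from the object bounding box or the image size varies by benchmark convention, and this is not the authors' evaluation code.

```python
import math


def pck(pred, gt, bbox_hw, alpha=0.1):
    """Fraction of predicted keypoints within alpha * max(h, w) of ground truth.

    pred, gt: equal-length lists of (x, y) coordinates.
    bbox_hw:  (height, width) used to normalize the threshold (an assumption;
              some benchmarks use the image size instead of the object box).
    """
    if len(pred) != len(gt):
        raise ValueError("pred and gt must contain the same number of keypoints")
    if not pred:
        return 0.0
    h, w = bbox_hw
    threshold = alpha * max(h, w)
    correct = sum(
        1
        for (px, py), (gx, gy) in zip(pred, gt)
        if math.hypot(px - gx, py - gy) <= threshold
    )
    return correct / len(pred)


# Hypothetical coordinates, chosen only to exercise the function.
pred = [(10.0, 10.0), (50.0, 52.0), (90.0, 40.0)]
gt = [(12.0, 11.0), (50.0, 50.0), (60.0, 40.0)]
score = pck(pred, gt, bbox_hw=(100, 200), alpha=0.1)
print(f"PCK@0.1 = {score:.3f}")
```

With these made-up points the threshold is 20 pixels, so the first two predictions count as correct and the third (30 pixels off) does not.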
