Matching Semantically Similar Non-Identical Objects

Yusuke Marumo, Kazuhiko Kawamoto, Satomi Tanaka, Shigenobu Hirano, Hiroshi Kera

TL;DR

This work tackles pixel-level matching between semantically similar but non-identical objects, addressing challenges from class discrepancy and domain shift. It proposes a plug-and-play Semantic Enhancement Weighting (SEW) module that uses object detector heatmaps and Grad-CAM to reweight sparse descriptors, and a Non-visual Object Pairing mechanism to select appropriate object pairs when multiple objects are present. The approach extends existing sparse matchers without training and is evaluated with a new annotation-free metric, Triangular Matching Consistency (TMC), as well as relative pose accuracy under corruptions; results show notable improvements over strong baselines. The work demonstrates robustness across in-class variations, domain shifts, and cross-domain drawings, offering practical impact for fine-grained, cross-object correspondence tasks and potential downstream applications such as landmark transfer and assembly guidance, while maintaining real-time feasibility through plug-and-play integration.

Abstract

Not identical but similar objects are ubiquitous in our world, ranging from four-legged animals such as dogs and cats to cars of different models and flowers of various colors. This study addresses a novel task of matching such non-identical objects at the pixel level. We propose a weighting scheme of descriptors, Semantic Enhancement Weighting (SEW), that incorporates semantic information from object detectors into existing sparse feature matching methods, extending their targets from identical objects captured from different perspectives to semantically similar objects. The experiments show successful matching between non-identical objects in various cases, including in-class design variations, class discrepancy, and domain shifts (e.g., photo vs. drawing and image corruptions). The code is available at https://github.com/Circ-Leaf/NIOM .
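The descriptor weighting idea behind SEW can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the function names `sample_heatmap` and `weight_descriptors`, the nearest-pixel lookup, the parameter `alpha`, and the scaling rule `1 + alpha * score` are hypothetical and are not taken from the paper or its repository. The sketch only shows the general pattern of reading a semantic heatmap at keypoint locations and using those scores to reweight sparse descriptors before matching.

```python
import numpy as np


def sample_heatmap(heatmap, keypoints):
    """Read heatmap values at (x, y) keypoints using nearest-pixel lookup.

    heatmap:   (H, W) array of semantic scores, e.g. from a detector heatmap.
    keypoints: (N, 2) array of (x, y) pixel coordinates.
    """
    h, w = heatmap.shape
    # Round to the nearest pixel and clamp to the image bounds.
    xs = np.clip(np.rint(keypoints[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.rint(keypoints[:, 1]).astype(int), 0, h - 1)
    return heatmap[ys, xs]


def weight_descriptors(descriptors, scores, alpha=1.0):
    """Scale each descriptor by (1 + alpha * score).

    This scaling rule is an illustrative assumption, not the paper's formula.
    descriptors: (N, D) array, scores: (N,) array.
    """
    return descriptors * (1.0 + alpha * scores)[:, None]


# Toy example: one keypoint on the object, one on the background.
heatmap = np.zeros((4, 4))
heatmap[1, 2] = 1.0                      # row y=1, column x=2
keypoints = np.array([[2.0, 1.0], [0.0, 0.0]])
descriptors = np.ones((2, 3))

scores = sample_heatmap(heatmap, keypoints)
weighted = weight_descriptors(descriptors, scores)
```

In this toy setup the descriptor of the on-object keypoint is amplified while the background descriptor is left unchanged, so a downstream matcher would see object keypoints emphasized.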

Paper Structure (37 sections, 15 equations, 16 figures, 7 tables)

Figures (16)

  • Figure 1: Non-identical object matching between various objects and image styles is achieved by our plug-and-play method with the SuperPoint keypoint detector and the LightGlue matcher.
  • Figure 2: State-of-the-art models of sparse matching (LightGlue) and semantic correspondence (GeoAware-SC). The latter requires reference points in one image and finds their correspondences in the target image. To this end, 50 reference points are sampled either randomly or based on the GradCAM heatmap of YOLOv7 (cf. the supplementary section on the GeoAware-SC matching algorithm). Red boxes show mismatches. Orange boxes show close matches, which are correct at the part level but not at the pixel level. Such close matches were typical for GeoAware-SC (e.g., matching the nose tip of the cheetah to the nostrils of the husky in (d)). Further, GeoAware-SC was about ten times slower than LightGlue (see the GeoAware-SC comparison table in the supplementary material).
  • Figure 3: Non-identical object warping. Homography is estimated from fine-grained matching of city clocks with different styles.
  • Figure 4: Matching pipeline. Keypoint detection and feature matching are done by off-the-shelf models. The proposed plug-and-play module, SEW, computes the heatmap scores of objects and weights the descriptors with this semantic information.
  • Figure 5: Non-visual Object Pairing converts class labels from an object detector into embeddings using the CLIP text encoder and compares them with cosine similarity to identify semantically similar object pairs. The GradCAM heatmap of the object detector YOLOv7 generated for the chosen object pair is used for our weighting method, enhancing matching with multiple objects.
  • ...and 11 more figures
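
The label-pairing step described in the Figure 5 caption (embed class labels, then compare them by cosine similarity) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `cosine_similarity_matrix` and `best_pair` are hypothetical, the rule of picking the single highest-scoring pair is an assumption, and the embeddings are made-up toy vectors standing in for the output of a text encoder such as CLIP, which is not called here.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between rows of a (M, D) and b (N, D)."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def best_pair(emb_a, emb_b):
    """Return (i, j, similarity) for the most similar label pair.

    emb_a: (M, D) label embeddings for objects detected in image A.
    emb_b: (N, D) label embeddings for objects detected in image B.
    Choosing the global argmax is an illustrative assumption.
    """
    sim = cosine_similarity_matrix(emb_a, emb_b)
    i, j = np.unravel_index(np.argmax(sim), sim.shape)
    return int(i), int(j), float(sim[i, j])


# Toy placeholder embeddings (not real CLIP outputs).
labels_a = ["dog", "car"]
labels_b = ["truck", "cat"]
emb_a = np.array([[1.0, 0.0], [0.0, 1.0]])
emb_b = np.array([[0.2, 0.8], [0.9, 0.1]])

i, j, score = best_pair(emb_a, emb_b)
print(f"paired {labels_a[i]!r} with {labels_b[j]!r}")
```

Because the comparison uses only label embeddings, the pairing needs no image features; the selected pair would then determine which objects' heatmaps feed the descriptor weighting.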