CLOVER: Context-aware Long-term Object Viewpoint- and Environment- Invariant Representation Learning

Dongmyeong Lee, Amanda Adkins, Joydeep Biswas

TL;DR

CLOVER addresses long-term object re-identification under varying viewpoints, weather, and lighting by learning a context-aware representation that does not require foreground segmentation. It combines a foundation-model–based encoder, context-rich image patches, and supervised contrastive learning to produce robust object descriptors, and introduces MapCLOVER to summarize these descriptors for scalable map-based matching. The CODa Re-ID dataset provides a large outdoor, multi-condition benchmark for general-object re-identification research. In the reported experiments, CLOVER outperforms the baselines and generalizes to unseen instances and classes, while MapCLOVER matches new observations against compact descriptor summaries, supporting long-term object data association for robots.

Abstract

Mobile service robots can benefit from object-level understanding of their environments, including the ability to distinguish object instances and re-identify previously seen instances. Object re-identification is challenging across different viewpoints and in scenes with significant appearance variation arising from weather or lighting changes. Existing works on object re-identification either focus on specific classes or require foreground segmentation. Further, these methods, along with object re-identification datasets, have limited consideration of challenges such as outdoor scenes and illumination changes. To address this problem, we introduce CODa Re-ID: an in-the-wild object re-identification dataset containing 1,037,814 observations of 557 objects across 8 classes under diverse lighting conditions and viewpoints. Further, we propose CLOVER, a representation learning method for object observations that can distinguish between static object instances without requiring foreground segmentation. We also introduce MapCLOVER, a method for scalably summarizing CLOVER descriptors for use in object maps and matching new observations to summarized descriptors. Our results show that CLOVER achieves superior performance in static object re-identification under varying lighting conditions and viewpoint changes and can generalize to unseen instances and classes.
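
To make the map-matching idea concrete, here is a minimal sketch of summarizing per-object descriptors and matching a new observation against the summaries. The abstract does not say how MapCLOVER computes its summaries, so the code swaps in a simple stand-in: each object is summarized by the unit-normalized mean of its descriptors, and a query is matched by cosine similarity. The function names (`summarize`, `match`), the object IDs, and the 3-D toy descriptors are all hypothetical and are not taken from the paper.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def summarize(descriptors):
    """Assumed summary (not the paper's method): unit-normalized mean descriptor."""
    dim = len(descriptors[0])
    mean = [sum(d[i] for d in descriptors) / len(descriptors) for i in range(dim)]
    return normalize(mean)


def match(query, object_map):
    """Return (object_id, cosine similarity) for the best-matching summary."""
    q = normalize(query)
    scores = {oid: sum(a * b for a, b in zip(q, s)) for oid, s in object_map.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy 3-D descriptors standing in for encoder outputs (made-up values).
observations = {
    "tree_01": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "pole_07": [[0.0, 0.2, 0.9], [0.1, 0.1, 0.8]],
}

# One compact summary per mapped object instead of every stored observation.
object_map = {oid: summarize(ds) for oid, ds in observations.items()}

best_id, score = match([0.7, 0.3, 0.0], object_map)
```

In this sketch the map stores one vector per object, so matching cost grows with the number of objects rather than the number of observations. A real system would likely also need a similarity threshold to decide when a query corresponds to no mapped object.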

Paper Structure (18 sections, 5 equations, 3 figures, 8 tables)

Figures (3)

  • Figure 1: Architecture of CLOVER. The tree in the input image patches is annotated with a bounding box (red) for visibility.
  • Figure 2: (a) Globally aligned trajectories. (b) Global 3D bounding boxes for tree (green) and pole (blue) from CODa.
  • Figure 3: Qualitative performance on two pairs of images, each from the same object. Values give similarity between images: higher cosine similarity (CLOVER, FFA) and lower $L_2$ distance (WDISI, re-OBJ) indicate higher matching confidence.
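
The Figure 3 caption mixes two score conventions: for cosine similarity a higher value means a closer match, while for $L_2$ distance a lower value does. The sketch below illustrates the two conventions on made-up 2-D vectors (not descriptors from any of the compared methods). It also checks the identity $\lVert a - b \rVert^2 = 2 - 2\cos(a, b)$, which holds only when both vectors have unit norm; whether the compared methods normalize their descriptors is not stated here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance between a and b; lower means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up unit-norm vectors used only for illustration.
anchor = [1.0, 0.0]
near = [0.8, 0.6]
far = [0.0, 1.0]

cos_near, cos_far = cosine_similarity(anchor, near), cosine_similarity(anchor, far)
dist_near, dist_far = l2_distance(anchor, near), l2_distance(anchor, far)

# For unit-norm vectors the two scores carry the same ranking information.
identity_gap = abs(dist_near ** 2 - (2 - 2 * cos_near))
```

Scores produced by different methods live in different descriptor spaces, so the values in the figure are comparable within a method but not across methods.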