PlantTrack: Task-Driven Plant Keypoint Tracking with Zero-Shot Sim2Real Transfer
Samhita Marri, Arun N. Sivakumar, Naveen K. Uppalapati, Girish Chowdhary
TL;DR
PlantTrack addresses robust tracking of plant features in cluttered, deformable environments for agricultural robotics. It uses DINOv2 to extract high-dimensional features, applies a depth-based foreground filter, and trains a multi-stage heatmap predictor to localize leaves and fruits; the heatmap peaks seed an online TAPIR tracker, with both DINOv2 and TAPIR weights frozen. With as few as 20 synthetic images for training, the method achieves zero-shot Sim2Real transfer to real plants and demonstrates online tracking of leaves and fruits. This framework showcases how foundation models can be combined with synthetic data to enable scalable, task-specific keypoint tracking for phenotyping, pruning, and harvesting tasks.
Abstract
Tracking plant features is crucial for agricultural tasks such as phenotyping, pruning, and harvesting, but the unstructured, cluttered, and deformable nature of plant environments makes it challenging. Recent advances in foundation models show promise in addressing this challenge. We propose PlantTrack, which uses DINOv2 to extract high-dimensional features and trains a keypoint heatmap predictor network to locate semantic features such as fruits and leaves; these locations then serve as prompts for point tracking across video frames with TAPIR. We show that with as few as 20 synthetic images for training the keypoint predictor, we achieve zero-shot Sim2Real transfer, enabling effective tracking of plant features in real environments.
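
To make the hand-off from heatmap to tracker concrete, the sketch below shows one plausible way to turn a predicted keypoint heatmap into query points: mask out background pixels using depth, then keep thresholded local maxima. It is an illustration only, not the authors' implementation. The function names, the depth cutoff `max_depth_m`, the score `threshold`, and the `window` size are all assumptions introduced here, and the DINOv2 feature extractor, the heatmap network, and the TAPIR tracker are deliberately left out.

```python
import numpy as np


def foreground_mask(depth, max_depth_m=1.0):
    """Boolean mask of pixels with a valid depth reading closer than max_depth_m.

    max_depth_m is a hypothetical cutoff; the source does not state one.
    """
    return (depth > 0) & (depth < max_depth_m)


def heatmap_peaks(heatmap, mask, threshold=0.5, window=3):
    """Return (row, col) local maxima of `heatmap` that lie inside `mask`.

    A pixel counts as a peak if it is the maximum of its window x window
    neighbourhood and its score exceeds `threshold`. Ties on a flat plateau
    are all returned. `window` must be odd.
    """
    # Suppress background pixels so they can never win a neighbourhood.
    scores = np.where(mask, heatmap, -np.inf)
    pad = window // 2
    padded = np.pad(scores, pad, mode="constant", constant_values=-np.inf)
    # View of shape (H, W, window, window): one neighbourhood per pixel.
    neighbourhoods = np.lib.stride_tricks.sliding_window_view(
        padded, (window, window)
    )
    local_max = neighbourhoods.max(axis=(2, 3))
    is_peak = (scores >= local_max) & (scores > threshold)
    rows, cols = np.nonzero(is_peak)
    return list(zip(rows.tolist(), cols.tolist()))


if __name__ == "__main__":
    # Toy 8x8 example: one strong foreground peak, one weaker neighbour,
    # and one strong response that sits on the far background.
    heatmap = np.zeros((8, 8))
    heatmap[2, 3] = 0.9
    heatmap[2, 4] = 0.6
    heatmap[6, 6] = 0.8
    depth = np.full((8, 8), 0.5)  # metres
    depth[6, 6] = 3.0             # background pixel
    print(heatmap_peaks(heatmap, foreground_mask(depth)))
```

In this toy case only the pixel at row 2, column 3 survives: its neighbour is suppressed as a non-maximum, and the response at (6, 6) is removed by the depth mask. In a pipeline like the one described, points of this kind would be passed to the point tracker as queries on the first frame.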
