Segment Anything in Light Fields for Real-Time Applications via Constrained Prompting

Nikolai Goncharov, Donald G. Dansereau

TL;DR

This work presents a novel segmentation method that adapts SAM 2 to the light field domain without retraining or modifying the model. It produces high-quality, view-consistent masks, outperforms the SAM 2 video tracking baseline, and runs 7 times faster, moving towards real-time segmentation speed.

Abstract

Segmented light field images can serve as a powerful representation in many computer vision tasks that exploit the geometry and appearance of objects, such as object pose tracking. In the light field domain, segmentation has the additional objective of recognizing the same segment across all views. Segment Anything Model 2 (SAM 2) produces semantically meaningful segments for monocular images and videos. However, using SAM 2 directly on light fields is highly ineffective because it does not exploit the constraints of the light field domain. In this work, we present a novel light field segmentation method that adapts SAM 2 to the light field domain without retraining or modifying the model. By using the light field domain constraints, the method produces high-quality, view-consistent light field masks, outperforms the SAM 2 video tracking baseline, and runs 7 times faster, at real-time speed. We achieve this by exploiting epipolar geometry cues to propagate the masks between views, probing the SAM 2 latent space to estimate their occlusion, and then prompting SAM 2 to refine them.

Paper Structure

This paper contains 14 sections, 4 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Our method. First, we obtain the middle view segmentation and a disparity map for a light field image. We perform disparity propagation on the middle view masks, resulting in coarse mask predictions. We occlude those predictions using mask feature similarities with respect to the source segment. Then, we aggregate the resulting masks into points and use them to prompt the SAM 2 image model in the rest of the subviews, resulting in the refined light field mask predictions.
  • Figure 2: Qualitative results for our method on two scenes from the UrbanLF dataset [sheng2022urbanlf]. Our method is in the top row, and SAM 2 video tracking is in the bottom row. Top left, middle and bottom subviews are shown left to right. In each area highlighted by a rectangle, an epipolar plane image is taken along the yellow dotted line, upsampled along the subview axis and visualized.
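
The steps named in the Figure 1 caption (disparity propagation, occlusion by feature similarity, aggregation into point prompts) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the sign convention of the disparity shift, the cosine-similarity test with a fixed threshold, and the use of a single centroid as the point prompt are all assumptions made for illustration, since this excerpt does not specify them. The per-pixel features stand in for whatever SAM 2 embeddings the method actually probes.

```python
import numpy as np


def propagate_mask(mask, disparity, du, dv):
    """Forward-warp a boolean middle-view mask to the subview at angular
    offset (du, dv). Assumed convention: a pixel (x, y) with disparity d
    moves to (x + d * du, y + d * dv)."""
    h, w = mask.shape
    ys, xs = np.nonzero(mask)
    d = disparity[ys, xs]
    xt = np.rint(xs + d * du).astype(int)
    yt = np.rint(ys + d * dv).astype(int)
    inside = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    warped = np.zeros((h, w), dtype=bool)
    warped[yt[inside], xt[inside]] = True
    return warped


def occlude_by_similarity(warped, features, source_feature, threshold=0.5):
    """Drop warped pixels whose feature vector is not cosine-similar to the
    source segment's feature (hypothetical occlusion test).
    features: (H, W, C) array, source_feature: (C,) array."""
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    s = source_feature / (np.linalg.norm(source_feature) + 1e-8)
    similarity = f @ s  # (H, W) cosine similarity map
    return warped & (similarity >= threshold)


def mask_to_point(mask):
    """Aggregate a mask into one (x, y) point prompt: its centroid.
    Returns None for an empty mask. For non-convex masks the centroid can
    fall outside the mask, so a real system would need a safer choice."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())


# Toy example: a 2x2 segment with constant disparity 1, viewed from a
# subview two steps to the right of the middle view.
mask = np.zeros((8, 8), dtype=bool)
mask[2:4, 2:4] = True
disparity = np.ones((8, 8))
warped = propagate_mask(mask, disparity, du=2, dv=0)

# Toy features: column 5 looks unlike the source segment (occluded).
features = np.zeros((8, 8, 2))
features[..., 0] = 1.0
features[:, 5] = [0.0, 1.0]
visible = occlude_by_similarity(warped, features, np.array([1.0, 0.0]))
point = mask_to_point(visible)
```

In this toy case the segment shifts two pixels to the right, the occlusion test removes the column whose features disagree with the source segment, and the remaining pixels collapse to one point. In the pipeline the caption describes, points obtained this way would then be passed as prompts to the SAM 2 image model in each subview; that call is omitted here.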