
3rd Place Solution for PVUW Challenge 2024: Video Panoptic Segmentation

Ruipu Wu, Jifei Che, Han Li, Chengjing Wu, Ting Liu, Luoqi Liu

TL;DR

This paper addresses video panoptic segmentation (VPS) by building on the DVIS++ baseline and introducing a query-wise ensemble to align set-prediction outputs across frames and augmentations. The core technique reuses and refines object queries through a combination of attention-based tracking and temporal refinement, with a novel ensemble rule that merges supplementary queries when they closely match existing ones. Evaluations on VIPSeg show strong performance, achieving VPQ 57.01 on the test set and securing 3rd place in the PVUW 2024 VPS track, demonstrating improved temporal coherence and reduced missed segments. Overall, the method advances VPS in the wild by integrating a decoupled, query-level fusion strategy with staged training and test-time augmentations.

Abstract

Video panoptic segmentation is an advanced task that extends panoptic segmentation by applying its concept to video sequences. To address video panoptic segmentation under diverse conditions, we use DVIS++ as our baseline model and enhance it with a comprehensive approach centered on a query-wise ensemble, supplemented by additional techniques. Our approach achieved a VPQ score of 57.01 on the VIPSeg test set and ranked 3rd in the VPS track of the 3rd Pixel-level Video Understanding in the Wild Challenge.

Paper Structure

This paper contains 13 sections, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Examples of the VIPSeg dataset.
  • Figure 2: Architecture of DVIS++.