Augment Before Copy-Paste: Data and Memory Efficiency-Oriented Instance Segmentation Framework for Sport-scenes

Chih-Chung Hsu, Chia-Ming Lee, Ming-Shyen Wu

TL;DR

This paper proposes a Memory effIciency inStance Segmentation framework based on visual inductive prior flow propagation, which incorporates prior information inherent in the dataset into the data preprocessing, data augmentation, and inference stages.

Abstract

Instance segmentation is a fundamental task in computer vision with broad applications across various industries. In recent years, with the proliferation of deep learning and artificial intelligence applications, training effective models with limited data has become a pressing issue for both academia and industry. In the Visual Inductive Priors challenge (VIPriors2023), participants must train a model capable of precisely locating individuals on a basketball court while working with limited data and without transfer learning or pre-trained models. We propose a Memory effIciency inStance Segmentation framework based on visual inductive prior flow propagation that incorporates prior information inherent in the dataset into the data preprocessing, data augmentation, and inference stages. Experiments by our team (ACVLAB) demonstrate that our model achieves promising performance (0.509 AP@0.50:0.95) even under limited data and memory constraints.
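The AP@0.50:0.95 notation in the abstract is read here under the COCO-style convention, in which average precision is computed at ten mask-IoU thresholds from 0.50 to 0.95 in steps of 0.05 and then averaged. The sketch below only illustrates the two ingredients of that convention, mask IoU and the threshold grid. It is not the challenge's evaluation code, and the names `mask_iou`, `IOU_THRESHOLDS`, and `thresholds_passed` are hypothetical. A full AP computation also ranks predictions by confidence and integrates a precision-recall curve, which is omitted.

```python
# Minimal sketch, assuming COCO-style AP@0.50:0.95 (ten IoU thresholds).
# Masks are nested lists of 0/1 values with identical shapes.

def mask_iou(pred, truth):
    """Intersection over union of two binary masks."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    return inter / union if union else 0.0

# 0.50, 0.55, ..., 0.95 (built from integers to avoid float drift)
IOU_THRESHOLDS = [(50 + 5 * i) / 100 for i in range(10)]

def thresholds_passed(iou):
    """Thresholds at which a prediction with this IoU counts as a match."""
    return [t for t in IOU_THRESHOLDS if iou >= t]
```

A predicted mask that overlaps its ground truth with IoU of about 0.67 counts as a true positive at the 0.50 to 0.65 thresholds only, which is why this metric rewards precise mask boundaries far more than AP@0.50 alone.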

Paper Structure (13 sections, 5 figures, 2 tables, 1 algorithm)

This paper contains 13 sections, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of the proposed instance segmentation framework. The visual inductive prior is used at each stage to guide optimization. This approach not only reduces computational resource consumption but also maintains solid model performance. We begin by employing the Canny-Hough operator to adaptively combine image-level priors and detect the basketball court's position. Subsequently, we leverage class-level priors for identity identification. We then use this information for style transformation of various objects, integrating image-level prior knowledge through copy-paste augmentation. Finally, model inference is performed based solely on the detected basketball court's location.
  • Figure 2: Illustration of the cropping algorithm. The left figure is the original image. The right one is the cropped image, with the lines detected by the Canny edge detector and Hough transform shown in red. The blue line shows a boundary based on image size, while the green lines indicate the dynamic boundary derived from the detected lines.
  • Figure 3: The left figure displays a region identified based on the maximum convex hull, which is determined using the endpoints of all lines detected by the Canny-Hough operator. The subclass attributes of the object are determined by its bounding box coordinates. The object marked by a dotted line represents the result of location-based copy-paste augmentation.
  • Figure 4: A demonstration of identity-based style transfer applied to basketball players. Significant variations in appearance are evident after the hue or RGB transformation. In the left example, there is a noticeable change in skin tone, while in the right example, the player's jersey changes dramatically, almost as if he has switched to a different team.
  • Figure 5: Bar chart of cropped-area statistics. The x-axis corresponds to the basketball courts; the y-axis is the ratio of the cropped area to the whole raw image. From left to right, the three colors correspond to the training, validation, and testing sets.
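The steps described in the captions above can be illustrated with a short sketch. This is a simplified, standard-library-only illustration under stated assumptions, not the authors' implementation. It assumes the line segments have already been produced by a separate Canny-Hough step, uses an axis-aligned box around the segment endpoints rather than the convex hull mentioned in Figure 3, represents images as nested lists of RGB tuples, and substitutes a plain hue rotation for the identity-based style transfer of Figure 4. The function names and the `margin` and `delta` parameters are hypothetical.

```python
import colorsys

def crop_box_from_lines(lines, width, height, margin=0):
    """Crop box (x0, y0, x1, y1) around all segment endpoints.

    `lines` holds (x1, y1, x2, y2) segments, e.g. from a Hough transform.
    The box is clamped to the image size; with no lines, the full image is kept.
    """
    xs = [x for (xa, ya, xb, yb) in lines for x in (xa, xb)]
    ys = [y for (xa, ya, xb, yb) in lines for y in (ya, yb)]
    if not xs:
        return (0, 0, width, height)
    return (max(0, min(xs) - margin), max(0, min(ys) - margin),
            min(width, max(xs) + margin), min(height, max(ys) + margin))

def shift_hue(pixel, delta):
    """Rotate the hue of an (r, g, b) pixel (0-255 channels) by `delta` in [0, 1)."""
    r, g, b = (c / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    rotated = colorsys.hsv_to_rgb((h + delta) % 1.0, s, v)
    return tuple(round(c * 255) for c in rotated)

def augment_then_paste(image, patch, mask, top, left, delta):
    """Restyle an instance patch first, then paste its masked pixels.

    Returns a new image; pixels falling outside the image are skipped.
    """
    styled = [[shift_hue(p, delta) for p in row] for row in patch]
    out = [row[:] for row in image]
    h, w = len(out), len(out[0])
    for i, (styled_row, mask_row) in enumerate(zip(styled, mask)):
        for j, (pixel, keep) in enumerate(zip(styled_row, mask_row)):
            y, x = top + i, left + j
            if keep and 0 <= y < h and 0 <= x < w:
                out[y][x] = pixel
    return out
```

For example, `shift_hue((255, 0, 0), 0.5)` maps pure red to cyan `(0, 255, 255)`, the kind of jersey-color change the Figure 4 caption describes. Applying the style change before pasting, in the order the paper's title suggests, means each pasted copy of a player can differ in appearance from its source instance.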