VIPriors 4: Visual Inductive Priors for Data-Efficient Deep Learning Challenges

Robert-Jan Bruintjes, Attila Lengyel, Marcos Baptista Rios, Osman Semih Kayhan, Davide Zambrano, Nergis Tomen, Jan van Gemert

TL;DR

VIPriors 4 investigates visual inductive priors for data-efficient deep learning under a train-from-scratch constraint, focusing on object detection and instance segmentation. Top solutions rely on extensive data augmentation, model ensembling, and model-soup techniques, with notable prior-based contributions such as Orthogonal Uncertainty Representation and Image Uncertainty Weighted further improving performance. The results show that, despite some gains from priors, data-efficient performance in practice still hinges largely on engineering strategies, pointing to a future in which scalable priors and efficient training play a larger role. Overall, this edition highlights both the persistent effectiveness of augmentation and ensembling and the emerging potential of principled priors to improve data efficiency in vision tasks.

Abstract

The fourth edition of the "VIPriors: Visual Inductive Priors for Data-Efficient Deep Learning" workshop features two data-impaired challenges. These challenges address the problem of training deep learning models for computer vision tasks with limited data. Participants are limited to training models from scratch using a low number of training samples and are not allowed to use any form of transfer learning. We aim to stimulate the development of novel approaches that incorporate inductive biases to improve the data efficiency of deep learning models. Significant advancements are made compared to the provided baselines, where winning solutions surpass the baselines by a considerable margin in both tasks. As in previous editions, these achievements are primarily attributed to heavy use of data augmentation policies and large model ensembles, though novel prior-based methods seem to contribute more to successful solutions compared to last year. This report highlights the key aspects of the challenges and their outcomes.

Paper Structure

This paper contains 12 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Some images from the DelftBikes dataset. Each image has a single bike with 22 labeled parts.
  • Figure 2: The first-place solution of Zhao et al. recombines binary pairs of images horizontally and vertically to create a synthetic pre-training dataset. Figure adapted from the technical report by Zhao et al. provided to the competition organizers.
  • Figure 3: Overview of the training pipeline of the second-place solution and jury prize winner by Lu et al.
  • Figure 4: Example data augmentation output of Lu et al. Figure adapted from the technical report by Lu et al. provided to the competition organizers.
  • Figure 5: Basketball Court Detection method by Hsu et al. The top-left image is the original. The top-right image is cropped, with red lines detected by the Canny edge detector and Hough transform. The blue line shows a boundary based on image size, while the green lines indicate the dynamic boundary derived from the detected lines. The bottom-left image displays a region identified from the maximum convex hull, which is determined using the endpoints of all lines detected by the Canny-Hough operator. The subclass attributes of an object are determined by its bounding box coordinates. In the bottom-right image, the object marked by a dotted line is the result of location-based copy-paste augmentation. Figure and caption adapted from the technical report by Hsu et al. provided to the competition organizers.
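
The pair recombination described in the Figure 2 caption can be illustrated with a small sketch. This is only one possible reading of the caption, not the implementation of Zhao et al.: it assumes equally sized images, a split at the midpoint, and that every unordered pair is used once per direction. The helper names `recombine_pair` and `synthetic_pairs` are hypothetical, and label handling is not covered because the caption does not describe it.

```python
from itertools import combinations

import numpy as np


def recombine_pair(img_a, img_b, axis):
    """Join the first half of img_a with the second half of img_b.

    axis=1 is a horizontal recombination (left of A, right of B);
    axis=0 is a vertical recombination (top of A, bottom of B).
    Assumption: both images share the same shape and are split at the midpoint.
    """
    if img_a.shape != img_b.shape:
        raise ValueError("images must share the same shape")
    mid = img_a.shape[axis] // 2
    first, _ = np.split(img_a, [mid], axis=axis)
    _, second = np.split(img_b, [mid], axis=axis)
    return np.concatenate([first, second], axis=axis)


def synthetic_pairs(images):
    """Return horizontal and vertical recombinations for every unordered pair."""
    out = []
    for img_a, img_b in combinations(images, 2):
        out.append(recombine_pair(img_a, img_b, axis=1))  # horizontal
        out.append(recombine_pair(img_a, img_b, axis=0))  # vertical
    return out


# Toy example: three constant-valued 4x6 RGB "images".
images = [np.full((4, 6, 3), v, dtype=np.uint8) for v in (0, 100, 200)]
synthetic = synthetic_pairs(images)
```

Each synthetic image keeps the original resolution, so under these assumptions it could be fed to the same pipeline as the real training images.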