UniFS: Universal Few-shot Instance Perception with Point Representations

Sheng Jin, Ruijie Yao, Lumin Xu, Wentao Liu, Chen Qian, Ji Wu, Ping Luo

TL;DR

UniFS presents a universal, few-shot framework for instance perception by reformulating diverse tasks as dynamic point representation learning. It unifies task outputs via a shared architecture consisting of a feature extractor, a transformer-based point decoder, and a point head, augmented by Structure-Aware Point Learning (SAPL) to exploit higher-order relationships among points. The authors introduce the COCO-UniFS benchmark to evaluate multi-task few-shot instance perception and demonstrate that UniFS achieves competitive results with task-specific models, excelling in low-shot and unseen-task scenarios. The work advances practical multi-task few-shot perception with minimal task-specific customization and offers a foundation for broader generalist vision models.

Abstract

Instance perception tasks (object detection, instance segmentation, pose estimation, counting) play a key role in industrial applications of visual models. Because supervised learning methods suffer from high labeling costs, few-shot learning methods that learn effectively from a limited number of labeled examples are desirable. Existing few-shot learning methods primarily focus on a restricted set of tasks, presumably due to the challenges involved in designing a generic model capable of representing diverse tasks in a unified manner. In this paper, we propose UniFS, a universal few-shot instance perception model that unifies a wide range of instance perception tasks by reformulating them into a dynamic point representation learning framework. Additionally, we propose Structure-Aware Point Learning (SAPL) to exploit the higher-order structural relationships among points to further enhance representation learning. Our approach makes minimal assumptions about the tasks, yet it achieves competitive results compared to highly specialized and well-optimized specialist models. Code and data are available at https://github.com/jin-s13/UniFS.
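
To make the idea of a shared point output space concrete, the following is a minimal sketch of how a bounding box and a set of keypoints could both be written as plain lists of (x, y) points. The function names `resample_closed_polygon`, `box_to_points`, and `keypoints_to_points`, the even perimeter sampling, and the choice of 12 points are illustrative assumptions; the abstract does not specify the encoding UniFS actually uses.

```python
import math

def resample_closed_polygon(vertices, n_points):
    """Return n_points spaced evenly by arc length along a closed polygon.

    Hypothetical helper for illustration; not taken from the UniFS code base.
    """
    closed = list(vertices) + [vertices[0]]
    seg = [math.dist(closed[i], closed[i + 1]) for i in range(len(vertices))]
    step = sum(seg) / n_points
    points = []
    i, start = 0, 0.0  # current segment index and arc length at its start
    for k in range(n_points):
        t = k * step
        # Advance to the segment that contains arc length t.
        while i < len(seg) - 1 and t >= start + seg[i]:
            start += seg[i]
            i += 1
        frac = (t - start) / seg[i] if seg[i] > 0 else 0.0
        (xa, ya), (xb, yb) = closed[i], closed[i + 1]
        points.append((xa + frac * (xb - xa), ya + frac * (yb - ya)))
    return points

def box_to_points(box, n_points=12):
    """Encode an axis-aligned box (x0, y0, x1, y1) as points on its perimeter."""
    x0, y0, x1, y1 = box
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    return resample_closed_polygon(corners, n_points)

def keypoints_to_points(keypoints):
    """Pose keypoints are already points; only normalize them to float tuples."""
    return [(float(x), float(y)) for x, y in keypoints]

box_points = box_to_points((0, 0, 4, 2), n_points=12)
pose_points = keypoints_to_points([(1, 1), (2, 3)])
```

Under this assumed encoding, a polygonal mask contour could go through the same `resample_closed_polygon` helper, so all three outputs share one format: a list of 2D points.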

Paper Structure (29 sections, 4 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: UniFS utilizes a dynamic point representation learning paradigm to merge various task output spaces into a set of multiple points. With the provision of few-shot support images and annotated points from different tasks, UniFS can seamlessly produce corresponding point outputs on the query set.
  • Figure 2: Overview of UniFS. UniFS adopts a dynamic point representation learning paradigm to unify different task output spaces into a set of multiple points.
  • Figure 3: Structure-Aware Point Learning (SAPL): (a) Traditional L1/L2 loss focuses on individual point error. (b) SAPL integrates structural relationships among points by supervising the angle between each point and its neighboring points.
  • Figure 4: Visualization of tasks in a 5-shot scenario with support images of new categories and point annotations, leading to corresponding point outputs on the query set.
  • Figure S1: Analysis on 1-hop SAPL. For 1-hop SAPL, the trajectory of point P, where the angle between a moving point P and two fixed points is a fixed value $\theta$, is a closed curve composed of two symmetrical arcs: the spindle shape ($\theta<90^\circ$), a circle ($\theta=90^\circ$), and the lens shape ($\theta>90^\circ$).
  • ...and 1 more figures
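
The angle supervision described in the Figure 3 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's loss: the names `angle_at` and `structure_angle_loss`, the closed and ordered point sequence, the `hop` neighbor offset, and the mean absolute angle difference are all choices made here for clarity.

```python
import math

def angle_at(p, a, b):
    """Angle in radians at vertex p between the rays p->a and p->b."""
    v1 = (a[0] - p[0], a[1] - p[1])
    v2 = (b[0] - p[0], b[1] - p[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return abs(math.atan2(cross, dot))

def structure_angle_loss(pred, target, hop=1):
    """Mean absolute difference between predicted and target angles.

    Assumes both inputs are ordered, closed point sequences of equal length;
    each point is compared using its neighbors `hop` steps away on either side.
    """
    n = len(pred)
    total = 0.0
    for i in range(n):
        prev_i, next_i = (i - hop) % n, (i + hop) % n
        pred_angle = angle_at(pred[i], pred[prev_i], pred[next_i])
        target_angle = angle_at(target[i], target[prev_i], target[next_i])
        total += abs(pred_angle - target_angle)
    return total / n

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
# Same shape, scaled by 2 and translated: per-point L1 error is large,
# but every angle is unchanged.
moved = [(2 * x + 3, 2 * y - 1) for x, y in square]
# One corner pulled outward: the shape itself changes.
distorted = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0), (0.0, 1.0)]

loss_moved = structure_angle_loss(moved, square)
loss_distorted = structure_angle_loss(distorted, square)
# A point on the circle whose diameter joins the two fixed points sees them
# at a right angle, matching the circle case of Figure S1.
right_angle = angle_at((0.0, 1.0), (-1.0, 0.0), (1.0, 0.0))
```

In this sketch the angle term is zero for the translated and scaled square and positive for the distorted one, which illustrates why an angle-based term captures structure that a per-point L1/L2 error alone does not.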