Beyond the Seen: Bounded Distribution Estimation for Open-Vocabulary Learning

Xiaomeng Fan, Yuchuan Mao, Zhi Gao, Yuwei Wu, Jin Chen, Yunde Jia

TL;DR

This work theoretically demonstrates that the distribution in open environments can be effectively estimated by generating unseen-class data, which upper-bounds the estimation error, and proposes a novel open-vocabulary learning method that generates unseen-class data to estimate that distribution.

Abstract

Open-vocabulary learning requires modeling the data distribution in open environments, which consists of both seen-class and unseen-class data. Existing methods estimate the distribution in open environments using seen-class data, where the absence of unseen classes makes the estimation error inherently unidentifiable. Intuitively, learning beyond the seen classes is crucial for distribution estimation to bound the estimation error. We theoretically demonstrate that the distribution can be effectively estimated by generating unseen-class data, through which the estimation error is upper-bounded. Building on this theoretical insight, we propose a novel open-vocabulary learning method, which generates unseen-class data for estimating the distribution in open environments. The method consists of a class-domain-wise data generation pipeline and a distribution alignment algorithm. The data generation pipeline generates unseen-class data under the guidance of a hierarchical semantic tree and domain information inferred from the seen-class data, facilitating accurate distribution estimation. With the generated data, the distribution alignment algorithm estimates and maximizes the posterior probability to enhance generalization in open-vocabulary learning. Extensive experiments on $11$ datasets demonstrate that our method outperforms baseline approaches by up to $14\%$, highlighting its effectiveness and superiority.

Paper Structure

This paper contains 42 sections, 12 theorems, 60 equations, 5 figures, 13 tables, 1 algorithm.

Key Result

Theorem 1

With probability at least $1 - \delta$, the following holds, where $d(\cdot, \cdot)$ denotes the distribution distance and $m$ denotes the size of the seen-class dataset.

Figures (5)

  • Figure 1: Formulation of Class-Domain-Wise Data Generation Pipeline
  • Figure 2: Ablation Studies on Quantity of Predicted Unseen Classes and Generated Images
  • Figure 3: Distance between Classes
  • Figure 4: Comparison between the images generated with class-domain-wise data generation pipeline and three prompt templates mentioned in ablation studies. The seen class is 'motorbike' and the inferred unseen class is 'car'.
  • Figure 5: Comparison between the images generated with class-domain-wise data generation pipeline and three prompt templates mentioned in ablation studies. The seen class is 'barrel' and the inferred unseen class is 'drum'.

Theorems & Definitions (21)

  • Theorem 1
  • Theorem 2
  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • ...and 11 more