
FastBUS: A Fast Bayesian Framework for Unified Weakly-Supervised Learning

Ziquan Wang, Haobo Wang, Ke Chen, Lei Feng, Gang Chen

TL;DR

This work derives a latent probability calculation algorithm based on generalized belief propagation and proposes two joint acceleration strategies: (1) introducing a low-rank assumption to approximate the transition matrix, which reduces time complexity, and (2) designing an end-to-end state evolution module that learns batch-scale transition matrices, which enables multi-category batch processing.

Abstract

Machine learning often involves various kinds of imprecise labels, which leads to diverse weakly supervised settings. Recent methods aim to handle these settings universally, but they usually require complex manual pre-work, ignore the relationships between associated labels, or cannot process data in batches because of flaws in their computational design, which results in long running times. To address these limitations, we propose a novel general framework that efficiently infers the latent true label distribution across various forms of weak supervision. Our key idea is to express the brute-force search over labels as a probabilistic transition of label variables, compressing the diverse depth-first search (DFS) tree structures of different weakly supervised settings into a shared Bayesian network. From this, we derive a latent probability calculation algorithm based on generalized belief propagation and propose two joint acceleration strategies: 1) introducing a low-rank assumption to approximate the transition matrix, which reduces time complexity; 2) designing an end-to-end state evolution module that learns batch-scale transition matrices, which enables multi-category batch processing. In addition, we show that our method is equivalent to the EM algorithm in most scenarios. Extensive experiments show that our method achieves state-of-the-art (SOTA) results under most weakly supervised settings and runs up to hundreds of times faster than other general methods.
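The first acceleration strategy rests on a general fact: if an $n \times n$ transition matrix has (approximately) rank $r \ll n$, it can be stored as factors $U \in \mathbb{R}^{n \times r}$ and $V \in \mathbb{R}^{r \times n}$, and each propagation step then costs $O(nr)$ instead of $O(n^2)$. The sketch below is a generic illustration, not the authors' construction: it assumes a truncated SVD as the factorization, and the names `low_rank_factors` and `propagate`, the sizes, and the synthetic matrix are all hypothetical. How FastBUS actually parameterizes or learns its low-rank transition matrix is not described in this summary.

```python
import numpy as np


def low_rank_factors(T, rank):
    """Factor T (n x n) into U (n x rank) and V (rank x n) via truncated SVD.

    Hypothetical helper: U @ V is the best rank-`rank` approximation of T
    in the Frobenius norm (Eckart-Young).
    """
    left, s, right_t = np.linalg.svd(T, full_matrices=False)
    U = left[:, :rank] * s[:rank]  # fold singular values into the left factor
    V = right_t[:rank, :]
    return U, V


def propagate(belief, U, V):
    """One transition step, belief @ (U @ V), in O(n * rank) without forming U @ V."""
    return (belief @ U) @ V


rng = np.random.default_rng(0)
n, r = 64, 4

# Hypothetical row-stochastic transition matrix that is exactly rank r.
A = rng.random((n, r))
B = rng.random((r, n))
T = A @ B
T /= T.sum(axis=1, keepdims=True)  # row scaling keeps rank <= r

belief = rng.random(n)
belief /= belief.sum()  # a probability vector over the n states

U, V = low_rank_factors(T, r)
exact = belief @ T                # O(n^2) per step
approx = propagate(belief, U, V)  # O(n * r) per step
print(np.max(np.abs(exact - approx)))
```

Because the synthetic matrix here is exactly rank 4, the factored step matches `belief @ T` up to floating-point error. For a full-rank matrix, truncating the SVD yields the best rank-$r$ approximation, so the chosen rank trades accuracy against speed.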

Paper Structure

This paper contains 15 sections, 1 theorem, 4 equations, 3 figures, 2 tables.

Key Result

Theorem 1

The risk $\tilde{R}(f)=\mathbb{E}_{(\bm{x}^{[K]}, \bm{w})\sim P(\bm{X},\bm{W})}(\tilde{\mathcal{L}}(f_b(\bm{x}^{[K]}), \bm{w});f)$ is unbiased with respect to $R(f)$, where $\bm{e}_j$ is the unit vector whose $j$-th element is 1, and $f_b$ is the bag-level classifier corresponding to $f$.

Figures (3)

  • Figure 1: Comparison of running time and accuracy with recent SOTA methods for general weakly supervised learning, with a batch size of 4 and an average of 20 instances per bag. The runtime advantage becomes more pronounced as the batch size, the number of classes, or the number of instances increases.
  • Figure 2: DFS tree for multi-instance learning with 3 instances in one bag and a single class. Nodes of the same color belong to the same layer.
  • Figure 3: A loopy Bayesian network model for general weak supervision. Different colors denote distinct label types. (a) Left panel: a probabilistic graph with 3 instances and 3 labels, in which chain-structured dependencies enumerate all feasible instance configurations and loops explicitly capture correlations among multiple labels. Right panel: factor graph representations of the chain (b) and loop (c) components of the left network, on which generalized belief propagation (GBP) performs efficient probabilistic inference.
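The DFS tree in Figure 2 enumerates every labeling of a bag, and the number of labelings grows as $2^K$ for $K$ instances. The sketch below illustrates, under simplifying assumptions of our own, how such an enumeration can collapse into a chain of transitions over a small state. It assumes a single class, independent instances with hypothetical instance-level probabilities `p`, and a weak label that constrains only the number of positives in the bag (at least one for multi-instance learning, an exact count for a label-proportion-style constraint). It runs a forward-backward pass over the state "positives seen so far" and checks the result against brute-force enumeration. This is not FastBUS's generalized belief propagation, which also handles the loops that model multi-label correlations (Figure 3); the function names are hypothetical.

```python
from itertools import product

import numpy as np


def posterior_bruteforce(p, accept):
    """P(y_i = 1 | weak label) by enumerating all 2^K labelings (the DFS leaves)."""
    K = len(p)
    post, Z = np.zeros(K), 0.0
    for labels in product((0, 1), repeat=K):
        if not accept(sum(labels)):
            continue  # labeling inconsistent with the weak label
        w = np.prod([pi if y else 1 - pi for pi, y in zip(p, labels)])
        Z += w
        post += w * np.array(labels)
    return post / Z


def posterior_chain(p, accept):
    """Same posterior via forward-backward over the state 'positives so far'."""
    K = len(p)
    # alpha[i, c]: probability that the first i instances contain exactly c positives
    alpha = np.zeros((K + 1, K + 1))
    alpha[0, 0] = 1.0
    for i in range(K):
        alpha[i + 1, :] += alpha[i, :] * (1 - p[i])  # instance i negative: count unchanged
        alpha[i + 1, 1:] += alpha[i, :-1] * p[i]     # instance i positive: count + 1
    # beta[i, c]: probability the weak label accepts, given c positives among the first i
    beta = np.zeros((K + 1, K + 1))
    beta[K, :] = [1.0 if accept(c) else 0.0 for c in range(K + 1)]
    for i in range(K - 1, -1, -1):
        beta[i, :-1] = (1 - p[i]) * beta[i + 1, :-1] + p[i] * beta[i + 1, 1:]
        beta[i, K] = (1 - p[i]) * beta[i + 1, K]  # unreachable for i < K; kept for shape
    Z = alpha[K] @ beta[K]  # probability that the weak label is satisfied
    post = np.array([(alpha[i, :-1] * p[i] * beta[i + 1, 1:]).sum() for i in range(K)])
    return post / Z


p = np.array([0.2, 0.7, 0.1])  # hypothetical probabilities for a 3-instance bag
mil_post = posterior_chain(p, lambda c: c >= 1)  # MIL: at least one positive
llp_post = posterior_chain(p, lambda c: c == 2)  # exactly two positives
print(mil_post)
print(llp_post)
```

The chain costs $O(K^2)$ per bag versus $O(K \cdot 2^K)$ for enumeration, which illustrates why replacing the search tree with transitions over a compact state pays off as bags grow.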

Theorems & Definitions (2)

  • Theorem 1: URE (unbiased risk estimator) for weak supervision [GWSL_ICML_2023]
  • Remark 1