From Ambiguity to Action: A POMDP Perspective on Partial Multi-Label Ambiguity and Its Horizon-One Resolution

Hanlin Pan, Yuhao Tang, Wanfu Gao

TL;DR

The paper tackles partial multi-label learning where the true label set is unobserved and disambiguation errors impact downstream tasks. It formalizes disambiguation as a horizon-1 POMDP and couples it with budgeted sequential feature selection in a two-stage RL framework: Stage 1 learns a transformer-based policy to produce hard pseudo-labels, aligning with minimizing disambiguation risk, and Stage 2 uses those labels to build an interpretable feature ranking under a fixed budget. Theoretical contributions include an equivalence between PML disambiguation and horizon-1 POMDP optimization, convergence of Stage-1 policy gradients, and an excess-risk decomposition separating pseudo-label quality from sample-size/complexity effects, supported by experiments across nine diverse datasets showing clear performance gains over strong baselines. Together, the approach provides a principled, end-to-end decision framework that improves robustness to label ambiguity and yields actionable feature subsets for downstream tasks.

Abstract

In partial multi-label learning (PML), the true labels are unobserved, which makes label disambiguation important but difficult. A key challenge is that ambiguous candidate labels can propagate errors into downstream tasks such as feature engineering. To address this issue, we jointly model the disambiguation and feature selection tasks as Partially Observable Markov Decision Processes (POMDPs), turning PML risk minimization into expected-return maximization. Stage 1 trains a transformer policy via reinforcement learning to produce high-quality hard pseudo-labels; Stage 2 formulates feature selection as a sequential reinforcement learning problem, selecting features step by step and outputting an interpretable global ranking. We further provide a theoretical analysis of the PML-POMDP correspondence and an excess-risk bound that decomposes the error into a pseudo-label quality term and a sample-size term. Experiments across multiple metrics and datasets verify the advantages of the framework.

Paper Structure (39 sections, 6 theorems, 100 equations, 5 figures, 6 tables, 1 algorithm)

Key Result

Theorem 5.2

Assume that $\ell_{\mathrm{disc}}(X, C)$ is measurable and integrable with respect to the joint distribution of $(X,C)$ for any $\pi\in\Pi$. (a) For every stochastic policy $\pi\in\Pi$, the mapping below defines a disambiguation rule whose disambiguation risk is consistent with the negative POMDP return: (b) Correspondingly, for each disambiguation rule $g$, there is a policy $\pi\in\Pi$ such ...

Figures (5)

  • Figure 1: An example of PML: only three of the five candidate labels are valid (shown in red).
  • Figure 2: Overview of our POMDP framework for PML. Stage 1 transforms the label disambiguation problem under candidate constraints into a horizon-1 decision problem and generates hard pseudo-labels. In Stage 2, these pseudo-labels are used to learn the budgeted feature selection strategy, generating a feature subset for the downstream task.
  • Figure 3: Nine methods on Birds in terms of Micro-F1, Hamming Loss, Ranking Loss and Coverage Error.
  • Figure 4: Nine methods on HumanPseAAC in terms of Micro-F1, Hamming Loss, Ranking Loss and Coverage Error.
  • Figure 5: Nine methods on Mediamill in terms of Micro-F1, Hamming Loss, Ranking Loss and Coverage Error.
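
The four metrics named in Figures 3 to 5 can be illustrated with a small, dependency-free sketch. The definitions below are common textbook conventions, not necessarily the exact ones used in the paper: ties in the ranking loss are counted as errors, and coverage error counts ranks from 1 (some papers subtract 1). The function names and the toy data are hypothetical.

```python
from typing import List, Sequence

Labels = Sequence[Sequence[int]]    # binary label matrix, one row per sample
Scores = Sequence[Sequence[float]]  # real-valued label scores, same shape


def micro_f1(y_true: Labels, y_pred: Labels) -> float:
    """F1 computed from TP/FP/FN pooled over every (sample, label) pair."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def hamming_loss(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of label positions that are predicted incorrectly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for t_row, p_row in zip(y_true, y_pred)
        for t, p in zip(t_row, p_row)
    )
    return wrong / total


def ranking_loss(y_true: Labels, scores: Scores) -> float:
    """Mean fraction of (relevant, irrelevant) pairs that are mis-ordered.

    Assumption: a tie counts as mis-ordered. Samples with no relevant or
    no irrelevant labels contribute zero.
    """
    per_sample: List[float] = []
    for t_row, s_row in zip(y_true, scores):
        rel = [s for t, s in zip(t_row, s_row) if t == 1]
        irr = [s for t, s in zip(t_row, s_row) if t == 0]
        if not rel or not irr:
            per_sample.append(0.0)
            continue
        bad = sum(1 for r in rel for i in irr if i >= r)
        per_sample.append(bad / (len(rel) * len(irr)))
    return sum(per_sample) / len(per_sample)


def coverage_error(y_true: Labels, scores: Scores) -> float:
    """Mean depth in the ranking needed to cover all relevant labels."""
    depths: List[int] = []
    for t_row, s_row in zip(y_true, scores):
        rel = [s for t, s in zip(t_row, s_row) if t == 1]
        if not rel:
            depths.append(0)
            continue
        lowest = min(rel)  # score of the worst-ranked relevant label
        depths.append(sum(1 for s in s_row if s >= lowest))
    return sum(depths) / len(depths)


# Toy data (hypothetical): two samples, four labels.
y_true = [[1, 0, 1, 0], [0, 1, 0, 0]]
scores = [[0.9, 0.8, 0.4, 0.1], [0.2, 0.7, 0.3, 0.6]]
y_pred = [[1 if s >= 0.5 else 0 for s in row] for row in scores]

print(micro_f1(y_true, y_pred))
print(hamming_loss(y_true, y_pred))
print(ranking_loss(y_true, scores))
print(coverage_error(y_true, scores))
```

Micro-F1 is higher-is-better, while Hamming Loss, Ranking Loss and Coverage Error are lower-is-better, which is worth keeping in mind when reading the per-dataset comparisons.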

Theorems & Definitions (12)

  • Definition 5.1
  • Theorem 5.2: Equivalence between PML risk and horizon-1 POMDP return
  • Theorem 5.3: Convergence to a first-order stationary point
  • Theorem 5.5: Excess risk bound for pseudo-label training
  • Lemma 3.1: Policy-gradient identity
  • proof
  • Lemma 3.2: Bounded second moment
  • proof
  • proof: Proof of Theorem 5.3
  • Lemma 4.2: Label-noise deviation bound
  • ...and 2 more
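
For orientation, the horizon-one setting makes the policy-gradient identity especially simple, because the return of an episode is just its single reward. The block below writes out the standard score-function (REINFORCE) identity under assumed notation: $o=(x,C)$ for the observation, $a$ for the pseudo-label action, $r(o,a)$ for a one-step reward that does not depend on $\theta$, and $\pi_\theta$ for the parameterized policy. These symbols are illustrative, and the paper's Lemma 3.1 may state the identity in a different form.

```latex
\begin{align}
J(\theta) &= \mathbb{E}_{o}\,\mathbb{E}_{a\sim\pi_\theta(\cdot\mid o)}\big[\,r(o,a)\,\big], \\
\nabla_\theta J(\theta) &= \mathbb{E}_{o}\,\mathbb{E}_{a\sim\pi_\theta(\cdot\mid o)}\big[\,r(o,a)\,\nabla_\theta \log \pi_\theta(a\mid o)\,\big].
\end{align}
```

If the reward is taken to be the negative of a per-sample loss, maximizing $J(\theta)$ amounts to minimizing the expected loss, which is the kind of risk-to-return correspondence that Theorem 5.2 refers to.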