Rethinking Multiple Instance Learning: Developing an Instance-Level Classifier via Weakly-Supervised Self-Training

Yingfan Ma, Xiaoyuan Luo, Mingzhi Yuan, Xinrong Chen, Manning Wang

TL;DR

This work addresses the information loss in traditional MIL that arises from bag-level supervision by reframing MIL as a semi-supervised instance classification problem. It introduces MIL-SSL, a weakly-supervised self-training framework that uses global and local constraints derived from positive bag labels to generate informative pseudo labels for unlabeled instances and train an instance-level classifier, with iterative refinement guided by optimal transport and the Sinkhorn-Knopp algorithm. The approach achieves state-of-the-art results across MNIST-based synthetic MIL tasks, five standard MIL benchmarks, and real-world histopathology datasets (CAMELYON16, TCGA), while providing ablations and analysis of hyperparameters like the positive-instance ratio parameter $\mu$. This method enables better learning of hard positive instances, improves both instance- and bag-level predictions, and suggests a pathway to leverage unlabeled bag data in MIL applications.

Abstract

The multiple instance learning (MIL) problem is currently solved from either a bag-classification or an instance-classification perspective, both of which ignore important information contained in some instances and result in limited performance. For example, existing methods often face difficulty in learning hard positive instances. In this paper, we formulate MIL as a semi-supervised instance classification problem, so that all the labeled and unlabeled instances can be fully utilized to train a better classifier. The difficulty in this formulation is that all the labeled instances are negative in MIL, and traditional self-training techniques used in semi-supervised learning tend to degenerate when generating pseudo labels for the unlabeled instances in this scenario. To resolve this problem, we propose a weakly-supervised self-training method, in which we utilize the positive bag labels to construct a global constraint and a local constraint on the pseudo labels to prevent them from degenerating and force the classifier to learn hard positive instances. It is worth noting that easy positive instances are those far from the decision boundary in the classification process, while hard positive instances are those close to the decision boundary. Through iterative optimization, the pseudo labels can gradually approach the true labels. Extensive experiments on two MNIST synthetic datasets, five traditional MIL benchmark datasets and two histopathology whole slide image datasets show that our method achieves new SOTA performance on all of them. The code will be publicly available.
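
To make the role of the global constraint more concrete, the following is a minimal sketch of Sinkhorn-Knopp scaling applied to pseudo-label assignment. It is not the authors' implementation. It assumes that the global constraint fixes the average positive pseudo-label mass over all instances from positive bags to the ratio $\mu$ mentioned in the TL;DR, and it leaves out the local constraint entirely. The function name `sinkhorn_pseudo_labels` and the parameters `sharpness` and `n_iters` are hypothetical and chosen only for illustration.

```python
def sinkhorn_pseudo_labels(pos_probs, mu, sharpness=1.0, n_iters=100):
    """Soft pseudo labels [q_neg, q_pos] per instance via Sinkhorn-Knopp scaling.

    pos_probs : classifier probability of the positive class for each instance
    mu        : assumed overall fraction of positive instances (global constraint)
    sharpness : exponent applied to the probabilities (hypothetical knob)
    """
    n = len(pos_probs)
    eps = 1e-8  # keeps the kernel strictly positive
    kernel = [[max(1.0 - p, eps) ** sharpness, max(p, eps) ** sharpness]
              for p in pos_probs]
    col_target = [1.0 - mu, mu]  # total mass of the negative / positive column
    row_target = 1.0 / n         # every instance carries the same mass
    u = [1.0] * n                # row scaling factors
    v = [1.0, 1.0]               # column scaling factors
    for _ in range(n_iters):
        # Rescale rows so each instance has mass 1/n.
        u = [row_target / (k[0] * v[0] + k[1] * v[1]) for k in kernel]
        # Rescale columns so the class masses match (1 - mu, mu).
        v = [col_target[j] / sum(u[i] * kernel[i][j] for i in range(n))
             for j in range(2)]
    plan = [[u[i] * kernel[i][j] * v[j] for j in range(2)] for i in range(n)]
    # Normalize each row so it reads as a label distribution.
    return [[q / (row[0] + row[1]) for q in row] for row in plan]


if __name__ == "__main__":
    probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]
    for p, (q_neg, q_pos) in zip(probs, sinkhorn_pseudo_labels(probs, mu=0.3)):
        print(f"p={p:.2f} -> q_pos={q_pos:.3f}")
```

Because the column masses are pinned, the pseudo labels cannot collapse to all-negative even though every truly labeled instance is negative, which is the degeneration the abstract describes. Larger `sharpness` values push the labels toward hard 0/1 assignments but typically need more iterations to converge.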

Paper Structure (24 sections, 24 equations, 10 figures, 9 tables)

Figures (10)

  • Figure 1: Illustration of the problem with the bag-classification approach in MIL. In the 2D instance feature space, the easy positive instances are far away from the negative instances, whereas the hard positive instances are near the negative instances. For bag-classification methods, the minimum bag classification error can be achieved once the classifier learns to distinguish a small number of easy positive instances, so the instance information is not fully utilized. The goal of further optimization is to learn the hard positive instances.
  • Figure 2: The framework of the proposed MIL-SSL method, in which we formulate multiple instance learning (MIL) as a semi-supervised learning (SSL) instance classification problem. Our method iteratively alternates between two steps: (1) weakly-supervised pseudo label assignment, in which we use the current instance classifier to predict all unlabeled instances from positive bags to obtain $P$ and then calculate the pseudo labels $Q$ by solving an optimization problem under global and local constraints; (2) instance classifier training, in which we train the instance classifier with both the instances from negative bags and their true labels and the instances from positive bags and their pseudo labels.
  • Figure 3: Illustration of the iterative updating of the pseudo labels and the instance-level classifier in the proposed method. (a) Initially, the instance labels in all negative bags are known to be negative but the instance labels in positive bags are unknown, and the instance-level classifier is randomly initialized; (b) pseudo labels for all instances in positive bags are generated based on the prediction results of instance-level classifier, and the pseudo labels are constrained to obey a certain distribution; (c) the pseudo labels and true negative labels in the negative bag are used to further train the instance-level classifier, and (d) then new pseudo labels are generated with the new classifier; (e) the pseudo labels gradually approach real labels and the instance-level classifier gradually approaches the optimal classifier.
  • Figure 4: Examples of synthetic bags, where positive instances are marked by yellow boxes: (a) Positive bag in the Synthetic MNIST Normal Bag with a positive instance ratio of 10%; (b) Negative bag in the Synthetic MNIST Normal Bag; (c) Positive bag in the training set of Synthetic MNIST Hard Bag, where both numbers “0” and “8” are positive; (d) Positive bag in test set test-pos8 of Synthetic MNIST Hard Bag, where only number “8” appears; (e) Positive bag in test set test-pos0 of Synthetic MNIST Hard Bag, where only number “0” appears; (f) Negative bag in Synthetic MNIST Hard Bag.
  • Figure 5: Visualization of two examples in the CAMELYON16 dataset. The first and the third images are original positive slides with different sizes of positive areas, which are outlined by blue lines. The second and the fourth images are the patches cropped from them, where the background patches have been dropped and positive patches are marked by blue boxes.
  • ...and 5 more figures
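
The alternation described in the captions of Figures 2 and 3 can be sketched on toy 1-D features as follows. This is an illustrative sketch, not the paper's method: the pseudo-label step uses a plain top-$\mu$ threshold as a stand-in for the constrained optimization, the local constraint is assumed to mean "the top-scoring instance of every positive bag is labeled positive" (the captions do not spell it out), and the classifier is a 1-D logistic regression with a fixed starting point rather than a randomly initialized network. All function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def assign_pseudo_labels(scores_per_bag, mu):
    """Step 1 (stand-in): hard pseudo labels for instances in positive bags.

    Global constraint (assumed): roughly a fraction mu of all instances is positive.
    Local constraint (assumed): the top-scoring instance of each bag is positive.
    """
    flat = sorted((s for bag in scores_per_bag for s in bag), reverse=True)
    k = max(1, round(mu * len(flat)))
    threshold = flat[k - 1]
    labels = []
    for bag in scores_per_bag:
        top = max(range(len(bag)), key=lambda i: bag[i])
        labels.append([1 if (s >= threshold or i == top) else 0
                       for i, s in enumerate(bag)])
    return labels


def train_classifier(xs, ys, w, b, lr=0.5, epochs=200):
    """Step 2: fit a 1-D logistic regression by batch gradient descent."""
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def mil_self_training(neg_bags, pos_bags, mu, rounds=5):
    """Alternate pseudo-label assignment and classifier training."""
    w, b = 1.0, 0.0  # fixed start so the sketch is reproducible
    pseudo = None
    neg_xs = [x for bag in neg_bags for x in bag]
    pos_xs = [x for bag in pos_bags for x in bag]
    for _ in range(rounds):
        scores = [[sigmoid(w * x + b) for x in bag] for bag in pos_bags]
        pseudo = assign_pseudo_labels(scores, mu)
        # Negative bags contribute true labels, positive bags pseudo labels.
        ys = [0] * len(neg_xs) + [y for bag in pseudo for y in bag]
        w, b = train_classifier(neg_xs + pos_xs, ys, w, b)
    return w, b, pseudo
```

Calling `mil_self_training(neg_bags, pos_bags, mu)` with lists of bags, each bag being a list of scalar features, returns the fitted weight, bias, and the final pseudo labels for the positive bags.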