A Vector Symbolic Approach to Multiple Instance Learning

Ehsan Ahmed Dhrubo, Mohammad Mahmudul Alam, Edward Raff, Tim Oates, James Holt

TL;DR

This work addresses the MIL requirement that a bag is labeled positive if and only if at least one instance is positive by building the constraint into the model design. It introduces a Vector Symbolic Architecture (VSA) framework for MIL in which instances and concepts are high-dimensional vectors and a learned encoder maps inputs into VSA space. A VSA-MIL classifier uses a bag representation $\mathbf{s}=\sum_i \mathbf{v}_i$ and concept vectors $\mathbf{c}_k$, with $h(\mathbf{s})=\min_k \max_i \mathbf{c}_k^T \mathbf{v}_i$, to determine the bag label, which keeps predictions consistent with the MIL assumption. The methodology combines an autoencoder that produces VSA-friendly representations, k-means discretization that creates a finite VSA codebook, and a max-min dot-product mechanism over concepts, trained with hyperparameter tuning. Experimental results on traditional MIL benchmarks and multiple medical-imaging MIL datasets show that VSA-MIL outperforms valid MIL baselines (e.g., CausalMIL, miSVM) and even surpasses some invalid MIL approaches on accuracy and AUROC, while providing interpretable concept exemplars. The approach offers a principled, interpretable, and effective alternative to heuristic MIL methods, with potential for broader applicability and further performance gains through backbone and overlap optimizations.
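The scoring rule in the TL;DR can be illustrated with a small sketch. This is not the authors' implementation: it evaluates $\min_k \max_i \mathbf{c}_k^T \mathbf{v}_i$ exactly as the formula is written, using random unit vectors as a stand-in codebook (an assumption, relying on the fact that random high-dimensional vectors are nearly orthogonal). The names `bag_score`, `bundle`, and `random_unit_vector` are hypothetical, and plain Python is used only to keep the example dependency-free.

```python
import math
import random


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def random_unit_vector(dim, rng):
    """Random Gaussian vector scaled to unit length (stand-in for a VSA symbol)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def bundle(instances):
    """Bag representation s = sum_i v_i (elementwise sum)."""
    return [sum(column) for column in zip(*instances)]


def bag_score(instances, concepts):
    """h = min over concepts k of max over instances i of c_k . v_i."""
    return min(max(dot(c, v) for v in instances) for c in concepts)


rng = random.Random(0)
DIM = 2048  # assumed dimensionality, chosen only for illustration
codebook = [random_unit_vector(DIM, rng) for _ in range(6)]
concepts = codebook[:2]  # hypothetical concept vectors c_0, c_1

# One bag contains an exact match for every concept; the other lacks c_1.
bag_with_both = [codebook[0], codebook[1], codebook[4]]
bag_missing_one = [codebook[0], codebook[3], codebook[4]]

score_with_both = bag_score(bag_with_both, concepts)
score_missing_one = bag_score(bag_missing_one, concepts)

# Because the symbols are nearly orthogonal, c_k . s approximately counts
# how many instances in the bundled bag match concept k.
s = bundle(bag_with_both)
concept_readout = [dot(c, s) for c in concepts]

print(score_with_both, score_missing_one, concept_readout)
```

Under these assumptions the first bag scores close to 1 and the second close to 0, because the minimum over concepts is dominated by the concept with no matching instance. How the score is thresholded into a bag label is not specified in this summary and is left out of the sketch.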

Abstract

Multiple Instance Learning (MIL) tasks impose a strict logical constraint: a bag is labeled positive if and only if at least one instance within it is positive. While this iff constraint aligns with many real-world applications, recent work has shown that most deep learning-based MIL approaches violate it, leading to inflated performance metrics and poor generalization. We propose a novel MIL framework based on Vector Symbolic Architectures (VSAs), which provide a differentiable mechanism for performing symbolic operations in high-dimensional space. Our method encodes the MIL assumption directly into the model's structure by representing instances and concepts as nearly orthogonal high-dimensional vectors and using algebraic operations to enforce the iff constraint during classification. To bridge the gap between raw data and VSA representations, we design a learned encoder that transforms input instances into VSA-compatible vectors while preserving key distributional properties. Our approach, which includes a VSA-driven MaxNetwork classifier, achieves state-of-the-art results for a valid MIL model on standard MIL benchmarks and medical imaging datasets, outperforming existing methods while maintaining strict adherence to the MIL formulation. This work offers a principled, interpretable, and effective alternative to existing MIL approaches that rely on learned heuristics.

Paper Structure

This paper contains 15 sections, 6 equations, 14 figures, 8 tables, 1 algorithm.

Figures (14)

  • Figure 1: The overall VSA-MIL pipeline. The dataset $\mathcal{X}$ is not in VSA form, so it first passes through an autoencoder that encourages a representation satisfying the necessary conditions of the VSA in a semantically meaningful way. Clustering discretizes the continuous space via a codebook so that exact matches can occur. A VSA-informed network is then trained on the VSA-compatible representation $\mathbf{v}_i$.
  • Figure 2: UMAP projection of VSA embeddings into 2D space from the training dataset of MUSK1.
  • Figure 3: Histogram density distribution of VSA from the training dataset of MUSK1.
  • Figure 4: Extracting image patches for representing concept vectors with 25 images from the training set of the BTMD dataset.
  • Figure 5: Extracting image patches for representing concept vectors with 25 images from the training set of the RSNA-SMBC dataset.
  • ...and 9 more figures
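
The discretization step described in the Figure 1 caption can be sketched as nearest-centroid quantization. This is a minimal illustration under stated assumptions, not the paper's code: the centroids are hypothetical 2-D values standing in for the output of k-means on encoder embeddings, the codebook uses random bipolar vectors (an assumed choice of VSA symbols), and `nearest_centroid` and `discretize_bag` are names invented for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(embedding, centroids):
    """Index of the centroid closest to the embedding."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(embedding, centroids[k]))


def discretize_bag(embeddings, centroids, codebook):
    """Replace each continuous embedding with its cluster's codebook vector."""
    return [codebook[nearest_centroid(e, centroids)] for e in embeddings]


rng = random.Random(1)
DIM = 1024  # assumed codebook dimensionality, for illustration only

# Hypothetical centroids, as if already fitted by k-means.
centroids = [[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]]
# One random bipolar (+1/-1) symbol per cluster.
codebook = [[rng.choice((-1.0, 1.0)) for _ in range(DIM)] for _ in centroids]

# Two embeddings near the second centroid and one near the first.
embeddings = [[4.8, 5.1], [5.2, 4.9], [0.1, -0.2]]
codes = discretize_bag(embeddings, centroids, codebook)

same_cluster_match = codes[0] == codes[1]
cross_similarity = sum(a * b for a, b in zip(codes[0], codes[2])) / DIM

print(same_cluster_match, cross_similarity)
```

Embeddings that fall in the same cluster receive an identical symbol, which is what allows exact matches after discretization, while symbols from different clusters have a normalized dot product near zero.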