A Vector Symbolic Approach to Multiple Instance Learning
Ehsan Ahmed Dhrubo, Mohammad Mahmudul Alam, Edward Raff, Tim Oates, James Holt
TL;DR
This work addresses the MIL requirement that a bag is labeled positive if and only if at least one of its instances is positive, and it does so by building the constraint into the model design. It introduces a Vector Symbolic Architecture (VSA) framework for MIL in which instances and concepts are high-dimensional vectors and a learned encoder maps inputs into VSA space. The VSA-MIL classifier forms a bag representation $\mathbf{s}=\sum_i \mathbf{v}_i$ from instance vectors $\mathbf{v}_i$ and scores it against concept vectors $\mathbf{c}_k$ with $h(\mathbf{s})=\min_k \max_i \mathbf{c}_k^\top \mathbf{v}_i$, which determines the bag label and keeps predictions MIL-consistent. The method combines an autoencoder that produces VSA-friendly representations, k-means discretization that yields a finite VSA codebook, and the min-over-concepts, max-over-instances dot-product rule above, trained with hyperparameter tuning. On traditional MIL benchmarks and several medical-imaging MIL datasets, VSA-MIL outperforms baselines that respect the MIL assumption (e.g., CausalMIL, miSVM) and even surpasses some approaches that violate it on accuracy and AUROC, while providing interpretable concept exemplars. The approach offers a principled, interpretable, and effective alternative to heuristic MIL methods, with potential for broader applicability and for further gains through backbone and overlap optimizations.
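The scoring rule $h=\min_k \max_i \mathbf{c}_k^\top \mathbf{v}_i$ can be made concrete with a small NumPy sketch. This is only an illustration of the formula as written in the summary, not the authors' implementation: the function names `bag_score` and `predict_bag`, the toy one-hot vectors, and the decision threshold of 0.5 are all assumptions introduced here.

```python
import numpy as np

def bag_score(V, C):
    """Compute min_k max_i c_k^T v_i.

    V: (n, d) array of instance vectors v_i for one bag.
    C: (K, d) array of concept vectors c_k.
    """
    sims = C @ V.T                        # (K, n); sims[k, i] = c_k^T v_i
    best_per_concept = sims.max(axis=1)   # max over instances i, per concept
    return best_per_concept.min()         # min over concepts k

def predict_bag(V, C, threshold=0.5):
    # The threshold is a hypothetical choice for this toy example.
    return bool(bag_score(V, C) > threshold)

# Toy data: two orthogonal concepts in a 4-dimensional space.
C = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

# bag_a contains an instance matching each concept; bag_b misses the second.
bag_a = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
bag_b = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])

score_a = bag_score(bag_a, C)
score_b = bag_score(bag_b, C)
```

Read literally, the formula gives a high score only when every concept vector is matched by at least one instance in the bag, which is why `bag_b` scores lower than `bag_a` here.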
Abstract
Multiple Instance Learning (MIL) tasks impose a strict logical constraint: a bag is labeled positive if and only if at least one instance within it is positive. While this iff constraint aligns with many real-world applications, recent work has shown that most deep learning-based MIL approaches violate it, leading to inflated performance metrics and poor generalization. We propose a novel MIL framework based on Vector Symbolic Architectures (VSAs), which provide a differentiable mechanism for performing symbolic operations in high-dimensional space. Our method encodes the MIL assumption directly into the model's structure by representing instances and concepts as nearly orthogonal high-dimensional vectors and using algebraic operations to enforce the iff constraint during classification. To bridge the gap between raw data and VSA representations, we design a learned encoder that transforms input instances into VSA-compatible vectors while preserving key distributional properties. Our approach, which includes a VSA-driven MaxNetwork classifier, achieves state-of-the-art results for a valid MIL model on standard MIL benchmarks and medical imaging datasets, outperforming existing methods while maintaining strict adherence to the MIL formulation. This work offers a principled, interpretable, and effective alternative to existing MIL approaches that rely on learned heuristics.
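The role of near-orthogonality can be illustrated with a generic VSA-style sketch. Random unit vectors in a high-dimensional space have dot products close to zero, so summing several of them into one vector still lets a dot product reveal which vectors were included. The dimension `d = 4096`, the random vectors, and the helper `random_unit_vectors` are assumptions chosen for illustration; the paper's learned encoder is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4096  # assumed dimensionality, chosen only for this illustration

def random_unit_vectors(n, d, rng):
    """Draw n random Gaussian vectors in R^d and normalize each to unit length."""
    x = rng.standard_normal((n, d))
    return x / np.linalg.norm(x, axis=1, keepdims=True)

vectors = random_unit_vectors(6, d, rng)
members, outsider = vectors[:5], vectors[5]

# Bundle five vectors into a single bag-like representation by summation.
s = members.sum(axis=0)

# Dot products with the bundle: close to 1 for members, close to 0 otherwise.
member_sims = members @ s
outsider_sim = float(outsider @ s)

# Largest absolute dot product between any two distinct vectors.
gram = vectors @ vectors.T
max_cross = np.abs(gram - np.eye(len(vectors))).max()
```

Because the cross terms shrink roughly as $1/\sqrt{d}$, a larger dimension gives a cleaner separation between members and non-members of the sum.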
