Towards Integrating Personal Knowledge into Test-Time Predictions
Isaac Lage, Sonali Parbhoo, Finale Doshi-Velez
TL;DR
This work addresses the mismatch between ML predictions and the personal knowledge individuals possess about themselves. It introduces human feature integration, formalizing the problem as selecting a small per-instance set of test-time human features, within a budget $B$, to improve predictions via a marginalized predictor $f^{marg}$. A concrete approach assumes access to the human features $X^h$ during training and uses an entropy-based greedy query strategy to choose features per instance, with an independence-based model for $p(X^h_d|X^m)$. Preliminary experiments on recipe and birds datasets show that a few well-chosen personal features can meaningfully boost performance beyond machine-only baselines, and that instance-level querying outperforms global feature selection in several cases. The results illustrate the feasibility and value of enabling users to contribute personal knowledge to ML predictions without requiring domain expertise about the prediction task, potentially enhancing personalization and trust in ML systems.
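The query strategy summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`marginal_predict`, `greedy_query`, `ask_user`), the restriction to binary human features, and the exhaustive enumeration of unobserved features are all assumptions made for illustration. The sketch takes a classifier over the human features (with the machine features $X^m$ held fixed), averages it over unanswered features using independent per-feature probabilities standing in for $p(X^h_d|X^m)$, and greedily asks for the feature whose answer is expected to leave the least predictive entropy, until the budget $B$ is spent.

```python
import itertools
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)


def marginal_predict(predict_proba, observed, priors):
    """Average class probabilities over all unobserved binary human features.

    predict_proba: hypothetical callable mapping a full tuple of binary human
        features to a list of class probabilities (machine features are
        assumed fixed and folded into this callable).
    observed: dict {feature index: 0 or 1} for features already answered.
    priors: priors[d] is the assumed p(X^h_d = 1 | x^m), treated as
        independent across d.
    """
    n_features = len(priors)
    free = [d for d in range(n_features) if d not in observed]
    out = None
    # Exhaustive enumeration: exponential in len(free), fine only for toy sizes.
    for values in itertools.product([0, 1], repeat=len(free)):
        full = dict(observed)
        full.update(zip(free, values))
        weight = 1.0
        for d, v in zip(free, values):
            weight *= priors[d] if v == 1 else 1.0 - priors[d]
        probs = predict_proba(tuple(full[d] for d in range(n_features)))
        if out is None:
            out = [0.0] * len(probs)
        for k, q in enumerate(probs):
            out[k] += weight * q
    return out


def greedy_query(predict_proba, priors, ask_user, budget):
    """Greedily query up to `budget` human features for one instance."""
    observed = {}
    for _ in range(min(budget, len(priors))):
        best_d, best_h = None, float("inf")
        for d in range(len(priors)):
            if d in observed:
                continue
            # Expected entropy of the marginalized prediction after asking d.
            expected_h = 0.0
            for v, p_v in ((1, priors[d]), (0, 1.0 - priors[d])):
                probs = marginal_predict(predict_proba, {**observed, d: v}, priors)
                expected_h += p_v * entropy(probs)
            if expected_h < best_h:
                best_d, best_h = d, expected_h
        # ask_user is a hypothetical callback returning the user's 0/1 answer.
        observed[best_d] = ask_user(best_d)
    return observed, marginal_predict(predict_proba, observed, priors)
```

With a budget of zero the function returns the fully marginalized prediction, which corresponds to a machine-only baseline under the assumed feature probabilities. A practical version would likely replace the exhaustive sum with sampling once there are more than a handful of human features.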
Abstract
Machine learning (ML) models can make decisions based on large amounts of data, but they may lack personal knowledge that is available to the human users about whom predictions are made. For example, a model trained to predict psychiatric outcomes may know nothing about a patient's social support system, and social support may look different for different patients. In this work, we introduce the problem of human feature integration, which provides a way to incorporate important personal knowledge from users without domain expertise into ML predictions. We characterize this problem through illustrative user stories and comparisons to existing approaches; we formally describe this problem in a way that lays the groundwork for future technical solutions; and we provide a proof-of-concept study of a simple version of a solution to this problem in a semi-realistic setting.
