Towards Integrating Personal Knowledge into Test-Time Predictions

Isaac Lage; Sonali Parbhoo; Finale Doshi-Velez

TL;DR

This work addresses the mismatch between ML predictions and the personal knowledge individuals possess about themselves. It introduces human feature integration, formalizing the problem as selecting, for each instance and within a budget $B$, a small set of human features to elicit at test time, and then improving predictions via a marginalized predictor $f^{marg}$. A concrete approach assumes access to $X^h$ during training and uses an entropy-based greedy query strategy to choose features per instance, with an independence-based model for $p(X^h_d|X^m)$. Preliminary experiments on recipe and birds datasets show that a few well-chosen personal features can meaningfully boost performance beyond machine-only baselines, and that instance-level querying outperforms global feature selection in several cases. The results illustrate the feasibility and value of letting users contribute personal knowledge to ML predictions without requiring domain expertise about the prediction task, potentially enhancing personalization and trust in ML systems.
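
The query loop summarized above can be illustrated with a small sketch. It is not the authors' implementation: it assumes binary human features, a hypothetical classifier `f(x_m, x_h)` that returns class probabilities, an independent Bernoulli probability `p_h[d]` standing in for $p(X^h_d|X^m)$, and one plausible reading of "entropy-based" (greedily pick the feature whose answer is expected to leave the lowest predictive entropy). The names `f_marg`, `greedy_queries`, `toy_f`, and `answer` are illustrative only.

```python
import itertools
import numpy as np


def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())


def f_marg(f, x_m, known, p_h):
    """Marginalized prediction: average f over all unqueried human features.

    Exact enumeration is exponential in the number of unqueried features,
    so this is only practical for the tiny example below.
    """
    unknown = [d for d in range(len(p_h)) if d not in known]
    total = 0.0
    for values in itertools.product([0.0, 1.0], repeat=len(unknown)):
        x_h = np.zeros(len(p_h))
        for d, v in known.items():
            x_h[d] = v
        weight = 1.0
        for d, v in zip(unknown, values):
            x_h[d] = v
            # Independence assumption: the weight factorizes over features.
            weight *= p_h[d] if v == 1.0 else 1.0 - p_h[d]
        total = total + weight * np.asarray(f(x_m, x_h))
    return total


def greedy_queries(f, x_m, p_h, answer, budget):
    """Ask up to `budget` questions, each minimizing expected entropy."""
    known = {}
    for _ in range(budget):
        best_d, best_h = None, np.inf
        for d in range(len(p_h)):
            if d in known:
                continue
            # Expected entropy of f_marg after hearing the answer to d.
            exp_h = sum(
                w * entropy(f_marg(f, x_m, {**known, d: v}, p_h))
                for v, w in ((1.0, p_h[d]), (0.0, 1.0 - p_h[d]))
            )
            if exp_h < best_h:
                best_d, best_h = d, exp_h
        if best_d is None:  # every feature has already been queried
            break
        known[best_d] = answer(best_d)  # the user supplies the value
    return known, f_marg(f, x_m, known, p_h)


# Hypothetical toy setup: only human feature 0 affects the prediction.
def toy_f(x_m, x_h):
    return np.array([0.9, 0.1]) if x_h[0] == 1.0 else np.array([0.1, 0.9])


p_h = np.array([0.5, 0.5, 0.5])   # assumed p(X^h_d = 1 | x^m)
true_x_h = [1.0, 0.0, 1.0]        # what the user would answer
known, pred = greedy_queries(toy_f, np.array([1.0]), p_h,
                             answer=lambda d: true_x_h[d], budget=1)
```

With a budget of one question, the sketch asks about feature 0, the only feature that changes the toy prediction, and then predicts from the marginalized model with the remaining features averaged out.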

Abstract

Machine learning (ML) models can make decisions based on large amounts of data, but they can be missing personal knowledge available to the human users about whom predictions are made. For example, a model trained to predict psychiatric outcomes may know nothing about a patient's social support system, and social support may look different for different patients. In this work, we introduce the problem of human feature integration, which provides a way to incorporate important personal knowledge from users without domain expertise into ML predictions. We characterize this problem through illustrative user stories and comparisons to existing approaches; we formally describe the problem in a way that lays the groundwork for future technical solutions; and we provide a proof-of-concept study of a simple solution to this problem in a semi-realistic setting.

Paper Structure (53 sections, 9 equations, 5 figures, 1 algorithm)

Introduction
User Stories about Personal Knowledge
Scenario 1: Medical prediction for a transgender patient
Scenario 2: CPS flag for a child with aggressive outbursts
Alternative Approaches to Personal Knowledge (Require Domain Knowledge)
Alternative Approaches: Related Work
Post-Hoc Prediction Combinations
Machine Learning Explanations
Our Problem Formulation Leads to Different Solutions: A Pedagogical Example
Comparison to Post-Hoc Prediction Combinations
Comparison to ML Explanations
Our Problem Formulation: Human Feature Integration
Problem Definition-Eliciting Human Features at Test Time
Problem Requirements
Technical Assumptions
...and 38 more sections

Figures (5)

Figure 1: A toy dataset with a human feature (y-axis) and a machine feature (x-axis) that are both needed to produce the true decision boundary (solid line). The orange dashed line ($x=0$) and the blue dotted line ($y=0$) are the best-fit lines based on the human and machine feature respectively. Both misclassify the 2 red-circled points, but a model that is able to use both features can classify perfectly along $y=x$.
Figure 2: Test F1 score as a function of $B$ for both instance-wise feature selection methods, the baselines, and the upper bound on the recipe dataset (left) and the birds dataset (right). Error bars are standard errors over 10 random restarts. all-features performs best and machine-only worst, with the methods that use a subset of human features in between. entropy-retrain outperforms feature-selection in both domains.
Figure 3: Test F1 score on birds as a function of $B$ after adding the first 6 feature-selection queries to the machine features and re-running. Error bars are standard errors over 5 random restarts. By query 6, both entropy methods substantially outperform feature-selection.
Figure 4: Test F1 score as a function of $B$ for entropy-selection and for the plausible-classes and surprising-features baselines on the recipe (left) and birds (right) datasets. Error bars are standard errors over 10 random restarts. The entropy-selection approach substantially outperforms the two baselines for choosing $q$, demonstrating that the additional complexity provides additional predictive value.
Figure 5: Probability of querying a human feature (y-axis, top 20 sorted by number of times queried) given that a machine feature (x-axis) is present in the instance, for the recipe dataset. Computed on the test set for a randomly chosen restart. The "cucumber"--"rice vinegar" and "turmeric"--"garam masala" associations are sensible.
