Learning, Potential, and Retention: An Approach for Evaluating Adaptive AI-Enabled Medical Devices

Alexis Burgon, Berkman Sahiner, Nicholas A Petrick, Gene Pennello, Ravi K Samala

Abstract

This work addresses challenges in evaluating adaptive artificial intelligence (AI) models for medical devices, where iterative updates to both models and evaluation datasets complicate performance assessment. We introduce a novel approach with three complementary measurements: learning (model improvement on current data), potential (dataset-driven performance shifts), and retention (knowledge preservation across modification steps), to disentangle performance changes caused by model adaptations versus dynamic environments. Case studies using simulated population shifts demonstrate the approach's utility: gradual transitions enable stable learning and retention, while rapid shifts reveal trade-offs between plasticity and stability. These measurements provide practical insights for regulatory science, enabling rigorous assessment of the safety and effectiveness of adaptive AI systems over sequential modifications.

Paper Structure

This paper contains 6 sections, 3 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: A simple adaptation timeline for an adaptive AI system with an initial implementation (timepoint 0) and five modification steps (timepoints 1-5), showing the transition between timepoints 2 (current) and 3 (new).
  • Figure 2: Comparison of learning between Scenario A & Scenario B using a toy example. Despite showing the same performance at each comparable modification step (0 & 1), as indicated by their identical line plots, the two scenarios exhibit different learning. The performance change in Scenario A is due to a shift in dataset difficulty, whereas the performance change in Scenario B is due to an improvement in model knowledge.
  • Figure 3: Comparison of potential between Scenario A & Scenario B using a toy example. Despite showing the same performance at each comparable modification step (0 & 1), as indicated by their identical line plots, the two scenarios exhibit different potential. Scenario A demonstrates greater potential because the modification step 1 dataset presented a greater challenge to the modification step 0 model than was observed in Scenario B.
  • Figure 4: Comparison of retention between Scenario A & Scenario B using a toy example. Despite showing the same performance at each comparable modification step (0 & 1), as indicated by their identical line plots, the two scenarios exhibit different retention. Scenario A shows lower retention because the modification step 1 model demonstrates greater performance degradation on the modification step 0 evaluation dataset.
  • Figure 5: (a) Population distribution of training, validation, and testing data, (b) learning & potential and (c) retention & performance for a model trained and evaluated on a dataset gradually transitioning from one population to another. Vertical markers indicate 95% confidence intervals from models across 25 repetitions.
  • ...and 2 more figures
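The toy examples in Figures 2-4 can be read as a cross-evaluation matrix: each model version is scored on each dataset version, and the three measurements are differences within that matrix. The sketch below is an illustrative interpretation only, not the paper's definitions; the function names, the difference formulas, and the `perf` values are all assumptions chosen to match the informal descriptions (learning as model improvement on current data, potential as the dataset-driven shift seen by the previous model, retention as the new model's preservation of performance on the previous dataset).

```python
# Hedged sketch: perf[i][j] = performance (e.g., AUC) of the model from
# modification step i evaluated on the dataset from modification step j.
# The three difference formulas below are assumptions, not the paper's
# published definitions.

def learning(perf, t):
    """Improvement of the step-t model over the step-(t-1) model,
    both evaluated on the step-t (current) dataset."""
    return perf[t][t] - perf[t - 1][t]

def potential(perf, t):
    """Dataset-driven shift: change in the step-(t-1) model's performance
    when moving from the step-(t-1) dataset to the step-t dataset."""
    return perf[t - 1][t] - perf[t - 1][t - 1]

def retention(perf, t):
    """Knowledge preservation: change in performance on the step-(t-1)
    dataset when moving from the step-(t-1) model to the step-t model."""
    return perf[t][t - 1] - perf[t - 1][t - 1]

# Illustrative numbers (hypothetical): the new dataset is harder for the
# old model, and the new model gains on new data at a small cost on old data.
perf = {
    0: {0: 0.80, 1: 0.70},
    1: {0: 0.78, 1: 0.82},
}

print(round(learning(perf, 1), 2))   # positive: model knowledge improved
print(round(potential(perf, 1), 2))  # negative: new dataset is harder
print(round(retention(perf, 1), 2))  # near zero: little forgetting
```

Under this reading, Figures 2-4's point is that identical diagonal entries (`perf[0][0]` and `perf[1][1]`, the usual performance line plot) can hide very different off-diagonal behavior, which is exactly what the three measurements expose.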