Engineering Trustworthy Automation: Design Principles and Evaluation for AutoML Tools for Novices

Jarne Thys, Davy Vanacken, Gustavo Rovelo Ruiz

TL;DR

The paper addresses the challenge of making AutoML accessible to novices by proposing an abstract end-to-end pipeline that covers data intake, guided configuration, training, evaluation, and inference. It introduces NovaClass, a novice-friendly prototype for transformer-based text classification that emphasizes one-click training, cascade classification, and metadata-driven inference, paired with a context-aware assistant. A 24-participant study shows that all users can complete end-to-end tasks with positive user experience, though experienced users report higher trust and understanding than novices, highlighting gaps in mental models and transparency. From these findings, the authors derive four design principles to improve novice AutoML tools: ensure first-model success, provide explanations, offer abstractions with context-aware assistance, and enforce predictability and safeguards, guiding future development toward usable, trustworthy end-to-end AutoML for non-experts.

Abstract

AutoML systems targeting novices often prioritize algorithmic automation over usability, leaving gaps in users' understanding, trust, and end-to-end workflow support. To address these issues, we propose an abstract pipeline that covers data intake, guided configuration, training, evaluation, and inference. To examine the abstract pipeline, we report a user study in which we assess trust, understandability, and UX of a prototype implementation. In a 24-participant study, all participants successfully built their own models and UEQ ratings were positive, yet experienced users reported higher trust and understanding than novices. Based on this study, we propose four design principles to improve the design of AutoML systems targeting novices: (P1) support first-model success to enhance user self-efficacy, (P2) provide explanations to help users form correct mental models and develop appropriate levels of reliance, (P3) provide abstractions and context-aware assistance to keep users in their zone of proximal development, and (P4) ensure predictability and safeguards to strengthen users' sense of control.

Paper Structure

This paper contains 24 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: An abstract, end-to-end pipeline for AutoML tools targeting novice users. It begins with Data Intake and Upfront Safeguards, then narrows decisions to Key Model Parameter Configuration before One-Click Training. After training, Simplified Results explain performance in simplified terms, while Save Model and Metadata preserves an inference "contract." Finally, Auto-Configured Inference uses that contract to generate input fields and outputs consistently.
  • Figure 4: Distribution of UEQ responses and comparison to the UEQ benchmark. The colored background bars are reference bands from the UEQ benchmark: Bad (bottom 25%), Below average (25-50th percentile), Above average (50-75th percentile), Good (75-90th percentile), and Excellent (top 10%). For each scale (range -3 to +3), the black diamond and whiskers show our sample mean and 95% CI; their position against the bands indicates the benchmark class of our product. Overall, ratings are positive, highest for Efficiency, Attractiveness, and Perspicuity; positive but more moderate for Dependability and Stimulation; and comparatively lower for Novelty.
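
The Figure 1 caption describes saved metadata acting as an inference "contract" from which input fields are generated. The following is a minimal Python sketch of that idea, not the NovaClass implementation: the `InferenceContract` class, its fields, and the helper functions are hypothetical names introduced here for illustration, and JSON is assumed as the storage format.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class InferenceContract:
    """Hypothetical metadata stored next to a trained model."""
    model_name: str
    input_fields: list[str]  # text columns the model was trained on
    labels: list[str]        # class names the model can output


def save_contract(contract: InferenceContract) -> str:
    """Serialize the contract (assumed format: JSON)."""
    return json.dumps(asdict(contract), indent=2)


def load_contract(payload: str) -> InferenceContract:
    """Rebuild the contract from its serialized form."""
    return InferenceContract(**json.loads(payload))


def build_input_form(contract: InferenceContract) -> dict[str, str]:
    """Derive one empty input field per trained column."""
    return {name: "" for name in contract.input_fields}


def validate_request(contract: InferenceContract,
                     request: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    for name in contract.input_fields:
        if not request.get(name, "").strip():
            problems.append(f"missing field: {name}")
    for name in request:
        if name not in contract.input_fields:
            problems.append(f"unexpected field: {name}")
    return problems
```

Because the form and the validation are both derived from the same stored record, the inference screen cannot drift away from what the model was trained on, which is one way to read the caption's claim that inputs and outputs are generated "consistently".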