Human Oversight-by-Design for Accessible Generative IUIs
Blessing Jerry, Lourdes Moreno, Paloma Martínez
TL;DR
The paper addresses the challenge of reliable human oversight in generative IUIs used in high-stakes settings by proposing an oversight-by-design architecture that embeds Human-in-the-Loop (HITL) and Human-on-the-Loop (HOTL) oversight across the generation, evaluation, and governance stages. It details a model-based, end-to-end pipeline with SysML v2 traceability, standards-aligned templates, and structured risk signals that trigger mandatory human review when thresholds are violated or uncertainty is high. A healthcare communication use case grounds the approach, showing how escalation-driven governance, auditable logs, and systematic feedback can enable scalable, verifiable oversight without sacrificing efficiency. The work treats accessibility and plain-language presentation both as a quality and compliance criterion and as a prerequisite for effective oversight, with the ultimate aim of safer, more trustworthy AI-assisted decision processes in high-stakes contexts.
Abstract
LLM-generated interfaces are increasingly used in high-consequence workflows (e.g., healthcare communication), where how information is presented can impact downstream actions. These interfaces and their content support human interaction with AI-assisted decision-making and communication processes and should remain accessible and usable for people with disabilities. Accessible plain-language interfaces serve as enabling infrastructure for meaningful human oversight. In these contexts, ethical and trustworthiness risks, including hallucinations, semantic distortion, bias, and accessibility barriers, can undermine reliability and limit users' ability to understand, monitor, and intervene in AI-supported processes. Yet, in practice, oversight is often treated as a downstream check, without clear rules for when human intervention is required or who is accountable. We propose oversight-by-design: embedding human judgment across the pipeline as an architectural commitment, implemented via escalation policies and explicit UI controls for risk signalling and intervention. Automated checks flag risk in generated UI communication that supports high-stakes workflows (e.g., readability, semantic fidelity, factual consistency, and standards-based accessibility constraints) and escalate to mandatory Human-in-the-Loop (HITL) review before release when thresholds are violated or uncertainty is high. Human-on-the-Loop (HOTL) supervision monitors system-level signals over time (alerts, escalation rates, and compliance evidence) to tune policies and detect drift. Structured review feedback is translated into governance actions (rule and prompt updates, threshold calibration, and traceable audit logs), enabling scalable intervention and verifiable oversight for generative UI systems that support high-stakes workflows.
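
To make the escalation rule concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each automated check yields a normalized score in [0, 1]; the names `RiskSignals`, `EscalationPolicy`, `review_reasons`, `requires_hitl`, and `escalation_rate`, as well as every threshold value, are hypothetical placeholders introduced here for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RiskSignals:
    """Hypothetical outputs of the automated checks for one generated UI message."""
    readability: float              # assumed scale: 0..1, higher is better
    semantic_fidelity: float        # assumed scale: 0..1, higher is better
    factual_consistency: float      # assumed scale: 0..1, higher is better
    accessibility_violations: int   # count of failed accessibility constraints
    uncertainty: float              # assumed scale: 0..1, higher is less certain


@dataclass(frozen=True)
class EscalationPolicy:
    """Placeholder thresholds; real values would come from calibration."""
    min_readability: float = 0.7
    min_semantic_fidelity: float = 0.8
    min_factual_consistency: float = 0.9
    max_accessibility_violations: int = 0
    max_uncertainty: float = 0.3


def review_reasons(s: RiskSignals, p: EscalationPolicy) -> list[str]:
    """Return the names of all checks whose threshold is violated."""
    reasons: list[str] = []
    if s.readability < p.min_readability:
        reasons.append("readability")
    if s.semantic_fidelity < p.min_semantic_fidelity:
        reasons.append("semantic_fidelity")
    if s.factual_consistency < p.min_factual_consistency:
        reasons.append("factual_consistency")
    if s.accessibility_violations > p.max_accessibility_violations:
        reasons.append("accessibility")
    if s.uncertainty > p.max_uncertainty:
        reasons.append("uncertainty")
    return reasons


def requires_hitl(s: RiskSignals, p: EscalationPolicy) -> bool:
    """HITL gate: any violated threshold or high uncertainty blocks release."""
    return bool(review_reasons(s, p))


def escalation_rate(decisions: list[bool]) -> float:
    """HOTL signal: fraction of outputs escalated over a monitoring window."""
    return sum(decisions) / len(decisions) if decisions else 0.0
```

In this sketch the list returned by `review_reasons` could be written to an audit log alongside the reviewer's decision, and a sustained change in `escalation_rate` is the kind of system-level signal a HOTL supervisor might use to recalibrate thresholds or investigate drift.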
