Bi-Fact: A Bidirectional Factorization-based Evaluation of Intent Extraction from UI Trajectories
Sapir Caduri, Anatoly Efros, Noam Kahlon, Danielle Cohen, Yoni Halpern, Ido Dagan
TL;DR
Bi-Fact introduces a fact-level evaluation for UI-driven intent extraction by decomposing intents into atomic facts and evaluating bidirectional support between gold and predicted intents. It employs an LLM-driven automatic evaluation workflow with three stages of reasoning for predicting, recalling, and assessing factual coverage. On two human-judged datasets, Bi-Fact achieves higher agreement with human judgments than lexical, ROUGE, and NLI baselines, e.g., $F_1=0.722$, $\kappa=0.508$, and $r=0.781$ ($p<0.001$) for fact-level correlation. The approach offers a robust, granular metric that can improve downstream personalization and proactive UI assistance by better capturing fine-grained intent details.
Abstract
Evaluating intent extraction from GUIs demands accurate, fine-grained metrics. This paper introduces Bi-Fact, a novel method that decomposes intents into atomic facts and performs bidirectional comparisons to assess precision and recall. Experiments demonstrate Bi-Fact's superior correlation with human judgments compared to existing metrics, establishing a more robust evaluation framework for UI-driven intent understanding.
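
The bidirectional comparison described above can be illustrated with a minimal sketch. It assumes the intents have already been decomposed into atomic facts, and it reads precision as the fraction of predicted facts supported by the gold intent and recall as the fraction of gold facts supported by the predicted intent. The names `BiFactScore`, `supported_fraction`, `bidirectional_score`, and `exact_match_judge` are hypothetical and do not come from the paper. In particular, `exact_match_judge` is only a toy stand-in for the LLM-based support judgment, and the example facts are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# A judge decides whether one fact is supported by a set of reference facts.
# In the paper this decision is made by an LLM; here it is a pluggable callable.
Judge = Callable[[str, Sequence[str]], bool]


@dataclass(frozen=True)
class BiFactScore:
    precision: float
    recall: float
    f1: float


def supported_fraction(facts: Sequence[str], reference: Sequence[str], judge: Judge) -> float:
    """Fraction of `facts` that the judge considers supported by `reference`."""
    if not facts:
        return 0.0  # assumption: an empty fact list scores zero
    hits = sum(1 for fact in facts if judge(fact, reference))
    return hits / len(facts)


def bidirectional_score(gold: Sequence[str], predicted: Sequence[str], judge: Judge) -> BiFactScore:
    # Predicted -> gold direction: how much of the prediction is correct.
    precision = supported_fraction(predicted, gold, judge)
    # Gold -> predicted direction: how much of the gold intent is covered.
    recall = supported_fraction(gold, predicted, judge)
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return BiFactScore(precision, recall, f1)


def exact_match_judge(fact: str, reference: Sequence[str]) -> bool:
    """Toy judge (assumption): case-insensitive exact match, not an LLM."""
    normalized = {r.strip().lower() for r in reference}
    return fact.strip().lower() in normalized


# Invented example facts, for illustration only.
gold_facts = ["book a flight", "destination is Paris", "departure is Friday"]
predicted_facts = ["book a flight", "destination is Paris",
                   "seat is window", "airline is Delta"]

score = bidirectional_score(gold_facts, predicted_facts, exact_match_judge)
print(score)
```

With these toy inputs, two of the four predicted facts are supported (precision 0.5) and two of the three gold facts are covered (recall 2/3). Keeping the two directions separate shows whether a prediction errs by adding unsupported detail or by omitting gold detail, which a single overlap score would hide.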
