Bi-Fact: A Bidirectional Factorization-based Evaluation of Intent Extraction from UI Trajectories

Sapir Caduri; Anatoly Efros; Noam Kahlon; Danielle Cohen; Yoni Halpern; Ido Dagan

TL;DR

Bi-Fact introduces a fact-level evaluation for intent extraction from UI trajectories: it decomposes intents into atomic facts and evaluates bidirectional support between gold and predicted intents. It employs an LLM-driven automatic evaluation workflow with three stages of reasoning for predicting, recalling, and assessing factual coverage. On two human-judged datasets, Bi-Fact achieves higher agreement with human judgments than lexical, ROUGE, and NLI baselines, e.g., $F_1=0.722$, $\kappa=0.508$, and $r=0.781$ ($p<0.001$) for fact-level correlation. The approach offers a robust, granular metric that can improve downstream personalization and proactive UI assistance by better capturing fine-grained intent details.

Abstract

Evaluating intent extraction from GUIs demands accurate, fine-grained metrics. This paper introduces Bi-Fact, a novel method that decomposes intents into atomic facts and performs bidirectional comparisons to assess precision and recall. Experiments demonstrate Bi-Fact's superior correlation with human judgments compared to existing metrics, establishing a more robust evaluation framework for UI-driven intent understanding.
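
The bidirectional comparison described above can be illustrated with a minimal sketch. It assumes each intent has already been decomposed into a list of atomic facts, and it treats precision as the share of predicted facts supported by the gold intent and recall as the share of gold facts supported by the prediction. The names `bidirectional_scores` and `exact_match_support` are hypothetical, and the exact-match check is only a toy stand-in for the LLM-based support judgment, not the paper's implementation.

```python
from typing import Callable, List, Tuple

# Hypothetical signature for a support judge: does `fact` hold given `reference_facts`?
SupportFn = Callable[[str, List[str]], bool]


def exact_match_support(fact: str, reference_facts: List[str]) -> bool:
    """Toy stand-in for an LLM judge: case-insensitive exact match only."""
    normalized = {ref.strip().lower() for ref in reference_facts}
    return fact.strip().lower() in normalized


def bidirectional_scores(
    gold_facts: List[str],
    pred_facts: List[str],
    is_supported: SupportFn,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over atomic facts."""
    # Direction 1: predicted facts checked against the gold intent (precision).
    supported_pred = sum(is_supported(f, gold_facts) for f in pred_facts)
    # Direction 2: gold facts checked against the predicted intent (recall).
    supported_gold = sum(is_supported(f, pred_facts) for f in gold_facts)

    precision = supported_pred / len(pred_facts) if pred_facts else 0.0
    recall = supported_gold / len(gold_facts) if gold_facts else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative facts, invented for this example only.
gold = ["book a flight", "destination is Paris", "date is May 3"]
pred = ["book a flight", "destination is Paris", "airline is Delta", "seat is window"]

precision, recall, f1 = bidirectional_scores(gold, pred, exact_match_support)
```

In this toy case two of the four predicted facts are supported and two of the three gold facts are recovered, so the prediction is penalized both for the extra details and for the missing date. Passing the judge in as a function keeps the scoring logic independent of how support is actually decided.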
