VirtualXAI: A User-Centric Framework for Explainability Assessment Leveraging GPT-Generated Personas
Georgios Makridis, Vasileios Koukos, Georgios Fatouros, Dimosthenis Kyriazis
TL;DR
The paper tackles the challenge of evaluating explainability in AI by combining quantitative benchmarks with qualitative, user-centered insights. It introduces an XAI scoring framework that fuses fidelity, simplicity, stability, and accuracy metrics with LLM-generated virtual personas and a content-based recommender to tailor method and dataset choices for tabular data. Its three core contributions are a tabular-data XAI scoring framework, an LLM-based qualitative assessment procedure, and a content-based recommender that maps dataset characteristics to historical benchmarks for domain-specific recommendations. The approach demonstrates that domain context matters for XAI method selection and emphasizes user-perceived interpretability alongside technical metrics, enabling more effective human-AI collaboration and practical XAI deployment.
Abstract
In today's data-driven era, computational systems generate vast amounts of data that drive the digital transformation of industries, with Artificial Intelligence (AI) playing a key role. Demand for eXplainable AI (XAI) has grown accordingly, as a way to enhance the interpretability, transparency, and trustworthiness of AI models. However, evaluating XAI methods remains challenging: existing evaluation frameworks typically focus on quantitative properties such as fidelity, consistency, and stability, without taking into account qualitative characteristics such as satisfaction and interpretability. In addition, practitioners lack guidance in selecting appropriate datasets, AI models, and XAI methods, which is a major hurdle in human-AI collaboration. To address these gaps, we propose a framework that integrates quantitative benchmarking with qualitative user assessments through virtual personas based on the "Anthology" of Large Language Model (LLM) backstories. Our framework also incorporates a content-based recommender system that leverages dataset-specific characteristics to match new input data with a repository of benchmarked datasets. This yields an estimated XAI score and provides tailored recommendations for both the optimal AI model and the XAI method for a given scenario.
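As a rough illustration of the content-based matching idea described above, the following minimal sketch compares a new dataset's characteristics against a small repository of benchmarked datasets and derives an estimated score and a recommendation from the nearest matches. Everything concrete in it is an assumption made for illustration: the meta-features (log row count, feature count, share of categorical columns), the repository entries, the scores, the use of cosine similarity, and the names `BENCHMARKS` and `recommend` are hypothetical and are not taken from the paper.

```python
import numpy as np

# Hypothetical benchmark repository. All names and numbers are illustrative
# placeholders, not results reported by the authors.
# features = [log10(rows), number of features, share of categorical columns]
BENCHMARKS = [
    {"name": "credit", "features": [5.0, 20.0, 0.6],
     "model": "xgboost", "xai": "shap", "score": 0.8},
    {"name": "sensor", "features": [6.0, 50.0, 0.0],
     "model": "mlp", "xai": "lime", "score": 0.6},
    {"name": "clinic", "features": [3.0, 15.0, 0.5],
     "model": "random_forest", "xai": "shap", "score": 0.7},
]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def recommend(new_features, benchmarks=BENCHMARKS, k=2):
    """Match a new dataset to the k most similar benchmarked datasets."""
    matrix = np.array([b["features"] for b in benchmarks], dtype=float)
    query = np.asarray(new_features, dtype=float)

    # Standardise each meta-feature using the repository's statistics so
    # that no single characteristic dominates the similarity.
    mean = matrix.mean(axis=0)
    std = matrix.std(axis=0)
    std[std == 0] = 1.0
    matrix_z = (matrix - mean) / std
    query_z = (query - mean) / std

    sims = np.array([cosine_similarity(query_z, row) for row in matrix_z])
    top = np.argsort(sims)[::-1][:k]  # indices of the k most similar entries

    # Estimated score: similarity-weighted mean of the neighbours' scores
    # (negative similarities are clipped to zero weight).
    weights = np.clip(sims[top], 0.0, None)
    scores = np.array([benchmarks[i]["score"] for i in top])
    if weights.sum() > 0:
        estimated = float(np.average(scores, weights=weights))
    else:
        estimated = float(scores.mean())

    best = benchmarks[top[0]]
    return {
        "nearest": best["name"],
        "model": best["model"],
        "xai_method": best["xai"],
        "estimated_score": estimated,
    }


if __name__ == "__main__":
    print(recommend([5.0, 20.0, 0.6]))
```

The recommendation here is simply copied from the single most similar benchmark, while the score is averaged over the top `k` matches; a real system could weight or vote differently, and the choice of meta-features would likely matter more than the similarity measure.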
