QoNext: Towards Next-generation QoE for Foundation Models

Yijin Guo, Zicheng Zhang, Ye Shen, Farong Wen, Junying Wang, Qi Jia, Guangtao Zhai

TL;DR

QoNext introduces a QoE-inspired framework to evaluate foundation models by jointly considering content quality and service quality during interactive use. It builds a large, labeled database by systematically varying five factors (content accuracy, information density, output speed, latency position, and latency duration) and collects human ratings across diverse dialogues, yielding insights on how each factor shapes user experience. Regression models trained on this database predict subjective experience with high rank-order consistency (SRCC around 0.78–0.79), demonstrating the feasibility of objective predictions of user-perceived quality. The work also reveals content accuracy as the primary determinant of overall experience and shows MBTI-based personalization can reveal differential sensitivities to latency and density, offering practical guidance for adaptive optimization and productized service design in human-centric AI systems.
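The rank-order consistency cited above is the Spearman rank correlation coefficient (SRCC) between predicted and human scores. As a rough illustration of how such a figure is computed, here is a minimal pure-Python sketch. The function names (`average_ranks`, `pearson`, `srcc`) and the sample scores are hypothetical and are not taken from the paper or its database.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def srcc(predicted, human):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(human))


# Hypothetical scores for illustration only; not data from the paper.
human_mos = [4.5, 3.8, 2.1, 4.9, 3.0, 1.7]
predicted = [4.2, 3.1, 2.5, 4.8, 3.3, 1.9]
print(round(srcc(predicted, human_mos), 3))
```

Because SRCC depends only on the ordering of the scores, a predictor can score well even if its outputs sit on a different scale from the human ratings, as long as it ranks the configurations in a similar order.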

Abstract

Existing evaluations of foundation models, including recent human-centric approaches, fail to capture what truly matters: the user's experience during interaction. Current methods treat evaluation as a matter of output correctness alone, overlooking that user satisfaction emerges from the interplay between response quality and interaction. This limits their ability to account for the mechanisms underlying user experience. To address this gap, we introduce QoNext, the first framework that adapts Quality of Experience (QoE) principles from networking and multimedia to the assessment of foundation models. QoNext identifies experiential factors that shape user experience and incorporates them into controlled experiments, where human ratings are collected under varied configurations. From these studies we construct a QoE-oriented database and train predictive models that estimate perceived user experience from measurable system parameters. Our results demonstrate that QoNext not only enables proactive and fine-grained evaluation but also provides actionable guidance for optimizing foundation models deployed as productized services.

Paper Structure

This paper contains 52 sections, 9 equations, 10 figures, 5 tables.

Figures (10)

  • Figure 1: Motivation of QoNext. As foundation models are increasingly deployed as products, ensuring a consistently high-quality user experience has become critically important across both service and content. Existing evaluation methods leave a gap in assessing these two aspects simultaneously. This limitation motivates our proposal of QoNext to bridge this gap and enable more comprehensive human-centric evaluation.
  • Figure 2: Framework of QoNext. To build a comprehensive database, we conduct a human-annotation experiment in which participants first complete a personal-traits questionnaire and then engage in controlled, task-specific activities. Based on this database, regression models are trained to fit human ratings, and the trained models are then used to predict user scores and guide the optimization of foundation models.
  • Figure 3: The MOS distributions across QoS dimensions in (a) and content quality dimensions in (b). The x-axis uses the parameter IDs of the dimensions (see Table \ref{tab:dimension_design}).
  • Figure 4: PCA explained variance. It presents the proportion of total variance explained by each principal component $PC_k$. On the x-axis, TOTAL denotes the result using all data, and the remaining groups correspond to MBTI-based subsets.
  • Figure 5: PCA Loading Heatmap. It shows the loadings of each variable on different principal components, reflecting the contribution of each factor to the formation of the components. The x-axis uses abbreviated dimension names (see Table \ref{tab:dimension_design}). (i) denotes the result using all data, and the remaining subplots correspond to MBTI-based subsets.
  • ...and 5 more figures