GenLens: A Systematic Evaluation of Visual GenAI Model Outputs

Tica Lin, Hanspeter Pfister, Jui-Hsien Wang

TL;DR

Visual GenAI outputs need robust early-stage evaluation, yet existing tools focus on dataset quality and post-hoc explainability, leaving practitioners without a scalable way to evaluate the outputs themselves. The authors conducted a formative study in an industrial setting to identify workflow gaps and then built GenLens, a visual analytics interface that supports discovering patterns, annotating issues, and analyzing aggregated results to guide model training. In a user study, four experienced developers rated GenLens highly for usefulness and ease of use and expressed a strong intention to adopt it, underscoring the value of collaborative, human-centered evaluation in GenAI development. The work highlights the importance of structured, early-stage evaluation tools for fair and high-quality GenAI outputs and points to future directions in automatic failure categorization and broader applicability across visual GenAI tasks.

Abstract

The rapid development of generative AI (GenAI) models in computer vision necessitates effective evaluation methods to ensure their quality and fairness. Existing tools primarily focus on dataset quality assurance and model explainability, leaving a significant gap in GenAI output evaluation during model development. Current practices often depend on developers' subjective visual assessments, which may lack scalability and generalizability. This paper bridges this gap by conducting a formative study with GenAI model developers in an industrial setting. Our findings led to the development of GenLens, a visual analytic interface designed for the systematic evaluation of GenAI model outputs during the early stages of model development. GenLens offers a quantifiable approach for overviewing and annotating failure cases, customizing issue tags and classifications, and aggregating annotations from multiple users to enhance collaboration. A user study with model developers reveals that GenLens effectively enhances their workflow, evidenced by high satisfaction rates and a strong intent to integrate it into their practices. This research underscores the importance of robust early-stage evaluation tools in GenAI development, contributing to the advancement of fair and high-quality GenAI models.
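
To make the idea of aggregating annotations from multiple users concrete, the following is a minimal Python sketch, not GenLens's actual implementation. The `Annotation` record, its field names, the example tags, and the `aggregate_by_model` helper are all hypothetical; the abstract does not describe the tool's data model. The sketch assumes each annotation links an annotator, a model, a generated output, and one issue tag, and it counts how many distinct outputs per model were flagged with each tag, so that two annotators flagging the same output count once.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical annotation record; field names are assumptions."""
    annotator: str   # who flagged the output
    model: str       # model (or checkpoint) that produced the output
    output_id: str   # identifier of the generated image
    tag: str         # user-defined issue tag, e.g. "artifact"


def aggregate_by_model(annotations):
    """Count, per model and tag, how many distinct outputs were flagged."""
    flagged = defaultdict(set)
    for a in annotations:
        # A set de-duplicates outputs flagged by several annotators.
        flagged[(a.model, a.tag)].add(a.output_id)

    summary = defaultdict(dict)
    for (model, tag), outputs in flagged.items():
        summary[model][tag] = len(outputs)
    return dict(summary)


# Illustrative data only; not taken from the paper.
annotations = [
    Annotation("alice", "model_a", "img_001", "artifact"),
    Annotation("bob", "model_a", "img_001", "artifact"),
    Annotation("bob", "model_a", "img_002", "artifact"),
    Annotation("alice", "model_b", "img_001", "wrong_color"),
]
summary = aggregate_by_model(annotations)
```

Counting distinct outputs rather than raw annotations is one possible design choice: it keeps a single heavily annotated output from inflating a model's issue count. A real tool might also track annotator agreement.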

Paper Structure (16 sections, 3 figures)

Figures (3)

  • Figure 1: Annotation Mode interface. Users can (1) specify their own issue tags and (2) annotate each output with icons and comments.
  • Figure 2: Analyze Page interface. Users can (1) analyze model issues with summary bar charts and (2) derive insights into model performance from the aggregated annotations in the model summary cards. (3) A summary report can be exported as a PDF.
  • Figure 3: User TAM rating results. Participants rated GenLens as helpful (Q1-Q3) and easy to use (Q4-Q6) for supporting their model evaluation. They also expressed high intent to use GenLens in their workflow (Q7).