iRULER: Intelligible Rubric-Based User-Defined LLM Evaluation for Revision

Jingwen Bai, Wei Soon Cheong, Philippe Muller, Brian Y Lim

TL;DR

In controlled experiments on writing revision and rubric creation, iRULER most improved validated LLM-judged review scores and was perceived as most helpful and aligned compared to read-only rubric and text-based LLM feedback.

Abstract

Large Language Models (LLMs) have become indispensable for evaluating writing. However, the text feedback they provide is often unintelligible, generic, and not specific to user criteria. Inspired by structured rubrics in education and intelligible AI explanations, we propose iRULER, which follows identified design guidelines to scaffold the review process by specific criteria, provide justification for score selection, and offer actionable revisions that target different quality levels. To qualify user-defined criteria, we recursively used iRULER with a rubric-of-rubrics to iteratively refine rubrics. In controlled experiments on writing revision and rubric creation, iRULER most improved validated LLM-judged review scores and was perceived as most helpful and aligned compared to read-only rubric and text-based LLM feedback. Qualitative findings further support how iRULER satisfies the design guidelines for user-defined feedback. This work contributes interactive rubric tools for intelligible LLM-based review and revision of writing, and for user-defined rubric creation.

Paper Structure (91 sections, 3 equations, 19 figures, 6 tables)

This paper contains 91 sections, 3 equations, 19 figures, 6 tables.

Figures (19)

  • Figure 1: The iRULER Writing Revision User Interface. The (A) Writing Panel displays the current draft. The (B) Feedback Panel provides interactive evaluation, allowing users to B1) trigger assessment, B2) see the overall score, B3) interact with the rubric.
  • Figure 2: The interactive revision workflow in iRULER Writing Revision User Interface. Initial rubric scores provide Specific and Scaffolded feedback. Users can then get Justified feedback by clicking for (b) Why and (d) Why Not explanations. Requesting (e) a How To example generates Actionable, tracked revisions (g-i) that further Scaffold the writing process.
  • Figure 3: The iRULER Rubric Creation User Interface, which allows users to design a rubric in the (A) Design Panel and receive immediate meta-level feedback on its quality in the (B) Feedback Panel. This includes B2) a quality score, B3) a detailed breakdown, and B4) improvement suggestions.
  • Figure 4: The AI-assistive features integrated into the iRULER Rubric Creation UI's Rubric Design Panel. The system supports users throughout the design process by: A) Enhancing the initial task description; B) Recommending relevant criteria based on the description; C) Refining the descriptive text for a whole criterion at once; D) Refining individual performance level descriptors for granular control; and E) Generating a full set of descriptors for a user-defined criterion title.
  • Figure 5: The interactive workflow in the iRULER Rubric Creation User Interface. (a) An initial assessment on the rubric-of-rubrics provides Specific and Scaffolded feedback on the user's rubric. Users then get Justified feedback via (b) Why and (d) Why Not explanations. Requesting (e) a How To example generates (f-g) a full suggested rubric for side-by-side comparison, which can be adopted with the (h) Apply This Example button.
  • ...and 14 more figures