Reassessing Java Code Readability Models with a Human-Centered Approach

Agnia Sergeyuk, Olga Lvova, Sergey Titov, Anastasiia Serova, Farid Bagirov, Evgeniia Kirillova, Timofey Bryksin

TL;DR

The paper investigates whether contemporary Java Code Readability metrics align with developers' judgments of AI-generated code. Using a human-centered approach, it first identifies 12 readability aspects via Repertory Grid with 15 developers, then builds a 120-snippet Java dataset evaluated by 390 programmers and four CR models. Results show a strong divergence among models and low human-model agreement (Scalabrino best with MCC ≈ 0.325), highlighting the need for human-centered readability metrics and data to better align LLM-generated code with developer expectations. The work offers open data and methods to refine CR objectives for fine-tuning code-oriented LLMs and emphasizes the subjectivity inherent in readability as a key consideration for future research.

Abstract

Large Language Models (LLMs) need to be adjusted to ensure that they effectively support user productivity. Existing Code Readability (CR) models can guide this alignment. However, there are concerns about their relevance in modern software engineering, since they often miss the developers' notion of readability and rely on outdated code. This research assesses existing Java CR models for LLM adjustments by measuring the correlation between the models' evaluations of AI-generated Java code and those of developers. Using the Repertory Grid Technique with 15 developers, we identified 12 key code aspects influencing CR, which were subsequently assessed by 390 programmers when labeling 120 AI-generated snippets. Our findings indicate that when AI generates concise and executable code, it is often considered readable by both CR models and developers. However, the limited correlation between these evaluations underscores the importance of future research on learning objectives for adjusting LLMs and on the aspects influencing CR evaluations that are included in predictive models.
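
The human-model agreement cited in the TL;DR is reported as a Matthews correlation coefficient (MCC). As a rough illustration of what such a figure measures, the sketch below computes MCC for two binary label vectors. It assumes that both the developers' ratings and a CR model's output have already been reduced to a readable / not readable decision per snippet; that binarization, the class name `MccSketch`, and the method name `mcc` are hypothetical and are not taken from the paper.

```java
public final class MccSketch {

    private MccSketch() {}

    /**
     * Matthews correlation coefficient between two binary labelings.
     * human[i] and model[i] are true when snippet i is judged readable.
     * Returns 0.0 when the coefficient is undefined (a zero denominator),
     * which is a common convention and an assumption of this sketch.
     */
    public static double mcc(boolean[] human, boolean[] model) {
        if (human.length != model.length) {
            throw new IllegalArgumentException("label arrays must have equal length");
        }
        long tp = 0, tn = 0, fp = 0, fn = 0;
        for (int i = 0; i < human.length; i++) {
            if (model[i] && human[i]) {
                tp++;          // both say readable
            } else if (!model[i] && !human[i]) {
                tn++;          // both say not readable
            } else if (model[i]) {
                fp++;          // model says readable, humans disagree
            } else {
                fn++;          // model says not readable, humans disagree
            }
        }
        double denom = Math.sqrt(
                (double) (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn));
        if (denom == 0.0) {
            return 0.0;
        }
        return (tp * tn - fp * fn) / denom;
    }

    public static void main(String[] args) {
        // Made-up labels for four snippets, for illustration only.
        boolean[] human = {true, true, false, false};
        boolean[] model = {true, false, false, false};
        System.out.println("MCC = " + mcc(human, model));
    }
}
```

MCC ranges from -1 (complete disagreement) through 0 (no better than chance) to 1 (complete agreement), so a value near 0.325 indicates only a weak positive association between a model's verdicts and the developers' labels.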

Paper Structure (31 sections, 1 figure, 4 tables)