Trust Calibration in IDEs: Paving the Way for Widespread Adoption of AI Refactoring

Markus Borg

TL;DR

The paper addresses the risk-prone landscape of AI-assisted refactoring by advocating IDE-embedded interaction with LLMs and rigorous safeguards to ensure trustworthy contributions to codebases. It grounds trust development in established human-factors theories and outlines an action-research program within CodeScene to iteratively design, test, and refine both safety mechanisms and UI cues that calibrate developer trust. By combining large-scale telemetry, A/B testing, and targeted program/vulnerability analyses, the work aims to enable widespread, responsible adoption of AI-driven refactoring. The practical impact lies in delivering a structured pathway for integrating powerful AI tools into developers’ existing workflows without compromising code integrity or user trust. This multi-faceted approach seeks to align perceived trust with actual AI capabilities, reducing both under- and over-reliance on automated refactoring.

Abstract

In the software industry, the drive to add new features often overshadows the need to improve existing code. Large Language Models (LLMs) offer a new approach to improving codebases at an unprecedented scale through AI-assisted refactoring. However, LLMs come with inherent risks such as breaking changes and the introduction of security vulnerabilities. We advocate for encapsulating the interaction with the models in IDEs and validating refactoring attempts using trustworthy safeguards. Equally important for the uptake of AI refactoring, however, is research on trust development. In this position paper, we ground our future work in established models from research on human factors in automation. We outline action research within CodeScene on the development of 1) novel LLM safeguards and 2) user interaction that conveys an appropriate level of trust. The industry collaboration enables large-scale repository analysis and A/B testing to continuously guide the design of our research interventions.

Paper Structure

This paper contains 5 sections and 3 figures.

Figures (3)

  • Figure 1: Balancing trust and trustworthiness for AI refactoring.
  • Figure 2: Trust development for a skeptical user of AI refactoring.
  • Figure 3: Factors that affect trust in AI refactoring.