Optimizing LLM Code Suggestions: Feedback-Driven Timing with Lightweight State Bounds
Mohammad Nour Al Awad, Sergey Ivanov, Olga Tikhonova
TL;DR
The paper addresses when LLM-based assistants should surface code suggestions, with the aim of reducing interruptions and wasted inferences. It introduces a bounded, feedback-driven adaptive delay controller that blends a logistic transform of recent acceptance rates with a lightweight cognitive-state anchor. In a two-month deployment with nine professional developers, the approach more than tripled acceptance relative to a no-delay baseline (4.9% to 18.6%) and cut blind rejections from 8.3% to 0.36%, while reducing backend inferences per accepted completion by about 75%. These results demonstrate that timing, treated as a controllable design parameter, can substantially improve both usability and infrastructure efficiency for AI-powered code assistants, and they point toward future multi-axis adaptivity across IDEs and languages.
Abstract
Large Language Models (LLMs) have transformed code auto-completion by generating context-aware suggestions. Yet deciding when to present these suggestions remains underexplored, and poor timing often leads to interruptions or wasted inference calls. We propose an adaptive timing mechanism that dynamically adjusts the delay before offering a suggestion based on real-time developer feedback. Our method combines a logistic transform of recent acceptance rates with a bounded delay range, anchored by a high-level binary prediction of the developer's cognitive state. In a two-month deployment with professional developers, our system improved suggestion acceptance from 4.9% with no delay to 15.4% with static delays and to 18.6% with adaptive timing, while reducing blind rejections (suggestions rejected without being read) from 8.3% to 0.36%. Together, these improvements cut wasted inference calls by 75%, making LLM-based code assistants more efficient and cost-effective in practice.
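The abstract names three ingredients: a logistic transform of recent acceptance rates, a bounded delay range, and a binary cognitive-state anchor. It does not give the formula that combines them. The following Python sketch shows one way they could fit together and is not the authors' implementation. The class name, the bounds (200 ms to 2000 ms), the window size, the logistic steepness and midpoint, the anchor weight, and the choice to map "in flow" to a longer delay are all hypothetical values chosen for illustration.

```python
import math
from collections import deque


class AdaptiveDelayController:
    """Illustrative bounded, feedback-driven delay controller.

    All default parameters are assumptions, not values from the paper.
    """

    def __init__(self, min_delay_ms=200.0, max_delay_ms=2000.0, window=20,
                 steepness=10.0, midpoint=0.15, anchor_weight=0.3):
        if min_delay_ms > max_delay_ms:
            raise ValueError("min_delay_ms must not exceed max_delay_ms")
        if not 0.0 <= anchor_weight <= 1.0:
            raise ValueError("anchor_weight must be in [0, 1]")
        self.min_delay_ms = min_delay_ms
        self.max_delay_ms = max_delay_ms
        self.steepness = steepness
        self.midpoint = midpoint
        self.anchor_weight = anchor_weight
        # Sliding window of recent outcomes: True = suggestion accepted.
        self.outcomes = deque(maxlen=window)

    def record(self, accepted):
        """Store the outcome of the most recent suggestion."""
        self.outcomes.append(bool(accepted))

    def acceptance_rate(self):
        """Recent acceptance rate; falls back to the midpoint with no data."""
        if not self.outcomes:
            return self.midpoint
        return sum(self.outcomes) / len(self.outcomes)

    def delay_ms(self, in_flow):
        """Return the delay before showing the next suggestion.

        in_flow is the binary cognitive-state prediction (assumed here to
        mean "the developer is typing fluently and should not be disturbed").
        """
        rate = self.acceptance_rate()
        # Logistic transform: maps the acceptance rate to a score in (0, 1).
        score = 1.0 / (1.0 + math.exp(-self.steepness * (rate - self.midpoint)))
        span = self.max_delay_ms - self.min_delay_ms
        # Assumption: higher recent acceptance means a shorter delay.
        feedback_delay = self.max_delay_ms - score * span
        # Assumption: the state anchor pulls toward one end of the range.
        anchor = self.max_delay_ms if in_flow else self.min_delay_ms
        w = self.anchor_weight
        blended = (1.0 - w) * feedback_delay + w * anchor
        # Clamp so the result always stays inside the bounded range.
        return min(max(blended, self.min_delay_ms), self.max_delay_ms)


controller = AdaptiveDelayController()
for accepted in [False, False, True, False]:
    controller.record(accepted)
print(controller.delay_ms(in_flow=True))
```

In this sketch the blend is a convex combination of two values that already lie inside the bounds, so the final clamp is only a safeguard. A real controller would need tuned parameters and a concrete definition of the cognitive-state signal, neither of which the abstract specifies.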
