AllHands: Ask Me Anything on Large-scale Verbatim Feedback via Large Language Models
Chaoyun Zhang, Zicheng Ma, Yuhao Wu, Shilin He, Si Qin, Minghua Ma, Xiaoting Qin, Yu Kang, Yuyi Liang, Xiaoyu Gou, Yajie Xue, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, Qi Zhang
TL;DR
AllHands tackles large-scale verbatim feedback analysis by integrating two stages (classification and abstractive topic modeling) with a Python-based QA agent that converts natural-language questions into executable code. The system uses in-context learning and human-in-the-loop refinement to produce structured feedback and human-readable topics, while the QA agent returns multi-modal outputs (text, code, tables, images) through a pipeline of Planner, Code Generator, and Code Executor. Evaluations on three diverse datasets show that AllHands surpasses traditional baselines in classification accuracy and topic coherence, and delivers high-quality answers to open-ended questions, especially when paired with GPT-4. The work demonstrates a practical, extensible path to natural-language feedback analytics, with broad applicability in software engineering and product development.
Abstract
Verbatim feedback constitutes a valuable repository of user experiences, opinions, and requirements essential for software development. Effectively and efficiently extracting valuable insights from such data is a challenging task. This paper introduces Allhands, an innovative analytic framework designed for large-scale feedback analysis through a natural language interface, leveraging large language models (LLMs). Allhands adheres to a conventional feedback analytic workflow, initially conducting classification and topic modeling on the feedback to convert it into a structurally augmented format, incorporating LLMs to enhance accuracy, robustness, generalization, and user-friendliness. Subsequently, an LLM agent is employed to interpret users' diverse natural-language questions about the feedback, translate them into Python code for execution, and deliver comprehensive multi-modal responses, including text, code, tables, and images. We evaluate Allhands across three diverse feedback datasets. The experiments demonstrate that Allhands achieves superior efficacy at all stages of analysis, including classification and topic modeling, ultimately providing users with an "ask me anything" experience with comprehensive, correct, and human-readable responses. To the best of our knowledge, Allhands stands as the first comprehensive feedback analysis framework that supports diverse and customized requirements for insight extraction through a natural language interface.
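The workflow described above (structured feedback first, then a question translated into Python and executed) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Feedback` record, the `plan`, `generate_code`, `execute`, and `answer` functions, and the sample data are all hypothetical, and the LLM calls are replaced by keyword matching and fixed code templates so that the example runs offline.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Feedback:
    """Hypothetical structured record produced by the first two stages."""
    text: str
    label: str         # assumed output of the classification stage
    topics: List[str]  # assumed output of the topic-modeling stage


def plan(question: str) -> str:
    """Planner stand-in: a real planner would prompt an LLM; here, keyword matching."""
    return "count_topics" if "topic" in question.lower() else "count_labels"


def generate_code(task: str) -> str:
    """Code-generator stand-in: returns Python source from fixed templates."""
    templates = {
        "count_topics": "result = Counter(t for f in feedback for t in f.topics).most_common()",
        "count_labels": "result = Counter(f.label for f in feedback).most_common()",
    }
    return templates[task]


def execute(code: str, feedback: List[Feedback]):
    """Executor: runs the generated source in a namespace holding only what it needs."""
    namespace = {"feedback": feedback, "Counter": Counter}
    exec(code, namespace)  # illustration only; real generated code needs sandboxing
    return namespace["result"]


def answer(question: str, feedback: List[Feedback]):
    """Chain the three hypothetical steps: plan, generate code, execute."""
    return execute(generate_code(plan(question)), feedback)


sample_feedback = [
    Feedback("App crashes on login", "negative", ["crash", "login"]),
    Feedback("Love the new dark mode", "positive", ["dark mode"]),
    Feedback("Crashes after update", "negative", ["crash"]),
]

print(answer("Which topics are most common?", sample_feedback))
```

Keeping the executor separate from the code generator means the generated source can be inspected, logged, or rejected before it runs, which matters once the templates are replaced by model output.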
