Can You Move These Over There? An LLM-based VR Mover for Supporting Object Manipulation
Xiangzhi Eric Wang, Zackary P. T. Sin, Ye Jia, Daniel Archer, Wynonna H. Y. Fong, Qing Li, Chen Li
TL;DR
VR Mover introduces an LLM-powered natural interface for VR object manipulation that combines pointing, speech, and memory-aware reasoning to support coarse-to-fine placement of multiple objects. The system integrates scene modelling, a user-centric augmentation pipeline, and real-time LLM-driven scene updates to produce rapid, structured API calls. In a user study, VR Mover reduced workload and arm fatigue and yielded higher usability and hedonic experience, particularly for multi-object tasks, though single-object mid-air manipulation saw limited gains. The work demonstrates practical benefits of language-enabled, context-aware interaction in VR and suggests design directions for more intuitive and efficient future interfaces. The results indicate that a natural, memory-informed LLM interface can complement traditional gizmos and virtual hands for flexible VR object manipulation.
Abstract
In our daily lives, we can naturally convey instructions for the spatial manipulation of objects using words and gestures. Transposing this form of interaction into virtual reality (VR) object manipulation can be beneficial. We propose VR Mover, an LLM-empowered solution that can understand and interpret the user's vocal instructions to support object manipulation. By simply pointing and speaking, the user can have the LLM manipulate objects without structured input. Our user study demonstrates that VR Mover improves usability, overall experience, and performance on multi-object manipulation, while also reducing workload and arm fatigue. Users prefer the proposed natural interface for broad movements and may switch to gizmos or virtual hands as a complement for finer adjustments. We believe these findings offer design implications for future LLM-based object manipulation interfaces, highlighting the potential for more intuitive and efficient user interactions in VR environments.
