WeMusic-Agent: Efficient Conversational Music Recommendation via Knowledge Internalization and Agentic Boundary Learning
Wendong Bi, Yirong Mao, Xianglong Liu, Kai Tian, Jian Zhang, Hanjie Wang, Wenhui Que
TL;DR
WeMusic-Agent addresses conversational music recommendation by combining extensive music knowledge internalization with agentic boundary learning, which trains the model to decide when to invoke external tools. The authors introduce MusicCPT and WeMusic-Base to internalize music knowledge through large-scale continual pretraining and multi-turn supervised fine-tuning (SFT), augmented by multi-objective reinforcement learning (RL). They then extend this to WeMusic-Agent-M1 through curriculum learning and controllable RL to balance internal knowledge against tool use, and report improved personalization, relevance, and efficiency on WeMusic-Bench, a new benchmark derived from WeChat Listen data. The results indicate that agentic boundary learning expands capability boundaries beyond those of purely internalized models, enabling robust playlist-level recommendations and efficient tool usage. The work provides a practical framework and dataset for evaluating conversational recommender systems (CRS) in real-world, language-specific settings and highlights the value of combining knowledge internalization with targeted tool use in music recommendation.
Abstract
Personalized music recommendation in conversational scenarios usually requires a deep understanding of user preferences and nuanced musical context, yet existing methods often struggle to balance specialized domain knowledge with flexible tool integration. This paper proposes WeMusic-Agent, a training framework for efficient LLM-based conversational music recommendation. By integrating knowledge internalization with agentic boundary learning, the framework aims to teach the model to decide when to rely on internalized knowledge and when to call specialized tools (e.g., music retrieval APIs, music recommendation systems). Under this framework, we present WeMusic-Agent-M1, an agentic model that internalizes extensive musical knowledge via continued pretraining on a 50B music-related corpus while acquiring the ability to invoke external tools when necessary. Additionally, because open-source benchmarks for conversational music recommendation are lacking, we construct a benchmark for personalized music recommendation derived from real-world data in WeChat Listen. This benchmark enables comprehensive evaluation across multiple dimensions, including the relevance, personalization, and diversity of the recommendations. Experiments on real-world data demonstrate that WeMusic-Agent achieves significant improvements over existing models.
