TalkPlay-Tools: Conversational Music Recommendation with LLM Tool Calling
Seungheon Doh, Keunwoo Choi, Juhan Nam
TL;DR
This paper addresses a limitation of LLM-based music recommenders that rely on a single retrieval method. It introduces a unified tool-calling framework that orchestrates four retrieval modalities (boolean, lexical, dense, and generative) within a sequential retrieval-reranking pipeline. A Music Recommendation Agent and an External Environment work together to plan tool usage, execute retrievals with six tools (SQL, BM25, text-to-item, item-to-item, user-to-item, and Semantic IDs), and generate fluent responses in a multimodal, multi-turn setting. The approach is evaluated zero-shot on TalkPlayData 2, where it improves $Hit@K$ over baselines. The evaluation also reports tool usage patterns and per-tool success rates, with the personalization and Semantic ID tools proving especially effective. The work demonstrates a practical, end-to-end paradigm for conversational music recommendation that combines diverse data modalities and retrieval strategies, with potential relevance for production systems that need more nuanced, context-aware recommendations.
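For reference, $Hit@K$ is commonly defined as the fraction of evaluated turns whose ground-truth item appears among the top $K$ recommendations. The summary above does not spell out the formula, so the following is the standard form under the assumption of one ground-truth item per turn; the symbols $T$, $y_t$, and $R_t^{K}$ are notation introduced here, not taken from the paper:

$$
Hit@K = \frac{1}{|T|} \sum_{t \in T} \mathbb{1}\left[\, y_t \in R_t^{K} \,\right]
$$

Here $T$ is the set of evaluated turns, $y_t$ is the ground-truth item for turn $t$, and $R_t^{K}$ is the set of the top $K$ items recommended at that turn.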
Abstract
While recent developments in large language models (LLMs) have successfully enabled generative recommenders with natural language interactions, their recommendation behavior is limited, leaving simpler yet crucial components such as metadata or attribute filtering underutilized. We propose an LLM-based music recommendation system with tool calling to serve as a unified retrieval-reranking pipeline. Our system positions an LLM as an end-to-end recommendation system that interprets user intent, plans tool invocations, and orchestrates specialized components: boolean filters (SQL), sparse retrieval (BM25), dense retrieval (embedding similarity), and generative retrieval (semantic IDs). Through tool planning, the system predicts which types of tools to use, their execution order, and the arguments needed to find music matching user preferences, supporting diverse modalities while seamlessly integrating multiple database filtering methods. We demonstrate that this unified tool-calling framework achieves competitive performance across diverse recommendation scenarios by selectively employing appropriate retrieval methods based on user queries, envisioning a new paradigm for conversational music recommendation systems.
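The tool-planning idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the catalog, the tool names (`sql_filter`, `lexical_rank`), the plan format, and `run_plan` are all hypothetical, the plan is hard-coded where the real system would have an LLM predict it, and a simple term-overlap score stands in for BM25. The sketch only shows how an ordered list of tool calls with arguments can narrow and then rerank a candidate set.

```python
import sqlite3

# Hypothetical toy catalog; the real system's schema is not given in the source.
TRACKS = [
    ("t1", "Blue in Green", "Miles Davis", "jazz", 1959),
    ("t2", "So What", "Miles Davis", "jazz", 1959),
    ("t3", "Giant Steps", "John Coltrane", "jazz", 1960),
    ("t4", "Smells Like Teen Spirit", "Nirvana", "rock", 1991),
]


def build_db():
    """Create an in-memory SQLite table holding the toy catalog."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tracks (id TEXT, title TEXT, artist TEXT, genre TEXT, year INTEGER)"
    )
    conn.executemany("INSERT INTO tracks VALUES (?, ?, ?, ?, ?)", TRACKS)
    return conn


def sql_filter(conn, candidates, where, params=()):
    """Boolean filter: keep candidates whose rows satisfy the WHERE clause.

    The clause is interpolated directly, which is acceptable only in a sketch.
    """
    rows = conn.execute(f"SELECT id FROM tracks WHERE {where}", params).fetchall()
    allowed = {row[0] for row in rows}
    return [cid for cid in candidates if cid in allowed]


def lexical_rank(conn, candidates, query):
    """Rerank candidates by query-term overlap (a crude stand-in for BM25)."""
    terms = set(query.lower().split())
    scored = []
    for cid in candidates:
        title, artist = conn.execute(
            "SELECT title, artist FROM tracks WHERE id = ?", (cid,)
        ).fetchone()
        doc_terms = set(f"{title} {artist}".lower().split())
        scored.append((len(terms & doc_terms), cid))
    # Stable sort: ties keep their incoming order.
    scored.sort(key=lambda pair: -pair[0])
    return [cid for _, cid in scored]


TOOLS = {"sql_filter": sql_filter, "lexical_rank": lexical_rank}


def run_plan(conn, plan):
    """Execute tool calls in order; each tool consumes the previous candidates."""
    candidates = [row[0] for row in conn.execute("SELECT id FROM tracks ORDER BY id")]
    for tool_name, kwargs in plan:
        candidates = TOOLS[tool_name](conn, candidates, **kwargs)
    return candidates


# Hard-coded plan standing in for what an LLM planner might emit for a request
# like "jazz from before 1960, something like So What".
plan = [
    ("sql_filter", {"where": "genre = ? AND year < ?", "params": ("jazz", 1960)}),
    ("lexical_rank", {"query": "so what"}),
]
result = run_plan(build_db(), plan)
print(result)
```

In this sketch the boolean filter runs first to shrink the candidate set cheaply, and the lexical step then reorders what remains, mirroring the retrieval-then-reranking order the abstract describes. Dense and generative retrieval tools could be registered in `TOOLS` the same way.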
