DroidCall: A Dataset for LLM-powered Android Intent Invocation
Weikai Xie, Li Zhang, Shihe Wang, Rongjie Yi, Mengwei Xu
TL;DR
DroidCall targets accurate on-device Android intent invocation. It provides an open-source dataset of 10k samples and a reusable data-generation pipeline for fine-tuning small LLMs to turn natural-language instructions into function calls that invoke Android intents. The workflow defines 24 functions that encapsulate common Android operations, generates training data with a self-instruct process, and applies LoRA-based fine-tuning to edge-friendly models. The authors report substantial performance gains: some fine-tuned small models surpass GPT-4o on this task and reach high accuracy with shorter prompts. An end-to-end on-device demo using mllm and a mobile app shows that deploying such agents on-device is practical, with privacy and latency benefits.
Abstract
The growing capabilities of large language models in natural language understanding significantly strengthen existing agentic systems. To power performant on-device mobile agents for better data privacy, we introduce DroidCall, the first training and testing dataset for accurate Android intent invocation. With a highly flexible and reusable data generation pipeline, we constructed 10k samples in DroidCall. Given a task instruction in natural language, small language models such as Qwen2.5-3B and Gemma2-2B fine-tuned with DroidCall can approach or even surpass the capabilities of GPT-4o for accurate Android intent invocation. We also provide an end-to-end Android app equipped with these fine-tuned models to demonstrate the Android intent invocation process. The code and dataset are available at https://github.com/UbiquitousLearning/DroidCall.
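To make the idea of intent invocation through function calling concrete, the following is a minimal sketch of how a model's structured output might be validated and mapped to an Android intent description. It is not the DroidCall implementation: the function name `set_alarm`, its argument names, the JSON output shape (`name` plus `arguments`), and the helper `call_to_intent` are all assumptions made for illustration. Only the action and extra strings are the standard Android `AlarmClock` constants.

```python
import json

# Hypothetical registry of predefined functions (illustrative, not DroidCall's own).
# Each entry maps a function name to an intent action, typed arguments,
# and the intent extra key that each argument fills.
FUNCTIONS = {
    "set_alarm": {
        "action": "android.intent.action.SET_ALARM",
        "required": {"hour": int, "minutes": int},
        "optional": {"message": str},
        "extras": {
            "hour": "android.intent.extra.alarm.HOUR",
            "minutes": "android.intent.extra.alarm.MINUTES",
            "message": "android.intent.extra.alarm.MESSAGE",
        },
    },
}


def call_to_intent(model_output: str) -> dict:
    """Validate an assumed JSON function call and describe the matching intent."""
    call = json.loads(model_output)
    name = call["name"]
    args = call.get("arguments", {})

    if name not in FUNCTIONS:
        raise ValueError(f"unknown function: {name}")
    spec = FUNCTIONS[name]
    allowed = {**spec["required"], **spec["optional"]}

    # Reject calls that omit a required argument.
    missing = [key for key in spec["required"] if key not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")

    extras = {}
    for key, value in args.items():
        # Reject arguments the function does not declare (a common model error).
        if key not in allowed:
            raise ValueError(f"unexpected argument: {key}")
        if not isinstance(value, allowed[key]):
            raise TypeError(f"argument {key} should be {allowed[key].__name__}")
        extras[spec["extras"][key]] = value

    return {"action": spec["action"], "extras": extras}


if __name__ == "__main__":
    # Example of what a fine-tuned model might emit for "wake me at 7:30".
    output = '{"name": "set_alarm", "arguments": {"hour": 7, "minutes": 30}}'
    print(call_to_intent(output))
```

In an Android app, the returned dictionary would correspond to constructing an `Intent` with that action and extras. Validating the call before dispatch is a design choice in this sketch: it keeps malformed or hallucinated calls from reaching the system.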
