DroidCall: A Dataset for LLM-powered Android Intent Invocation

Weikai Xie, Li Zhang, Shihe Wang, Rongjie Yi, Mengwei Xu

TL;DR

DroidCall tackles the challenge of accurate Android intent invocation on-device by providing an open-source 10k-sample dataset and a reusable data-generation pipeline to fine-tune small LLMs for function calling to Android intents. The workflow defines 24 predefined functions that encapsulate common Android operations, uses a self-instruct data-generation process, and applies LoRA-based fine-tuning on edge-friendly models. The authors demonstrate substantial performance gains, with some small models surpassing GPT-4o on this task and achieving high accuracy with shorter prompts. An end-to-end on-device demo with mllm and a mobile app shows the practicality of deploying on-device agents with privacy and latency benefits.

Abstract

The growing capabilities of large language models in natural language understanding significantly strengthen existing agentic systems. To power performant on-device mobile agents for better data privacy, we introduce DroidCall, the first training and testing dataset for accurate Android intent invocation. With a highly flexible and reusable data generation pipeline, we constructed 10k samples in DroidCall. Given a task instruction in natural language, small language models such as Qwen2.5-3B and Gemma2-2B fine-tuned with DroidCall can approach or even surpass the capabilities of GPT-4o for accurate Android intent invocation. We also provide an end-to-end Android app equipped with these fine-tuned models to demonstrate the Android intent invocation process. The code and dataset are available at https://github.com/UbiquitousLearning/DroidCall.
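As an illustration of what "function calling to Android intents" could look like, the sketch below validates a model-produced function call and maps it to an intent description. It is a minimal sketch under stated assumptions, not DroidCall's actual implementation: the function name `set_alarm`, its parameter names, the JSON call format, and the helper `call_to_intent` are all hypothetical. Only the action and extra strings are taken from Android's `AlarmClock` constants.

```python
import json

# Hypothetical registry with a single predefined function. The name, parameters,
# and layout are illustrative assumptions, not DroidCall's actual definitions.
FUNCTIONS = {
    "set_alarm": {
        # Value of AlarmClock.ACTION_SET_ALARM on Android.
        "action": "android.intent.action.SET_ALARM",
        "required": {"hour": int, "minute": int},
        "optional": {"message": str},
        # Argument name -> Android intent extra key (AlarmClock.EXTRA_*).
        "extras": {
            "hour": "android.intent.extra.alarm.HOUR",
            "minute": "android.intent.extra.alarm.MINUTES",
            "message": "android.intent.extra.alarm.MESSAGE",
        },
    }
}


def call_to_intent(model_output: str) -> dict:
    """Validate a JSON function call and map it to an intent description."""
    call = json.loads(model_output)
    name = call["name"]
    args = call.get("arguments", {})

    spec = FUNCTIONS.get(name)
    if spec is None:
        raise ValueError(f"unknown function: {name}")

    # Every required argument must be present.
    for key in spec["required"]:
        if key not in args:
            raise ValueError(f"missing argument: {key}")

    # Reject unexpected arguments and wrong types, then build the extras.
    allowed = {**spec["required"], **spec["optional"]}
    extras = {}
    for key, value in args.items():
        if key not in allowed:
            raise ValueError(f"unexpected argument: {key}")
        if not isinstance(value, allowed[key]):
            raise TypeError(f"argument {key} must be {allowed[key].__name__}")
        extras[spec["extras"][key]] = value

    return {"action": spec["action"], "extras": extras}


# Assumed model output for the instruction "Wake me up at 7:30 for the gym".
output = '{"name": "set_alarm", "arguments": {"hour": 7, "minute": 30, "message": "Gym"}}'
intent = call_to_intent(output)
print(intent["action"])
```

On a device, the returned description would still have to be turned into a real `Intent` and started by the app. That step is outside this sketch.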

Paper Structure

This paper contains 20 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Small language models fine-tuned with DroidCall have the capability to assist users in completing common tasks such as adding events to the calendar.
  • Figure 2: Workflow of DroidCall, which consists of three key phases: (1) Functions Predefinition; (2) Data Generation; (3) Finetuning and Evaluation.
  • Figure 3: (a) shows how an implicit intent works in Android; (b) shows an example that helps a user set an alarm with an implicit intent.
  • Figure 4: Details of data generation in DroidCall. To avoid manually creating seed data, DroidCall initially samples examples from an external dataset to generate its first set of data. Subsequently, the data is used as seed data to continuously generate new data, thereby eliminating the need for laborious manual work. All the generated data will go through a set of customized filters to ensure the correctness of data formats and the diversity of the data.
  • Figure 5: Design of our demo.
  • ...and 1 more figure