CAMPHOR: Collaborative Agents for Multi-input Planning and High-Order Reasoning On Device

Yicheng Fu, Raviteja Anantha, Jianpeng Cheng

TL;DR

CAMPHOR addresses the latency and privacy constraints of deploying language-enabled assistants on devices by introducing a hierarchical, on-device, multi-agent framework that coordinates specialized agents for personal context retrieval, device information, and task completion. The system uses dynamic prompt construction and prompt compression to fit memory budgets while maintaining high reasoning accuracy, and it validates the approach with the CAMPHOR dataset of device-grounded trajectories. Fine-tuned small language models outperform closed-source LLM baselines on task-completion metrics and eliminate server-device communication costs, underscoring the practical viability of private, on-device reasoning. The work offers a concrete dataset, a scalable architecture, and targeted compression techniques that together enable responsive, privacy-preserving mobile assistants with strong task performance.

Abstract

While server-side Large Language Models (LLMs) demonstrate proficiency in function calling and complex reasoning, deploying Small Language Models (SLMs) directly on devices brings opportunities to improve latency and privacy but also introduces unique challenges for accuracy and memory. We introduce CAMPHOR, an innovative on-device SLM multi-agent framework designed to handle multiple user inputs and reason over personal context locally, ensuring privacy is maintained. CAMPHOR employs a hierarchical architecture where a high-order reasoning agent decomposes complex tasks and coordinates expert agents responsible for personal context retrieval, tool interaction, and dynamic plan generation. By implementing parameter sharing across agents and leveraging prompt compression, we significantly reduce model size, latency, and memory usage. To validate our approach, we present a novel dataset capturing multi-agent task trajectories centered on personalized mobile assistant use-cases. Our experiments reveal that fine-tuned SLM agents not only surpass closed-source LLMs in task completion F1 by 35% but also eliminate the need for server-device communication, all while enhancing privacy.

Paper Structure

This paper contains 17 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: The CAMPHOR dataset simulates a user's smartphone environment, encompassing diverse personal information stored across multiple apps on the device.
  • Figure 2: An overview of multiple agents in CAMPHOR. The figure includes all agents for completeness. In practice, a subset of the agents can be invoked in arbitrary order until task completion.
  • Figure 3: Prompt compression technique. We use the pre-trained SLM itself as a text encoder to generate a single-token embedding for each function description, by taking the output embedding of its final token. The compressed function tokens are prepended to the prompt.
  • Figure 4: Retrieval recall at K computed with an external retrieval model for personal context agent and task completion agent.
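
The prompt compression idea in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' implementation: `embed_token` and `toy_hidden_states` are hypothetical stand-ins for the SLM's embedding table and forward pass, and the 4-dimensional vectors are chosen only for readability. The sketch shows the two steps the caption names: take the output embedding at the final token of each function description, then place those single-token embeddings at the start of the prompt.

```python
from typing import Dict, List

Vector = List[float]
DIM = 4  # illustrative embedding width, not a value from the paper


def embed_token(token_id: int) -> Vector:
    """Hypothetical embedding lookup standing in for the SLM's embedding table."""
    return [float((token_id * (d + 1)) % 7) for d in range(DIM)]


def toy_hidden_states(token_ids: List[int]) -> List[Vector]:
    """Hypothetical stand-in for an SLM forward pass.

    Returns one output vector per input token. Position i is the running mean
    of the embeddings of tokens 0..i, which mimics a causal model in which the
    final position has seen the whole sequence.
    """
    states: List[Vector] = []
    running = [0.0] * DIM
    for i, tok in enumerate(token_ids):
        running = [r + e for r, e in zip(running, embed_token(tok))]
        states.append([r / (i + 1) for r in running])
    return states


def compress_description(token_ids: List[int]) -> Vector:
    """Compress a function description to a single embedding:
    the output vector at its final token."""
    if not token_ids:
        raise ValueError("function description must contain at least one token")
    return toy_hidden_states(token_ids)[-1]


def build_prompt(descriptions: Dict[str, List[int]], query_ids: List[int]) -> List[Vector]:
    """Place one compressed embedding per function before the query embeddings."""
    compressed = [compress_description(ids) for ids in descriptions.values()]
    return compressed + [embed_token(tok) for tok in query_ids]


# Two function descriptions (5 tokens in total) shrink to 2 prompt positions.
descriptions = {"get_contacts": [1, 2, 3], "send_message": [4, 5]}
prompt = build_prompt(descriptions, query_ids=[6, 7, 8])
print(len(prompt))  # 2 compressed function tokens + 3 query tokens
```

In this toy setup each function description occupies one prompt position regardless of its length, which is the source of the memory savings the caption describes. A real system would replace the stand-in functions with the pre-trained SLM's own hidden states.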