Efficient On-Device Agents via Adaptive Context Management
Sanidhya Vijayvargiya, Rahul Lokesh
TL;DR
This paper tackles the memory bottleneck of on-device AI agents with a context-efficient framework that compresses conversational history into a Context State Object (CSO) via a dual-adapter memory system. It pairs a token-efficient tool-schema representation with a just-in-time schema-passing mechanism to reduce both the initial context and its growth over a conversation, while preserving or improving performance on complex, multi-turn tasks. The approach is instantiated on a 3B-parameter SLM and evaluated against a conventional baseline, showing a more than 6x reduction in initial context and a 10x–25x reduction in context growth rate. This enables persistent, capable on-device operation with local tools and cloud delegation when needed. The work offers a practical pathway toward private, low-latency AI assistants by balancing on-device computation with selective cloud reasoning, and it highlights concrete design patterns for memory management, tool orchestration, and data generation in resource-constrained environments.
Abstract
On-device AI agents offer the potential for personalized, low-latency assistance, but their deployment is fundamentally constrained by limited memory capacity, which restricts usable context. This reduced practical context window creates a trade-off between supporting rich, stateful interactions with complex tool capabilities and maintaining on-device feasibility. We break this trade-off with a framework for context-efficient on-device agents, driven by three synergistic optimizations: (1) a dynamic memory system using specialized LoRA adapters to distill conversational history into a compressed and structured Context State Object; (2) a minimalist serialization format for tool schemas to minimize token overhead per tool; and (3) a just-in-time schema-passing mechanism that loads full tool definitions only upon tool selection. We instantiate this framework by adapting a 3B-parameter SLM to context-efficient trajectories and rigorously evaluate it against a conventional baseline on complex user tasks. Our agent matches or exceeds the performance of this baseline while dramatically compressing context, achieving a more than 6-fold reduction in initial system prompt context and a 10- to 25-fold reduction in context growth rate, depending on interaction verbosity. These results demonstrate that strategic context management is key to unlocking capable and persistent on-device AI.
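To make optimizations (2) and (3) concrete, the following is a minimal sketch of how a compact tool listing and just-in-time schema passing could fit together. It is an illustration under stated assumptions, not the format used in this work: the tool names, the one-line `name(arg, arg?)` listing, and the helper functions are all hypothetical, and the schemas follow a generic JSON-Schema-like layout.

```python
import json

# Hypothetical tool registry; tool names and fields are illustrative only.
TOOLS = {
    "set_alarm": {
        "description": "Set an alarm at a given time.",
        "parameters": {
            "type": "object",
            "properties": {
                "time": {"type": "string", "description": "24-hour time, HH:MM"},
                "label": {"type": "string", "description": "Optional alarm label"},
            },
            "required": ["time"],
        },
    },
    "send_message": {
        "description": "Send a text message to a contact.",
        "parameters": {
            "type": "object",
            "properties": {
                "recipient": {"type": "string", "description": "Contact name"},
                "body": {"type": "string", "description": "Message text"},
            },
            "required": ["recipient", "body"],
        },
    },
}


def compact_listing(tools):
    """Assumed minimalist format: one line per tool, '?' marks optional args."""
    lines = []
    for name, spec in tools.items():
        params = spec["parameters"]
        required = set(params.get("required", []))
        args = ", ".join(
            p if p in required else p + "?" for p in params["properties"]
        )
        lines.append(f"{name}({args})")
    return "\n".join(lines)


def full_schema(tools, name):
    """Full definition of one tool, serialized only once it has been selected."""
    return json.dumps({"name": name, **tools[name]})


def build_prompt(tools, selected=None):
    """The compact listing is always present; a full schema is added on demand."""
    prompt = "Tools:\n" + compact_listing(tools)
    if selected is not None:
        prompt += "\nSelected tool schema:\n" + full_schema(tools, selected)
    return prompt


if __name__ == "__main__":
    print(build_prompt(TOOLS))
    print(build_prompt(TOOLS, selected="set_alarm"))
```

In this sketch the initial prompt carries only the short listing, and the full definition of a single tool enters the context after the model picks it. Character counts here are only a rough proxy for token counts, and the savings reported above come from the evaluation in this work, not from this toy example.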
