Pocket RAG: On-Device RAG for First Aid Guidance in Offline Mobile Environment
Dong Ho Kang, Hyunjoon Lee, Hyeonjeong Cha, Minkyu Choi, Sungsoo Lim
TL;DR
This work tackles the challenge of delivering reliable first-aid guidance in completely offline, resource-constrained mobile environments by deploying an on-device retrieval-augmented generation (RAG) system. It introduces a memory-aware, resource-optimized pipeline that combines lexical and semantic retrieval, selective context compression, batched decoding, and 8-bit quantization to fit within a 2 GB Android memory cap while achieving rapid response times. Across WHO-derived physical and psychological first aid datasets, the approach yields high accuracy (up to 97.0%) and substantial latency reductions (TTFT ~3.7 s), demonstrating the practicality of offline mobile guidance. The framework is open-source and modular, enabling straightforward updates to both language models and medical knowledge bases, with implications for AI for social good in disaster settings.
Abstract
In disaster scenarios or remote areas, first responders often lose network connectivity while providing first aid. In such situations, server-based AI systems cannot deliver critical guidance. To address this issue, we present a lightweight retrieval-augmented generation (RAG) system for small language models (SLMs) that runs directly on Android devices. Our system integrates a pipeline optimized for mobile hardware, featuring Hybrid RAG, selective compression, batched prompt decoding, and quantization caching. Despite the model's small size, our RAG-based system achieves 94.5% accuracy for physical first aid and 97.0% for psychological first aid. Additionally, we reduce response time from 14.2 s to 3.7 s, a nearly 4x speedup. These results demonstrate that our system is practical and can deliver reliable first aid guidance even without internet connectivity.
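The hybrid retrieval idea mentioned above (combining a lexical ranking with a semantic ranking) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the two rankings are merged, so reciprocal rank fusion is used here as an assumption, term overlap stands in for a lexical scorer such as BM25, and character-trigram cosine similarity stands in for a learned embedding model. All function names and the example passages are hypothetical.

```python
import math
from collections import Counter


def lexical_scores(query, docs):
    # Count document tokens that also appear in the query
    # (a simple stand-in for a lexical scorer such as BM25).
    query_terms = set(query.lower().split())
    return [sum(1 for t in d.lower().split() if t in query_terms) for d in docs]


def trigrams(text):
    # Character-trigram counts (a stand-in for a dense embedding).
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def semantic_scores(query, docs):
    q = trigrams(query)
    return [cosine(q, trigrams(d)) for d in docs]


def ranks(scores):
    # Convert scores to 1-based ranks; higher score means better rank.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order):
        result[index] = position + 1
    return result


def hybrid_retrieve(query, docs, k=2, rrf_k=60):
    # Assumption: merge the two rankings with reciprocal rank fusion.
    lex = ranks(lexical_scores(query, docs))
    sem = ranks(semantic_scores(query, docs))
    fused = [1 / (rrf_k + lex[i]) + 1 / (rrf_k + sem[i]) for i in range(len(docs))]
    top = sorted(range(len(docs)), key=lambda i: -fused[i])[:k]
    return [docs[i] for i in top]


if __name__ == "__main__":
    passages = [
        "apply pressure to stop bleeding",
        "cool the burn under running water",
        "listen calmly and offer comfort",
    ]
    print(hybrid_retrieve("how to stop bleeding", passages, k=2))
```

Rank-based fusion is convenient in this setting because lexical and semantic scores live on different scales, and it needs no extra model or tuning data on the device; a weighted sum of normalized scores would be an equally plausible choice.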
