Pocket RAG: On-Device RAG for First Aid Guidance in Offline Mobile Environment

Dong Ho Kang, Hyunjoon Lee, Hyeonjeong Cha, Minkyu Choi, Sungsoo Lim

TL;DR

This work tackles the challenge of delivering reliable first-aid guidance in completely offline, resource-constrained mobile environments by deploying an on-device retrieval-augmented generation (RAG) system. It introduces a memory-aware, resource-optimized pipeline that combines lexical and semantic retrieval, selective context compression, batched decoding, and 8-bit quantization to fit within a 2 GB Android memory cap while achieving rapid response times. Across WHO-derived physical and psychological first aid datasets, the approach yields high accuracy (up to 97.0%) and substantial latency reductions (TTFT ~3.7 s), demonstrating the practicality of offline mobile guidance. The framework is open-source and modular, enabling straightforward updates to both language models and medical knowledge bases, with implications for AI for social good in disaster settings.

Abstract

In disaster scenarios or remote areas, first responders often lose network connectivity when providing first aid. In such situations, server-based AI systems fail to provide critical guidance. To address this issue, we present a lightweight, mobile-based retrieval-augmented generation system for small language models (SLMs) that can run directly on Android devices. Our system integrates a mobile-optimized pipeline featuring Hybrid RAG, selective compression, batched prompt decoding, and quantization caching. Despite the model's small size, our RAG-based system achieves 94.5% accuracy for physical first aid and 97.0% for psychological first aid. Additionally, we reduce response time from 14.2 s to 3.7 s, a nearly 4x speedup. These results indicate that our system is practical and can deliver reliable first aid guidance even without internet connectivity.
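
The abstract names Hybrid RAG, which the TL;DR describes as combining lexical and semantic retrieval, but it does not say how the two result lists are merged. As a minimal sketch only, the snippet below uses reciprocal rank fusion (RRF), a common fusion technique chosen here for illustration and not confirmed as the authors' method. The function name, the constant `k=60`, and the document IDs are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Assumption: RRF stands in for whatever fusion the paper actually uses.
    Each document scores 1 / (k + rank) per list in which it appears.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical outputs of a lexical (keyword) and a semantic (embedding) retriever.
lexical = ["burns", "bleeding", "choking"]
semantic = ["bleeding", "fracture", "burns"]

fused = reciprocal_rank_fusion([lexical, semantic])
print(fused)
```

Documents ranked well by both retrievers ("bleeding", "burns") rise above those found by only one. RRF works on ranks alone, so scores from the two retrievers do not need to be on the same scale.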

Paper Structure (28 sections, 5 equations, 3 figures, 6 tables)

Figures (3)

  • Figure 1: Comparison between conventional cloud-dependent AI systems and our proposed on-device framework for first aid scenarios. Traditional approaches (a) fail when connectivity is compromised during disasters, precisely when first aid guidance is most critical. Our framework (b) addresses this limitation by embedding a local SLM with a RAG pipeline directly on mobile devices, enabling reliable access to WHO first aid knowledge without internet dependency.
  • Figure 2: System architecture showing the on-device RAG pipeline (left) and evaluation data preparation workflow (right). The enhanced pipeline incorporates Selective Context compression and KV Cache Quantization to maximize responsiveness within the 2 GB memory constraint.
  • Figure 3: Android Studio Logcat output recording speed and resource usage. The app was tested on the custom-made dataset to measure its performance.