Effective and Efficient Schema-aware Information Extraction Using On-Device Large Language Models

Zhihao Wen, Sheng Liang, Yaxiong Wu, Yongyue Zhang, Yong Liu

TL;DR

This work tackles information extraction on resource-constrained edge devices by introducing DLISC, a two-stage on-device IE framework. DLISC separates schema identification and schema-aware extraction using Identification and Extraction LoRA modules, augmented by Incremental Schema Caching to reduce redundant computation. Across on-device LLMs and IE tasks, DLISC improves both accuracy (F1) and responsiveness (latency) relative to retrieval-augmented baselines, demonstrating strong effectiveness in schema-rich, cross-domain contexts. The approach enables scalable, fast, and accurate IE on edge devices and sets the stage for broader schema coverage and multilingual capabilities in future work.

Abstract

Information extraction (IE) plays a crucial role in natural language processing (NLP) by converting unstructured text into structured knowledge. Deploying computationally intensive large language models (LLMs) on resource-constrained devices for information extraction is challenging because of hallucinations, limited context length, and high latency, especially when handling diverse extraction schemas. To address these challenges, we propose a two-stage information extraction approach adapted for on-device LLMs, called Dual-LoRA with Incremental Schema Caching (DLISC), which improves both the effectiveness and the efficiency of schema identification and schema-aware extraction. In particular, DLISC adopts an Identification LoRA module for retrieving the schemas most relevant to a given query, and an Extraction LoRA module for performing information extraction based on the selected schemas. To accelerate extraction inference, Incremental Schema Caching is incorporated to reduce redundant computation, substantially improving efficiency. Extensive experiments across multiple information extraction datasets demonstrate notable improvements in both effectiveness and efficiency.
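The two-stage flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `Schema`, `toy_identify`, `toy_extract`, and `two_stage_ie` are hypothetical names, and both stages are replaced by simple string heuristics where DLISC would call the on-device LLM with the Identification or Extraction LoRA module. Only the control flow (identify schemas first, then extract against the selected ones) is taken from the text.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Schema:
    """Hypothetical schema: a name plus the fields to extract."""
    name: str
    fields: Tuple[str, ...]


def toy_identify(text: str, schemas: List[Schema]) -> List[Schema]:
    """Stage 1 stand-in: rank schemas by how many field names occur in the text.
    (Assumption: in DLISC this ranking comes from the Identification LoRA.)"""
    lowered = text.lower()

    def score(schema: Schema) -> int:
        return sum(field in lowered for field in schema.fields)

    return sorted(schemas, key=score, reverse=True)


def toy_extract(text: str, schema: Schema) -> Dict[str, str]:
    """Stage 2 stand-in: pull 'field: value' pairs for the given schema.
    (Assumption: in DLISC this is generated by the Extraction LoRA.)"""
    result: Dict[str, str] = {}
    for field in schema.fields:
        match = re.search(
            rf"{re.escape(field)}\s*:\s*([^;]+)", text, flags=re.IGNORECASE
        )
        if match:
            result[field] = match.group(1).strip()
    return result


def two_stage_ie(
    text: str,
    schemas: List[Schema],
    identify: Callable[[str, List[Schema]], List[Schema]] = toy_identify,
    extract: Callable[[str, Schema], Dict[str, str]] = toy_extract,
    top_k: int = 1,
) -> Dict[str, Dict[str, str]]:
    """Identify the top_k schemas, then extract only against those."""
    selected = identify(text, schemas)[:top_k]
    return {schema.name: extract(text, schema) for schema in selected}


schemas = [
    Schema("flight", ("airline", "departure")),
    Schema("meeting", ("organizer", "room")),
]
result = two_stage_ie("Organizer: Ana; Room: B12", schemas)
```

Because extraction only sees the schemas chosen in the first stage, the prompt for the second stage stays short regardless of how many schemas exist overall, which is the property the abstract ties to limited context length on device.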

Paper Structure

This paper contains 14 sections, 4 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: The LLM-Adapters architecture for deploying LLMs on edge devices with a single on-device LLM and multiple plug-in LoRA modules.
  • Figure 2: An illustrative comparison of (a) RAG-based IE with schema retrieval and top-K schema-aware extraction; (b) Dual-LoRA IE paradigm with schema identification and schema-aware extraction; (c) Dual-LoRA with Incremental Schema Caching (DLISC) for further enhancing inference efficiency.
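The caching idea in panel (c) can be sketched as a store that grows one schema at a time and reuses earlier work on repeat requests. The source states only that Incremental Schema Caching reduces redundant computation; what exactly is cached is not given here, so this sketch assumes a generic, expensive per-schema encoding step. `IncrementalSchemaCache` and `toy_encode` are hypothetical names, and the encoder is a placeholder rather than a real model call.

```python
from typing import Callable, Dict, Hashable, Tuple


class IncrementalSchemaCache:
    """Memoizes an expensive per-schema computation, adding entries on demand."""

    def __init__(self, encode: Callable[[str], Hashable]) -> None:
        self._encode = encode
        self._store: Dict[str, Hashable] = {}
        self.hits = 0
        self.misses = 0

    def get(self, schema_text: str) -> Hashable:
        """Return the cached encoding, computing it only on first use."""
        if schema_text in self._store:
            self.hits += 1
        else:
            # First time this schema is seen: pay the cost once and keep it.
            self.misses += 1
            self._store[schema_text] = self._encode(schema_text)
        return self._store[schema_text]


def toy_encode(schema_text: str) -> Tuple[str, ...]:
    """Placeholder for the costly step (assumption: e.g. encoding a schema prompt)."""
    return tuple(schema_text.split())


cache = IncrementalSchemaCache(toy_encode)
first = cache.get("meeting: organizer, room")
second = cache.get("flight: airline, departure")
third = cache.get("meeting: organizer, room")  # served from the cache
```

After these three calls the cache has computed two encodings and reused one. The saving grows with how often the same schemas recur across queries, which is the redundancy the caption refers to.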