Self-Correction Distillation for Structured Data Question Answering
Yushan Zhu, Wen Zhang, Long Jin, Mengshu Sun, Ling Zhong, Zhiqiang Liu, Juan Li, Lei Liang, Chong Long, Chao Deng, Junlan Feng
TL;DR
This paper tackles the challenge of enabling small-scale LLMs to perform robust structured data QA by introducing Self-Correction Distillation (SCD). SCD combines an Error Prompt Mechanism (EPM), which provides iterative, type-specific feedback during inference, with a two-stage distillation pipeline (teacher-distillation followed by self-distillation) that transfers and refines query-generation and error-correction skills. Empirical results on five benchmarks spanning table QA, KG QA, and temporal KG QA show that an 8B LLM trained with SCD outperforms existing distillation methods and approaches GPT-4 performance on some datasets, while large LLMs equipped with EPM achieve state-of-the-art results on most datasets. The work demonstrates the practical potential of privately deployed, small-scale models for unified structured QA and highlights the value of error-aware feedback loops for model improvement.
Abstract
Structured data question answering (QA), including table QA, Knowledge Graph (KG) QA, and temporal KG QA, is a pivotal research area. Advances in large language models (LLMs) have driven significant progress in unified structured data QA frameworks such as TrustUQA. However, these frameworks face challenges when applied to small-scale LLMs, because small-scale LLMs are prone to errors when generating structured queries. To improve the structured data QA ability of small-scale LLMs, we propose a self-correction distillation (SCD) method. In SCD, an error prompt mechanism (EPM) detects errors and provides customized error messages during inference, and a two-stage distillation strategy transfers the query-generation and error-correction capabilities of large-scale LLMs to small-scale LLMs. Experiments on 5 benchmarks covering 3 structured data types demonstrate that, on a small-scale (8B) LLM, SCD achieves the best performance and generalization among distillation methods and closely approaches the performance of GPT-4 on some datasets. Furthermore, large-scale LLMs equipped with EPM surpass state-of-the-art results on most datasets.
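The error prompt mechanism described above can be illustrated with a minimal generate, execute, and feedback loop: the model proposes a structured query, an executor runs it, and on failure the detected error type is mapped to a customized message that conditions the next attempt. This is a sketch under stated assumptions, not the authors' implementation. The error categories, message texts, `error_prompt_loop`, `toy_generate`, `toy_execute`, and the retry budget `max_rounds` are all hypothetical, since this section does not specify the actual EPM error taxonomy or query language.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple

# Hypothetical error taxonomy; the real EPM error types are not specified here.
ERROR_MESSAGES = {
    "syntax": "The query could not be parsed. Check parentheses and function names.",
    "unknown_entity": "An entity or column in the query does not exist in the data.",
    "empty_result": "The query ran but returned nothing. Try relaxing a condition.",
}
DEFAULT_MESSAGE = "The query failed for an unrecognized reason. Please regenerate it."


@dataclass
class Attempt:
    query: str
    error_type: Optional[str]  # None means the query executed successfully
    feedback: Optional[str]    # customized message sent back to the model


def error_prompt_loop(
    question: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], Tuple[Optional[str], Any]],
    max_rounds: int = 3,
) -> Tuple[Any, List[Attempt]]:
    """Generate a query, execute it, and feed back a type-specific error message on failure."""
    history: List[Attempt] = []
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        query = generate(question, feedback)
        error_type, answer = execute(query)
        if error_type is None:
            history.append(Attempt(query, None, None))
            return answer, history
        # Map the detected error type to a customized message for the next round.
        feedback = ERROR_MESSAGES.get(error_type, DEFAULT_MESSAGE)
        history.append(Attempt(query, error_type, feedback))
    return None, history  # retry budget exhausted


# Toy stand-ins for an LLM and a query executor (both hypothetical).
def toy_generate(question: str, feedback: Optional[str]) -> str:
    # First attempt is malformed; after receiving feedback, emit a balanced query.
    return "get_value(row=1" if feedback is None else "get_value(row=1, col='year')"


def toy_execute(query: str) -> Tuple[Optional[str], Any]:
    # Only detects unbalanced parentheses; a real executor would run the query.
    if query.count("(") != query.count(")"):
        return "syntax", None
    return None, 1999


answer, trace = error_prompt_loop("Which year?", toy_generate, toy_execute)
print(answer, [a.error_type for a in trace])  # 1999 ['syntax', None]
```

Recording each (query, error type, feedback) attempt, as `history` does here, is one plausible way to collect error-correction trajectories as distillation data; whether SCD uses this exact format is an assumption.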
