Can LLMs faithfully generate their layperson-understandable 'self'?: A Case Study in High-Stakes Domains
Arion Das, Asutosh Mishra, Amitesh Patel, Soumilya De, V. Gurucharan, Kripabandhu Ghosh
TL;DR
This paper introduces ReQuesting, a prompt-based regime that produces layperson-friendly, algorithmic representations of how an LLM works, with the aim of improving explainability in law, health, and finance. By coupling Task Prompts with ReQuest Prompts and Robustness Check Prompts, the authors extract a ReQuest Algorithm whose faithfulness is evaluated with two novel reproducibility metrics (PerRR and PreRR) in intra- and inter-LLM setups. Empirical results on statute prediction and human rights violation prediction show high reproducibility, particularly within a single model, while finance tasks show strong intra-LLM reproducibility and meaningful cross-model transfer. Health-domain results are strong for certain tasks but also reveal variability and domain dependence. The paper closes with a discussion of alignment with the LLMs' intrinsic reasoning and of future work to further validate the approach and extend its applicability.
Abstract
Large Language Models (LLMs) have significantly impacted nearly every domain of human knowledge. However, the explainability of these models, especially to laypersons, which is crucial for instilling trust, has been examined through various skeptical lenses. In this paper, we introduce a novel notion of LLM explainability to laypersons, termed $\textit{ReQuesting}$, across three high-priority application domains -- law, health, and finance -- using multiple state-of-the-art LLMs. The proposed notion exhibits faithful generation of explainable, layperson-understandable algorithms on multiple tasks, as evidenced by a high degree of reproducibility. Furthermore, we observe a notable alignment of the explainable algorithms with the intrinsic reasoning of the LLMs.
