Question Rephrasing for Quantifying Uncertainty in Large Language Models: Applications in Molecular Chemistry Tasks
Zizhang Chen, Pengyu Hong, Sandeep Madireddy
TL;DR
This work addresses the problem of assessing the reliability of large language models applied to chemistry. It combines Question Rephrasing, which probes input uncertainty, with sampling-based evaluation of output uncertainty. The two uncertainty channels are formalized using SMILES variants as input perturbations and entropy-based metrics, including an explicit structure-based clustering approach for generation tasks. Experiments with GPT-4 and GPT-3.5 on molecular property prediction and forward reaction prediction show that equivalent input variations can change predictions and that entropy-based scores indicate when model outputs are trustworthy, even when raw accuracy is low. The findings underscore the need to strengthen foundational chemistry understanding in LLMs to enable more reliable and transparent AI for chemical informatics.
Abstract
Uncertainty quantification enables users to assess the reliability of responses generated by large language models (LLMs). We present a novel Question Rephrasing technique to evaluate the input uncertainty of LLMs, which refers to the uncertainty arising from equivalent variations of the inputs provided to LLMs. This technique is integrated with sampling methods that measure the output uncertainty of LLMs, thereby offering a more comprehensive uncertainty assessment. We validated our approach on property prediction and reaction prediction for molecular chemistry tasks.
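To make the two kinds of uncertainty concrete, the following is a minimal sketch and not the authors' implementation. It assumes a classification-style task with discrete answers and uses Shannon entropy over empirical answer frequencies. The function names (`answer_entropy`, `output_uncertainty`, `input_uncertainty`), the choice of a majority vote per variant, and the sample answers are all hypothetical, chosen only for illustration. The SMILES strings are equivalent notations for ethanol.

```python
import math
from collections import Counter


def answer_entropy(answers):
    """Shannon entropy (in nats) of the empirical distribution of answers."""
    counts = Counter(answers)
    total = len(answers)
    # p * log(1/p) summed over distinct answers; 0.0 when all answers agree.
    return sum((c / total) * math.log(total / c) for c in counts.values())


def output_uncertainty(samples):
    """Entropy of repeated samples drawn for ONE fixed input prompt."""
    return answer_entropy(samples)


def input_uncertainty(samples_by_variant):
    """Entropy of the per-variant majority answers across rephrased inputs.

    Assumption: each rephrased input is summarized by its most frequent
    sampled answer; other summaries are possible.
    """
    majority = [
        Counter(samples).most_common(1)[0][0]
        for samples in samples_by_variant.values()
    ]
    return answer_entropy(majority)


# Hypothetical sampled answers to a yes/no property question, keyed by
# equivalent SMILES variants of the same molecule (illustrative data only).
samples_by_variant = {
    "CCO": ["Yes", "Yes", "Yes", "No"],
    "OCC": ["Yes", "No", "No", "No"],
    "C(O)C": ["Yes", "Yes", "Yes", "Yes"],
}

per_variant = {s: output_uncertainty(a) for s, a in samples_by_variant.items()}
across_variants = input_uncertainty(samples_by_variant)
```

In this sketch, a high value in `per_variant` means the model disagrees with itself on repeated queries of the same prompt, while a high `across_variants` means the answer changes when the same molecule is written differently.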
