Don't Ignore Dual Logic Ability of LLMs while Privatizing: A Data-Intensive Analysis in Medical Domain
Yanrui Du, Sendong Zhao, Muzhen Cai, Ming Ma, Danyang Zhao, Jiawei Cao, Bing Qin
TL;DR
This work introduces the notion of dual logic ability for LLMs and investigates how privatization in the medical domain affects this core reasoning capability. By constructing a data-intensive pipeline that combines medical literature abstracts, multi-round dialogues, and manually annotated dual-logic test data, the authors quantify stance consistency on paired prompts and reveal a substantial degradation in dual logic after privatization. They show that incorporating general-domain dual-logic data during instruction fine-tuning can restore and even improve dual logic alongside accuracy, with pre-training data acting as an enhancing factor. The study also demonstrates transferability of general-domain dual-logic data to other privatized datasets and highlights the influence of the base LLM on downstream dual-logic performance, providing a new benchmark and practical guidance for privatization strategies in real-world medical AI applications.
Abstract
Extensive studies have been devoted to privatizing general-domain Large Language Models (LLMs) as domain-specific LLMs by feeding them domain-specific data. However, these privatization efforts have often ignored a critical aspect: dual logic ability, a core reasoning ability of LLMs. Dual logic ability ensures that an LLM maintains a consistent stance when confronted with both positive and negative statements about the same fact. Our study focuses on how the dual logic ability of LLMs is affected during the privatization process in the medical domain. We conduct several experiments that analyze dual logic ability by examining the consistency of the stance in responses to paired questions about the same fact. Interestingly, we observe a significant decrease in the dual logic ability of existing LLMs after privatization. In addition, our results indicate that incorporating general-domain dual logic data into LLMs not only enhances their dual logic ability but also further improves their accuracy. These findings underscore the importance of prioritizing LLMs' dual logic ability during the privatization process. Our study establishes a benchmark for future research on LLMs' dual logic ability during privatization and offers valuable guidance for privatization efforts in real-world applications.
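To make the idea of stance consistency on paired questions concrete, the following is a minimal sketch of how such a score could be computed. It is an illustration under stated assumptions, not the authors' evaluation code: the abstract does not give the exact metric, so the `PairedResponse` type, the yes/no encoding of answers, and the rule that a pair counts as accurate only when it is both consistent and correct are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PairedResponse:
    """Hypothetical record for one fact probed with two phrasings."""
    positive_answer: bool  # model answered "yes" to the positively phrased question
    negative_answer: bool  # model answered "yes" to the negated question
    fact_is_true: bool     # gold label for the underlying fact


def is_consistent(pair: PairedResponse) -> bool:
    # "Yes" to the positive form asserts the fact; "yes" to the negated
    # form denies it. The stance is consistent only if the answers differ.
    return pair.positive_answer != pair.negative_answer


def is_accurate(pair: PairedResponse) -> bool:
    # Assumption: a pair is accurate only if it is consistent and the
    # stance it implies matches the gold label.
    return is_consistent(pair) and pair.positive_answer == pair.fact_is_true


def dual_logic_scores(pairs: Sequence[PairedResponse]) -> Tuple[float, float]:
    """Return (consistency rate, accuracy rate) over all pairs."""
    if not pairs:
        raise ValueError("at least one paired response is required")
    n = len(pairs)
    consistency = sum(is_consistent(p) for p in pairs) / n
    accuracy = sum(is_accurate(p) for p in pairs) / n
    return consistency, accuracy


if __name__ == "__main__":
    example = [
        PairedResponse(True, False, True),   # consistent and correct
        PairedResponse(True, True, True),    # inconsistent: "yes" to both forms
        PairedResponse(False, True, True),   # consistent but wrong
        PairedResponse(False, True, False),  # consistent and correct
    ]
    print(dual_logic_scores(example))
```

Under this encoding, a model that answers "yes" to both phrasings of the same fact is penalized on consistency regardless of the gold label, which is the failure mode the abstract describes.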
