Towards Dialogues for Joint Human-AI Reasoning and Value Alignment

Elfia Bezou-Vrakatseli; Oana Cocarascu; Sanjay Modgil

TL;DR

This work addresses value alignment in AI by proposing joint human-LLM inquiry dialogues that embed human values into decision making. It grounds the approach in argumentation theory, using $AF$ structures and Dung semantics to derive justified conclusions from collaboratively updated belief bases $\mathcal{B}$, including mechanisms for enthymemes and defeasible reasoning. The paper surveys progress and gaps, highlighting extensions to ASPIC+ for metalevel reasoning, the development of ethical argumentation schemes and meta-schemes, and the need to instantiate these concepts within Large Language Models using advanced prompting strategies (e.g., Chain-of-Thought and maieutic prompting). Overall, it provides a roadmap and key requirements for enabling value-aligned joint reasoning between humans and AI, with practical implications for the safe deployment of LLM-enabled decision support in ethically salient domains.
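As an illustration of how justified conclusions can be derived from an argumentation framework, the following is a minimal sketch that computes the grounded extension of a Dung-style $AF$. The summary above does not say which Dung semantics the paper relies on, so the choice of grounded semantics here is an assumption, and the function name `grounded_extension` and the toy arguments are hypothetical.

```python
def grounded_extension(arguments, attacks):
    """Return the grounded extension of the framework (arguments, attacks).

    arguments: a set of hashable argument labels (hypothetical toy data).
    attacks:   a set of (attacker, target) pairs over those labels.
    """
    # Map each argument to the set of arguments that attack it.
    attackers = {a: {x for (x, y) in attacks if y == a} for a in arguments}

    extension = set()
    while True:
        # An argument is acceptable with respect to the current extension
        # if every one of its attackers is itself attacked by some member
        # of that extension. Unattacked arguments qualify trivially.
        defended = {
            a
            for a in arguments
            if all(
                any((d, b) in attacks for d in extension)
                for b in attackers[a]
            )
        }
        # Iterating from the empty set reaches the least fixed point,
        # which is the grounded extension.
        if defended == extension:
            return extension
        extension = defended


# Toy framework: a attacks b, and b attacks c.
af_arguments = {"a", "b", "c"}
af_attacks = {("a", "b"), ("b", "c")}
justified = grounded_extension(af_arguments, af_attacks)
```

In this toy framework `a` is unattacked and defends `c` against `b`, so `a` and `c` are the justified arguments. In a joint inquiry setting, adding a new argument or attack to the shared belief base would simply mean recomputing the extension over the updated framework.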

Abstract

We argue that enabling human-AI dialogue, purposed to support joint reasoning (i.e., 'inquiry'), is important for ensuring that AI decision making is aligned with human values and preferences. In particular, we point to logic-based models of argumentation and dialogue, and suggest that the traditional focus on persuasion dialogues be replaced by a focus on inquiry dialogues, and the distinct challenges that joint inquiry raises. Given recent dramatic advances in the performance of large language models (LLMs), and the anticipated increase in their use for decision making, we provide a roadmap for research into inquiry dialogues for supporting joint human-LLM reasoning tasks that are ethically salient, and that thereby require that decisions are value aligned.

Paper Structure (6 sections)

This paper contains 6 sections.

Introduction
Motivating Research into Joint Inquiry for Value Alignment
Progress Towards Realising Joint Inquiry for Value Alignment
Towards Argumentation-based Dialogues for Value Aligned Inquiry
Towards Large Language Model Interlocutors
Conclusions
