Talking to Machines: do you read me?

Lina M. Rojas-Barahona

TL;DR

This work surveys the evolution of dialogue systems from modular, task-specific architectures to end-to-end neural approaches and large language models, emphasizing Task-Oriented Dialogues (TOD) and Conversational Question Answering (CQA). It presents a coherent program of contributions across NLU, Dialogue Management, and NLG for TOD, including data augmentation, few-shot learning, Bayesian IRL, imitation learning, and graph-based policies. It also extends to conversational QA with ellipsis/coreference detection, question rewriting, and a Wikidata-grounded corpus, alongside advanced KG embeddings in hyperbolic space. The dissertation further documents data collection, annotation practices, and dialogue frameworks (notably PyDial and Dialport), culminating in a scientific project on LLMs for TOD and multimodal dialogue with regard to evaluation, grounding, and decoding control. Overall, the work advances both methodological foundations and practical resources for robust, scalable dialogue systems and foundational QA over knowledge graphs.

Abstract

This dissertation guides the reader through research on dialogue, and more precisely through the research I have conducted over my career since my PhD thesis, from modular architectures built on machine learning, deep learning, and reinforcement learning to end-to-end deep neural networks. Besides my own work as a research associate, I also present the work I have supervised in recent years. I briefly review the state of the art and highlight open research problems in conversational agents. I then present my contributions to Task-Oriented Dialogues (TOD), both as a research associate and as the industrial supervisor of CIFRE theses. I then discuss conversational QA. In particular, I present the work of two PhD candidates, Thibault Cordier and Sebastien Montella, as well as the work of the young researcher Quentin Brabant. Finally, I present the scientific project, in which I discuss Large Language Models (LLMs) for Task-Oriented Dialogue and Multimodal Task-Oriented Dialogue.

Paper Structure (140 sections, 26 equations, 23 figures, 23 tables)

Introduction
A Glance to the Research on Dialogue
Why is human conversation difficult?
Dialogue-Acts and discourse obligations:
Coreferences and Ambiguity:
Grounding:
Planning:
Preliminary Approaches
Task Oriented Dialogue Systems
Definitions
Statistical Dialogue Systems
Markov decision process
Partially Observable Markov decision process
Hierarchical Reinforcement Learning
Reward Functions for Dialogue Systems
...and 125 more sections

Figures (23)

Figure 1: Basic elements of a statistical spoken dialogue system
Figure 2: E2E neural architecture
Figure 3: Excerpt from a dialogue in the EmoSpeech corpus. The corresponding user semantics is shown highlighted on the right.
Figure 4: Excerpt from a dialogue in the DSTC2 corpus. The top-best ASR hypothesis is shown highlighted on the left, and the corresponding user semantics is shown highlighted on the right.
Figure 5: Combination of sentence and context representations for the joint prediction of dialogue acts and slots.
...and 18 more figures
