Talking to Machines: do you read me?

Lina M. Rojas-Barahona

TL;DR

This work surveys the evolution of dialogue systems from modular, task-specific architectures to end-to-end neural approaches and large language models, emphasizing Task-Oriented Dialogues (TOD) and Conversational Question Answering (CQA). It presents a coherent program of contributions across NLU, Dialogue Management, and NLG for TOD, including data augmentation, few-shot learning, Bayesian IRL, imitation learning, and graph-based policies. It also extends to conversational QA with ellipsis/coreference detection, question rewriting, and a Wikidata-grounded corpus, alongside advanced KG embeddings in hyperbolic space. The dissertation further documents data collection, annotation practices, and dialogue frameworks (notably PyDial and Dialport), culminating in a scientific project on LLMs for TOD and multimodal dialogue with regard to evaluation, grounding, and decoding control. Overall, the work advances both methodological foundations and practical resources for robust, scalable dialogue systems and foundational QA over knowledge graphs.

Abstract

In this dissertation I guide the reader through research on dialogue, and more precisely through the research I have conducted over my career since my PhD thesis, from modular architectures with machine learning/deep learning and reinforcement learning to end-to-end deep neural networks. Besides my work as a research associate, I also present the work I have supervised in recent years. I briefly review the state of the art and highlight the open research problems on conversational agents. Afterwards, I present my contributions to Task-Oriented Dialogues (TOD), both as a research associate and as the industrial supervisor of CIFRE theses. I then discuss conversational QA. In particular, I present the work of two PhD candidates, Thibault Cordier and Sebastien Montella, as well as the work of the young researcher Quentin Brabant. Finally, I present the scientific project, in which I discuss Large Language Models (LLMs) for Task-Oriented Dialogue and Multimodal Task-Oriented Dialogue.

Paper Structure (140 sections, 26 equations, 23 figures, 23 tables)

Figures (23)

  • Figure 1: Basic elements of a statistical spoken dialogue system
  • Figure 2: E2E neural architecture
  • Figure 3: Excerpt from a dialogue in the EmoSpeech corpus. The corresponding user semantics is shown highlighted on the right.
  • Figure 4: Excerpt from a dialogue in the DSTC2 corpus. The top-best ASR hypothesis is shown highlighted on the left, and the corresponding user semantics is shown highlighted on the right.
  • Figure 5: Combination of sentence and context representations for the joint prediction of dialogue acts and slots.
  • ...and 18 more figures