The StatCan Dialogue Dataset: Retrieving Data Tables through Conversations with Genuine Intents

Xing Han Lu, Siva Reddy, Harm de Vries

TL;DR

This paper introduces the StatCan Dialogue Dataset, a large bilingual corpus of live chat conversations between Statistics Canada visitors and agents, focused on retrieving data tables. It defines two practical tasks (table retrieval and response generation) and establishes strong baselines (BM25, DPR, TAPAS, T5), evaluated with multiple metrics in English and French on a temporally stratified split that stresses generalization. The results reveal significant temporal drift and challenges for both retrieval and generation, with retrieval generally outperforming generation and models struggling to decide when to return a table. The work provides a dataset card, discusses limitations and societal considerations, and offers a platform for developing robust, real-world conversational systems that assist knowledge workers in accessing statistical tables.

Abstract

We introduce the StatCan Dialogue Dataset consisting of 19,379 conversation turns between agents working at Statistics Canada and online users looking for published data tables. The conversations stem from genuine intents, are held in English or French, and lead to agents retrieving one of over 5,000 complex data tables. Based on this dataset, we propose two tasks: (1) automatic retrieval of relevant tables based on an ongoing conversation, and (2) automatic generation of appropriate agent responses at each turn. We investigate the difficulty of each task by establishing strong baselines. Our experiments on a temporal data split reveal that all models struggle to generalize to future conversations, as we observe a significant drop in performance across both tasks when we move from the validation to the test set. In addition, we find that response generation models struggle to decide when to return a table. Considering that the tasks pose significant challenges to existing models, we encourage the community to develop models for our task, which can be directly used to help knowledge workers find relevant tables for live chat users.

Paper Structure (120 sections, 8 figures, 33 tables)

Figures (8)

  • Figure 1: Histogram of messages by conversation in both tasks (French split in \ref{fig:turn_msg_histogram_fr}).
  • Figure 2: Diagram of the hierarchical relationship between metadata components, discussed in \ref{sec:table_specs}.
  • Figure 3: Results for DPR B+M for tables appearing in the training set frequently (10+ times), infrequently (1-10 times), and unseen at train time.
  • Figure 4: Validation and test recall for a selected set of retrieval models. We observe a significant drop in performance. B+M denotes Basic + member.
  • Figure 5: The release year of all tables available on statcan.gc.ca.
  • ...and 3 more figures