Aspect-oriented Consumer Health Answer Summarization

Rochana Chaturvedi, Abari Bhattacharya, Shweta Yadav

TL;DR

The paper addresses a problem in health-related Community Question Answering (CQA) forums: a single query often draws multiple answers, which makes the key information hard to grasp. It introduces CHA-Summ, a dataset with sentence-level relevance labels and four health-focused aspects (Information, Question, Suggestion, Experience), and presents a multi-stage pipeline that retrieves relevant sentences, classifies their aspect, and generates per-aspect abstractive summaries using state-of-the-art transformers. Experimental results show that aspect-based summaries achieve better content coverage and relevance than single-answer baselines, especially when the pipeline carefully selects relevant content and uses strong abstractive models; human evaluation supports this finding. The work provides tooling for making health information on public forums more usable and suggests avenues for improving factuality and scalability through larger silver datasets and downstream QA tasks.

Abstract

Community Question-Answering (CQA) forums have revolutionized how people seek information, especially those related to their healthcare needs, placing their trust in the collective wisdom of the public. However, there can be several answers in response to a single query, which makes it hard to grasp the key information related to the specific health concern. Typically, CQA forums feature a single top-voted answer as a representative summary for each query. However, a single answer overlooks the alternative solutions and other information frequently offered in other responses. Our research focuses on aspect-based summarization of health answers to address this limitation. Summarization of responses under different aspects such as suggestions, information, personal experiences, and questions can enhance the usability of the platforms. We formalize a multi-stage annotation guideline and contribute a unique dataset comprising aspect-based human-written health answer summaries. We build an automated multi-faceted answer summarization pipeline with this dataset based on task-specific fine-tuning of several state-of-the-art models. The pipeline leverages question similarity to retrieve relevant answer sentences, subsequently classifying them into the appropriate aspect type. Following this, we employ several recent abstractive summarization models to generate aspect-based summaries. Finally, we present a comprehensive human analysis and find that our summaries rank high in capturing relevant content and a wide range of solutions.
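
The three stages described above (retrieve relevant sentences, classify their aspect, summarize per aspect) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the Jaccard token overlap used as the similarity score, the keyword rules used as the classifier, and the `threshold` value are hypothetical stand-ins, not the fine-tuned models the paper uses. The final stage simply joins the selected sentences where an abstractive model would rewrite them.

```python
import re
from collections import defaultdict

# The four aspects named in the paper.
ASPECTS = ("Suggestion", "Experience", "Information", "Question")


def tokens(text):
    """Lowercased word set; a crude stand-in for a learned sentence encoder."""
    return set(re.findall(r"[a-z']+", text.lower()))


def relevance(question, sentence):
    """Jaccard overlap of question and sentence tokens (assumed scoring)."""
    q, s = tokens(question), tokens(sentence)
    return len(q & s) / len(q | s) if q | s else 0.0


def classify_aspect(sentence):
    """Toy keyword rules standing in for a fine-tuned aspect classifier."""
    lowered = sentence.lower()
    if sentence.rstrip().endswith("?"):
        return "Question"
    if re.search(r"\b(i|my|me)\b", lowered):
        return "Experience"
    if re.search(r"\b(try|should|avoid|consider)\b", lowered):
        return "Suggestion"
    return "Information"


def summarize_by_aspect(question, answers, threshold=0.1):
    """Stage 1: retrieve; stage 2: classify; stage 3: group per aspect."""
    grouped = defaultdict(list)
    for answer in answers:
        # Naive sentence split on terminal punctuation.
        for sentence in re.split(r"(?<=[.?!])\s+", answer.strip()):
            if sentence and relevance(question, sentence) >= threshold:
                grouped[classify_aspect(sentence)].append(sentence)
    # Placeholder for abstractive summarization: join the kept sentences.
    return {a: " ".join(grouped[a]) for a in ASPECTS if grouped[a]}


if __name__ == "__main__":
    question = "What helps with a migraine headache?"
    answers = [
        "You should try resting in a dark room for a migraine. "
        "The weather is nice today.",
        "I had a migraine last year and caffeine helped my headache.",
        "Is the headache with nausea?",
    ]
    for aspect, summary in summarize_by_aspect(question, answers).items():
        print(f"{aspect}: {summary}")
```

Keeping the stages as separate functions mirrors the staged design: each stand-in could be swapped for a trained model without changing the others.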

Paper Structure (37 sections, 7 figures, 9 tables)

This paper contains 37 sections, 7 figures, 9 tables.

Figures (7)

  • Figure 1: Example illustrating a health query, responses, and their aspect-based summaries.
  • Figure 2: Distribution of word-count ratio. The top panel shows the compression over each annotation step. The bottom panel shows the reduction for each aspect category in the final summaries.
  • Figure 3: Confusion matrix for end-to-end aspect identification using RoBERTa-ft$_r$ for selecting relevant sentences and RoBERTa-ft$_f$ for aspect classification. Classes: Suggestion, Experience, Information, Question, and NA (irrelevant).
  • Figure 4: Example summaries for Suggestion and Information generated by the Ans+ft BART and Pipeline+ft BART models, alongside the best-answer summary highlighted in red.
  • Figure 5: Distribution of categories
  • ...and 2 more figures