Handling and extracting key entities from customer conversations using Speech recognition and Named Entity recognition

Sharvi Endait, Ruturaj Ghatage, DD Kadam

TL;DR

The paper addresses extracting key entities from customer conversations by combining speech recognition and named entity recognition. It proposes a two-stage pipeline using Wav2Vec 2.0 for ASR followed by a fine-tuned BERT-based NER to extract structured details like order numbers and issues. The literature survey covers Deep Speech, LAS, and Wav2Vec for ASR, and ontology-based, deep learning, and BERT-based NER, highlighting architectures and training strategies. The work discusses a practical, transformer-based pipeline, acknowledges data and domain challenges, and outlines future directions toward an end-to-end API and domain-specific datasets to enhance customer-service analytics.

Abstract

With e-commerce developing at a rapid pace, understanding customer requirements and details from business conversations is crucial for customer retention and satisfaction. Extracting key insights from these conversations is important when developing a product or resolving a customer's issue. Customer feedback, responses, and important product details are identified using Named Entity Recognition (NER). To extract the entities, the conversations are first converted to text using the best-suited speech-to-text model. The proposed model is a two-stage network: the first stage converts the conversation to text, and the second extracts the relevant entities using a BERT-based NER transformer model. This improves the customer experience when a customer faces an issue. When a customer encounters a problem, they call and register a complaint. The model then extracts from this conversation the key details needed to investigate the problem, such as the order number and the exact problem. Because these details are extracted directly from the conversation, the effort of going through the conversation again is reduced.
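
The two-stage structure described in the abstract can be sketched as a pipeline of two swappable stages. This is a minimal illustration, not the authors' implementation: the names `ConversationPipeline`, `stub_transcriber`, and `stub_extractor` are hypothetical, the transcriber is a placeholder for an ASR model such as Wav2Vec 2.0, and the regex extractor is a placeholder for a fine-tuned BERT NER model. The stubs exist only so the sketch runs without model weights.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Stage signatures (hypothetical names, not taken from the paper).
Transcriber = Callable[[bytes], str]
EntityExtractor = Callable[[str], Dict[str, List[str]]]


@dataclass
class ConversationPipeline:
    """Two-stage pipeline: speech-to-text, then entity extraction."""
    transcribe: Transcriber
    extract: EntityExtractor

    def run(self, audio: bytes) -> Dict[str, List[str]]:
        transcript = self.transcribe(audio)  # stage 1: ASR
        return self.extract(transcript)      # stage 2: NER


def stub_transcriber(audio: bytes) -> str:
    # Placeholder for an ASR model: it simply decodes the bytes as
    # UTF-8 text instead of recognizing speech.
    return audio.decode("utf-8")


def stub_extractor(transcript: str) -> Dict[str, List[str]]:
    # Placeholder for a NER model: a regex that captures digit runs
    # following the phrase "order number" (optionally followed by "is").
    order_numbers = re.findall(
        r"order number\s+(?:is\s+)?(\d+)", transcript, flags=re.IGNORECASE
    )
    return {"ORDER_NUMBER": order_numbers}


pipeline = ConversationPipeline(stub_transcriber, stub_extractor)
result = pipeline.run(b"Hi, my order number is 48213 and the screen is cracked.")
print(result)
```

Keeping each stage behind a plain callable means either placeholder could be replaced by a real model wrapper without changing `run`, which matches the separation between transcription and entity extraction described above.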

Paper Structure (19 sections, 2 figures)