Unveiling the Impact of Multimodal Features on Chinese Spelling Correction: From Analysis to Design

Xiaowu Zhang, Hongfei Zhao, Jingyi Hou, Zhijie Liu

TL;DR

This paper tackles Chinese Spelling Correction (CSC) by investigating how phonetic and graphemic features can be used effectively. It introduces MACU, an analysis experiment that quantifies how multimodal models use these features, and NamBert, a non-aligned multimodal BERT with dedicated phonetic and graphemic encoders plus a semantic fusion pathway guided by a forget gate and trained with a Focal Loss objective. Empirical results on SIGHAN and CSCD-NS show NamBert surpassing existing multimodal methods and reveal trade-offs between traditional multimodal models and LLMs, including inference speed and over-correction. The findings underscore the value of preserving rich multimodal information and suggest a hybrid approach that combines LLM strengths with multimodal cues for robust, scalable CSC.

Abstract

The Chinese Spelling Correction (CSC) task focuses on detecting and correcting spelling errors in sentences. Current research primarily explores two approaches: traditional multimodal pre-trained models and large language models (LLMs). However, LLMs face limitations in CSC, particularly over-correction, making them suboptimal for this task. While existing studies have investigated the use of phonetic and graphemic information in multimodal CSC models, effectively leveraging these features to enhance correction performance remains a challenge. To address this, we propose the Multimodal Analysis for Character Usage (MACU) experiment, identifying potential improvements for multimodal correction. Based on empirical findings, we introduce NamBert, a novel multimodal model for Chinese spelling correction. Experiments on benchmark datasets demonstrate NamBert's superiority over SOTA methods. We also conduct a comprehensive comparison between NamBert and LLMs, systematically evaluating their strengths and limitations in CSC. Our code and model are available at https://github.com/iioSnail/NamBert.

Paper Structure

This paper contains 11 sections, 12 equations, 4 figures, and 5 tables.

Figures (4)

  • Figure 1: Examples of Chinese spelling errors. Misspelled characters are marked in red and the correct characters in blue, with the corresponding pronunciations given in brackets.
  • Figure 2: The encoder extracts the phonetic and graphemic features of Chinese characters separately and builds a confusion set from them. Characters in the Chinese text are then selected and replaced, and the model’s prediction accuracy on these characters is computed. The figure shows the character replacement process based on phonetic and graphemic similarity, which is used to test the model’s performance within different similarity ranges.
  • Figure 3: Results of the probe experiment and the MACU experiment. The bar chart shows the results of the probe experiment, where "Similarity" denotes the lower bound of the similarity range. The line chart shows the accuracy of the MACU experiment at that similarity level.
  • Figure 4: The architecture of NamBert. Multimodal information is extracted through a redesigned phonetic encoder, graphemic encoder, and semantic encoder. The modalities are fused using a non-aligned posterior fusion approach, and the fused representation is projected to 768 dimensions by a linear layer. The output layer uses the fixed index 1 for correct characters, while for incorrect characters it outputs the index of the corrected character in the dictionary. Focal Loss is used to reduce the weight of index 1 so that training focuses more on incorrect characters.
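
The Figure 4 caption says Focal Loss down-weights the "keep" class (index 1) so that training concentrates on misspelled characters. The following is a minimal pure-Python sketch of that idea using the standard focal loss form, -(1 - p_t)^gamma * log(p_t). It is not the authors' implementation: the function names, the toy 4-entry vocabulary, the logits, and gamma = 2.0 are all assumptions made for illustration.

```python
import math

# Per the Figure 4 caption, index 1 marks a character that is already correct.
KEEP_INDEX = 1


def softmax(logits):
    """Numerically stable softmax over one token's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits_per_token, targets, gamma=2.0):
    """Mean focal loss over tokens: -(1 - p_t)^gamma * log(p_t).

    gamma is a hypothetical setting; gamma = 0 reduces to cross-entropy.
    """
    losses = []
    for logits, target in zip(logits_per_token, targets):
        p_t = softmax(logits)[target]
        losses.append(-((1.0 - p_t) ** gamma) * math.log(p_t))
    return sum(losses) / len(losses)


def cross_entropy(logits_per_token, targets):
    """Plain cross-entropy, expressed as focal loss with gamma = 0."""
    return focal_loss(logits_per_token, targets, gamma=0.0)


# Toy data (invented): one confidently kept character, one missed correction.
easy = ([[0.0, 6.0, 0.0, 0.0]], [KEEP_INDEX])  # correct char, model is sure
hard = ([[0.0, 2.0, 0.0, 1.0]], [3])           # misspelled char, model prefers "keep"

ce_easy, fl_easy = cross_entropy(*easy), focal_loss(*easy)
ce_hard, fl_hard = cross_entropy(*hard), focal_loss(*hard)

print(f"easy token: CE={ce_easy:.6f}  focal={fl_easy:.8f}")
print(f"hard token: CE={ce_hard:.4f}  focal={fl_hard:.4f}")
```

In this toy setup the confidently kept character contributes almost nothing under focal loss, while the missed correction retains most of its cross-entropy loss. Since most characters in a sentence are correct, this is the mechanism by which the objective shifts training signal toward erroneous characters.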