First Place Solution of 2023 Global Artificial Intelligence Technology Innovation Competition Track 1

Xiangyu Wu, Hailiang Zhang, Yang Yang, Jianfeng Lu

TL;DR

The paper tackles automatic generation of medical imaging diagnostic reports from desensitized text. It builds on CPT-BASE, replacing the original pre-training task with span-masked denoising auto-encoding (DAE) on an expanded vocabulary expressed as $[C, D, O]$. During fine-tuning, iterative retrieval augmentation builds a mini-knowledge base from description–diagnosis pairs, and a noise-aware similarity bucketing strategy tells the model how far to trust the retrieved, possibly noisy, entries. Training techniques such as FGM, R-Dropout, EMA, and model ensembling complement these strategies. The key contributions are (i) vocabulary extension and SpanMask-based DAE pre-training, (ii) iterative retrieval augmentation with a retrieval knowledge base and nearest-neighbor retrieval, (iii) a similarity bucketing mechanism with bucket-aware prompts, and (iv) a suite of robustness techniques that yield state-of-the-art CIDEr/BLEU-based scores (e.g., $2.362$ on Leaderboard A and $2.320$ on Leaderboard B with an ensemble). The results indicate that retrieval-augmented, noise-aware prompting can substantially improve diagnostic report generation on desensitized data, and the method ranked first on the competition leaderboards.

Abstract

In this paper, we present our champion solution to the Global Artificial Intelligence Technology Innovation Competition Track 1: Medical Imaging Diagnosis Report Generation. We select CPT-BASE as our base model for the text generation task. During the pre-training stage, we remove the masked language modeling task of CPT-BASE, rebuild the vocabulary, and train with a denoising auto-encoder objective that uses a span mask strategy with a gradually increasing masking ratio. In the fine-tuning stage, we design iterative retrieval augmentation and noise-aware similarity bucket prompt strategies. Retrieval augmentation constructs a mini-knowledge base that enriches the model's input, while the similarity bucket exposes how noisy each retrieved entry is, guiding the model to generate higher-quality diagnostic reports based on the similarity prompts. Our single model achieves a score of 2.321 on leaderboard A, and the multi-model ensemble scores 2.362 and 2.320 on leaderboards A and B respectively, securing first place in the rankings.
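
The span mask strategy with a gradually increasing masking ratio can be illustrated with a minimal sketch. This is not the authors' implementation: the `[MASK]` token name, the linear schedule, the ratio bounds (0.15 to 0.5), the maximum span length, and the choice to replace every masked token individually are all assumptions made for illustration.

```python
import random

MASK = "[MASK]"  # hypothetical mask token name (assumption)


def mask_ratio_at(step, total_steps, start=0.15, end=0.5):
    """Linearly increase the masking ratio from `start` to `end` (assumed schedule)."""
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return start + (end - start) * frac


def span_mask(tokens, mask_ratio, max_span=3, rng=None):
    """Mask non-overlapping contiguous spans covering roughly `mask_ratio` of the tokens.

    Returns the corrupted sequence; the original `tokens` is the
    reconstruction target of the denoising auto-encoder.
    """
    rng = rng or random.Random()
    n = len(tokens)
    budget = max(1, round(n * mask_ratio)) if n else 0
    masked = [False] * n
    covered, attempts = 0, 0
    while covered < budget and attempts < 10 * n:
        attempts += 1
        span = min(rng.randint(1, max_span), budget - covered)
        start = rng.randrange(0, n - span + 1)
        if any(masked[start:start + span]):
            continue  # skip spans that overlap an already masked region
        for i in range(start, start + span):
            masked[i] = True
        covered += span
    return [MASK if m else t for t, m in zip(tokens, masked)]


# Desensitized text is a sequence of space-separated IDs.
source = "88 29 17 55 72 14 90 33 21 60".split()
corrupted = span_mask(source, mask_ratio_at(step=0, total_steps=10), rng=random.Random(0))
```

Early steps corrupt only a few tokens, so reconstruction is easy; later steps raise the ratio and force the model to rely more on context.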

Paper Structure

This paper contains 13 sections, 4 figures, and 1 table.

Figures (4)

  • Figure 1: A sample from the dataset. Clinical and Description are the inputs, and Diagnosis is the output. Note that all competition text is desensitized at the character level and appears as space-separated tokens (e.g., 88 29 17 55 72).
  • Figure 2: A sample from the dataset. Clinical and Description are the inputs, and Diagnosis is the output. Note that all competition text is desensitized at the character level and appears as space-separated tokens (e.g., 88 29 17 55 72).
  • Figure 3: The retrieval augmentation strategy.
  • Figure 4: The noise-aware similarity bucketing prompt strategy.
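
The strategies named in Figures 3 and 4 can be sketched together as a single retrieval step followed by a bucketed similarity prompt. This is a hedged illustration rather than the paper's method: the Jaccard similarity, the number of buckets, the `[SIM_k]` and `[SEP]` token names, and the input layout are assumptions, and the iterative part of the retrieval is not modeled.

```python
def jaccard(a, b):
    """Token-set overlap, used here as a stand-in similarity measure (assumption)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def retrieve(query, knowledge_base):
    """Return (similarity, description, diagnosis) of the nearest neighbor.

    `knowledge_base` is a non-empty list of (description, diagnosis) token lists.
    """
    scored = ((jaccard(query, desc), desc, diag) for desc, diag in knowledge_base)
    return max(scored, key=lambda item: item[0])


def bucket_of(similarity, num_buckets=5):
    """Map a similarity in [0, 1] to a bucket index in [0, num_buckets - 1]."""
    return min(int(similarity * num_buckets), num_buckets - 1)


def build_input(query, knowledge_base, num_buckets=5):
    """Append the retrieved diagnosis, prefixed by a prompt that signals its reliability."""
    similarity, _, diagnosis = retrieve(query, knowledge_base)
    prompt = f"[SIM_{bucket_of(similarity, num_buckets)}]"  # hypothetical prompt token
    return query + ["[SEP]", prompt] + diagnosis


# Mini-knowledge base of description-diagnosis pairs in desensitized form.
kb = [
    ("88 29 17".split(), "55 72".split()),
    ("10 11 12".split(), "13".split()),
]
model_input = build_input("88 29 40".split(), kb)
```

A low bucket index marks the retrieved diagnosis as likely noise, while a high index marks it as a close match, so the model can learn how much weight to give the retrieved text.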