First Place Solution of 2023 Global Artificial Intelligence Technology Innovation Competition Track 1
Xiangyu Wu, Hailiang Zhang, Yang Yang, Jianfeng Lu
TL;DR
The paper tackles automatic generation of medical imaging diagnostic reports from desensitized text. It builds on CPT-BASE, extends the vocabulary (expressed as $[C, D, O]$), and pre-trains with a span-masked denoising auto-encoding (DAE) objective. It then fine-tunes the model with iterative retrieval augmentation, which constructs a mini-knowledge base from description–diagnosis pairs, and uses a noise-aware similarity bucketing strategy to manage noisy retrieved data. These are complemented by training techniques such as FGM, R-Dropout, EMA, and model ensembling. The key contributions are (i) vocabulary extension and SpanMask-based DAE pre-training, (ii) iterative retrieval augmentation with a retrieval knowledge base and nearest-neighbor retrieval, (iii) a similarity bucketing mechanism with bucket-aware prompts, and (iv) a suite of robustness techniques that yield the top CIDEr/BLEU-based scores in the competition ($2.362$ on Leaderboard A and $2.320$ on Leaderboard B with an ensemble). The results indicate that retrieval-augmented, noise-aware prompting can substantially improve diagnostic report generation on desensitized data.
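To make the retrieval and bucketing idea concrete, the following is a minimal single-round sketch, not the authors' implementation. It assumes desensitized text is a string of space-separated token IDs, uses Jaccard overlap as a stand-in similarity measure, and invents the prompt format (`[SIM_k]`, `[SEP]`) and the number of buckets for illustration. All function names are hypothetical.

```python
# Hypothetical sketch: nearest-neighbor retrieval plus a similarity-bucket prompt.
# Jaccard similarity, the 5-bucket split, and the prompt tokens are assumptions.

def jaccard(a, b):
    """Token-set overlap between two space-separated ID strings."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0

def retrieve(query, kb):
    """Return (similarity, diagnosis) of the most similar description in kb."""
    best = max(kb, key=lambda pair: jaccard(query, pair[0]))
    return jaccard(query, best[0]), best[1]

def bucket_id(sim, n_buckets=5):
    """Map a similarity in [0, 1] to a discrete bucket index."""
    return min(int(sim * n_buckets), n_buckets - 1)

def build_input(query, kb, n_buckets=5):
    """Prefix the query with a bucket prompt and the retrieved diagnosis."""
    sim, diagnosis = retrieve(query, kb)
    b = bucket_id(sim, n_buckets)
    return f"[SIM_{b}] {diagnosis} [SEP] {query}"

# Mini-knowledge base of (description, diagnosis) pairs, as desensitized IDs.
kb = [("10 11 12 13", "50 51"), ("20 21 22 23", "60 61")]
print(build_input("10 11 12 98 99", kb))
```

The bucket token tells the model how far to trust the retrieved diagnosis: a low bucket marks a likely noisy neighbor, a high bucket a close match. During training, the query's own pair would presumably be excluded from the knowledge base so the target does not leak into the input.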
Abstract
In this paper, we present our champion solution to the Global Artificial Intelligence Technology Innovation Competition Track 1: Medical Imaging Diagnosis Report Generation. We select CPT-BASE as our base model for the text generation task. In the pre-training stage, we remove the masked language modeling task of CPT-BASE, rebuild the vocabulary, and train with a denoising auto-encoder objective that uses a span mask strategy with a gradually increasing masking ratio. In the fine-tuning stage, we design iterative retrieval augmentation and noise-aware similarity bucket prompt strategies. Retrieval augmentation constructs a mini-knowledge base that enriches the model's input, while the similarity bucket makes the model aware of noise within the mini-knowledge base and guides it to generate higher-quality diagnostic reports based on the similarity prompts. Our single model achieves a score of 2.321 on leaderboard A, and the multi-model fusion scores 2.362 and 2.320 on leaderboards A and B respectively, securing first place in the rankings.
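The span-mask corruption with a growing masking ratio can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the linear schedule, the start and end ratios, the exponentially distributed span lengths, and the one-`[MASK]`-per-token replacement are all choices made here for the example.

```python
import random

MASK = "[MASK]"

def mask_ratio_at(epoch, total_epochs, start=0.15, end=0.5):
    """Linearly increase the masking ratio over epochs (values are assumed)."""
    if total_epochs <= 1:
        return end
    return start + (end - start) * epoch / (total_epochs - 1)

def span_mask(tokens, mask_ratio, mean_span=3, rng=None):
    """Mask contiguous spans until roughly mask_ratio of the tokens are covered.

    Returns the corrupted sequence and the number of masked tokens. The
    original tokens are the reconstruction target of the denoising task.
    """
    rng = rng or random.Random(0)
    n = len(tokens)
    if n == 0:
        return [], 0
    budget = min(n, max(1, round(n * mask_ratio)))
    masked = [False] * n
    covered, attempts = 0, 0
    while covered < budget and attempts < 10 * n:
        attempts += 1
        # Sample a span length, clipped to the remaining budget.
        span = max(1, round(rng.expovariate(1 / mean_span)))
        span = min(span, budget - covered)
        start = rng.randrange(0, n - span + 1)
        if any(masked[start:start + span]):
            continue  # skip spans that overlap an already masked region
        for i in range(start, start + span):
            masked[i] = True
        covered += span
    corrupted = [MASK if m else t for t, m in zip(tokens, masked)]
    return corrupted, covered

tokens = [str(i) for i in range(20)]  # stand-in for desensitized token IDs
for epoch in range(3):
    ratio = mask_ratio_at(epoch, 3)
    corrupted, covered = span_mask(tokens, ratio, rng=random.Random(epoch))
    print(epoch, round(ratio, 3), covered, " ".join(corrupted))
```

Starting with a low ratio gives the model an easier reconstruction problem early on, and raising it forces the decoder to rely more on long-range context as pre-training proceeds.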
