UltraUPConvNet: A UPerNet- and ConvNeXt-Based Multi-Task Network for Ultrasound Tissue Segmentation and Disease Prediction
Zhi Chen, Le Zhang
TL;DR
UltraUPConvNet addresses the need for a low-overhead, universal model that jointly performs ultrasound tissue segmentation and disease prediction. It combines a ConvNeXt-Tiny encoder with a UPerNet-based decoder and injects four prompt types (nature, position, task, type) through prompt projection embeddings, yielding a lightweight, promptable multi-task framework. Training alternates between segmentation and classification batches, each handled by a task-specific head and a composite loss ($L_{seg}$ or $L_{cls}$), with the losses balanced by coefficients such as $\lambda_{cls}=10$. The model reaches performance comparable to the state of the art on multiple datasets while using roughly 30% fewer parameters than comparable SOTA models. It generalizes across seven anatomical regions on a dataset with over $9{,}700$ annotations, offering a practical, scalable option for real-world ultrasound imaging applications.
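The alternating schedule and loss weighting described above can be illustrated with a minimal, framework-free sketch. It is not the authors' implementation: the helper names `alternate_batches` and `weighted_loss` are hypothetical, the strict one-to-one interleaving is an assumption, and so is the choice to apply $\lambda_{cls}$ as a plain multiplier on the classification loss only. The summary gives the value $\lambda_{cls}=10$ but not the exact form of the composite losses.

```python
from itertools import zip_longest

LAMBDA_CLS = 10.0  # value quoted in the text; how it enters the loss is assumed

_MISSING = object()  # sentinel marking an exhausted batch stream


def alternate_batches(seg_batches, cls_batches):
    """Yield (task, batch) pairs, alternating segmentation and classification.

    Assumption: strict one-to-one interleaving; once one stream runs out,
    the remaining batches of the other stream are yielded on their own.
    """
    for seg, cls in zip_longest(seg_batches, cls_batches, fillvalue=_MISSING):
        if seg is not _MISSING:
            yield "seg", seg
        if cls is not _MISSING:
            yield "cls", cls


def weighted_loss(task, raw_loss, lambda_cls=LAMBDA_CLS):
    """Scale the per-task loss (assumption: only the 'cls' term is scaled)."""
    return lambda_cls * raw_loss if task == "cls" else raw_loss


# Placeholder batch identifiers stand in for real data loaders.
schedule = list(alternate_batches(["s0", "s1", "s2"], ["c0", "c1"]))
```

In an actual training loop, each `(task, batch)` pair would be routed to the matching task-specific head, and the scaled loss would be backpropagated through the shared encoder and decoder.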
Abstract
Ultrasound imaging is widely used in clinical practice due to its cost-effectiveness, mobility, and safety. However, current AI research often treats disease prediction and tissue segmentation as two separate tasks, and the resulting models require substantial computational overhead. To address this, we introduce UltraUPConvNet, a computationally efficient universal framework designed for both ultrasound image classification and segmentation. Trained on a large-scale dataset containing more than 9,700 annotations across seven different anatomical regions, our model achieves state-of-the-art performance on certain datasets with lower computational overhead. Our model weights and code are available at https://github.com/yyxl123/UltraUPConvNet
