A two-stream network with global-local feature fusion for bone age assessment

Qiong Lou; Han Yang; Fang Lu

TL;DR

BoNet+ addresses the trade-off between global skeletal context and local bone detail in bone age assessment (BAA) with a two-stream architecture: a Transformer-based global feature extractor and an RFAConv-based local feature extractor, whose outputs are fused through Inception-V3. The approach achieves MAEs competitive with the state of the art on the RSNA and RHPE datasets, and ablation studies confirm that the two modules provide complementary benefits. The method also demonstrates robust performance, and Grad-CAM analyses show interpretable shifts in attention, suggesting practical potential to reduce clinician workload. Overall, the work advances automated BAA by integrating global and local information in a principled, clinically aligned framework.

Abstract

Bone age assessment (BAA) is a widely used clinical technique that can accurately reflect an individual's level of growth, development, and maturity. Although deep learning has advanced the field in recent years, existing methods struggle to balance global features against local skeletal details efficiently. This study aims to develop an automated bone age assessment system based on a two-stream deep learning architecture that achieves higher accuracy. We propose the BoNet+ model, which incorporates separate global and local feature extraction channels. A Transformer module is introduced into the global feature extraction channel to strengthen global feature extraction through a multi-head self-attention mechanism. An RFAConv module is incorporated into the local feature extraction channel to generate adaptive attention maps within multiscale receptive fields, enhancing local feature extraction. Global and local features are concatenated along the channel dimension and optimized by an Inception-V3 network. The proposed method was validated on the Radiological Society of North America (RSNA) and Radiological Hand Pose Estimation (RHPE) test datasets, achieving mean absolute errors (MAEs) of 3.81 and 5.65 months, respectively. These results are comparable to the state of the art. BoNet+ reduces clinical workload and enables automatic, high-precision, and more objective bone age assessment.
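To make two of the operations named in the abstract concrete, the following is a minimal sketch in plain Python of channel-wise concatenation and of the MAE metric. It is not the authors' implementation: the function names `concat_channels` and `mean_absolute_error`, the nested-list representation of a (channels, height, width) feature tensor, and all numeric values are assumptions chosen only for illustration.

```python
from typing import List, Sequence

# Assumption: one channel is an H x W grid of floats, and a feature tensor
# is a list of such channels, i.e. shape (C, H, W).
Channel = List[List[float]]


def concat_channels(global_feats: List[Channel],
                    local_feats: List[Channel]) -> List[Channel]:
    """Concatenate two (C, H, W) feature tensors along the channel axis."""
    if global_feats and local_feats:
        g_size = (len(global_feats[0]), len(global_feats[0][0]))
        l_size = (len(local_feats[0]), len(local_feats[0][0]))
        if g_size != l_size:
            raise ValueError("spatial sizes must match before concatenation")
    # The channel counts add up; the spatial size is unchanged.
    return global_feats + local_feats


def mean_absolute_error(predicted_months: Sequence[float],
                        true_months: Sequence[float]) -> float:
    """Average absolute difference between predicted and true bone age."""
    if len(predicted_months) != len(true_months) or len(true_months) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) for p, t in zip(predicted_months, true_months))
    return total / len(true_months)


# Hypothetical toy data: 1 global channel and 2 local channels, each 1 x 2.
fused = concat_channels([[[1.0, 2.0]]], [[[3.0, 4.0]], [[5.0, 6.0]]])
print(len(fused))  # number of channels after fusion

# Hypothetical predictions and labels in months (not values from the paper).
print(mean_absolute_error([120, 100], [124, 98]))
```

In a deep learning framework the concatenation would normally be a single tensor operation over a batch; the list-based version above only shows that fusion along the channel dimension stacks the two streams' channels while leaving the spatial grid untouched.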
