Lightweight Baselines for Medical Abstract Classification: DistilBERT with Cross-Entropy as a Strong Default

Jiaqi Liu, Tong Wang, Su Liu, Xin Hu, Ran Tong, Lanruo Wang, Jiexi Xu

TL;DR

This study evaluates lightweight baselines for medical abstract classification using BERT-base and DistilBERT under fixed budgets, comparing three loss objectives and a post-hoc calibration workflow. DistilBERT with cross-entropy emerges as the strongest raw default, while a tuned deployment using class-wise thresholds and temperature scaling yields the largest macro-balanced gains, particularly for focal loss. The results emphasize that compact encoders offer favorable accuracy-efficiency trade-offs in medical text tasks and that deployment-time calibration can meaningfully boost macro performance without retraining. The work provides practical guidance for on-prem, privacy-conscious healthcare pipelines and outlines avenues for future multimodal and governance-aligned extensions.
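The deployment-time steps named above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: temperature scaling divides the logits by a scalar fitted on validation data (a coarse grid search is used here), and the class-wise decision rule shown, picking the class with the largest ratio of probability to threshold, is one common choice that the summary does not confirm. The function names and the grid range are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over one example's logits after dividing by a temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nll(logit_rows, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    losses = [-math.log(softmax(row, temperature)[y])
              for row, y in zip(logit_rows, labels)]
    return sum(losses) / len(losses)

def fit_temperature(logit_rows, labels, grid=None):
    """Pick the temperature with the lowest validation NLL.
    Assumption: a grid from 0.5 to 3.0 in steps of 0.1."""
    grid = grid or [0.5 + 0.1 * i for i in range(26)]
    return min(grid, key=lambda t: nll(logit_rows, labels, t))

def predict_with_thresholds(probs, thresholds):
    """Assumed decision rule: choose the class maximizing p_c / t_c.
    Lowering t_c makes class c easier to predict."""
    scores = [p / t for p, t in zip(probs, thresholds)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With all thresholds equal, the rule reduces to plain argmax; thresholds would be chosen on validation data and then frozen before touching the test set.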

Abstract

This work evaluates lightweight methods for medical abstract classification to establish how well they can perform under tight budget constraints. On the public medical abstracts corpus, we fine-tune BERT-base and DistilBERT with three objectives, cross-entropy (CE), class-weighted CE, and focal loss, under identical tokenization, sequence length, optimizer, and schedule. DistilBERT with plain CE gives the strongest raw argmax trade-off, while post hoc operating-point selection (validation-calibrated, class-wise thresholds) substantially improves deployed performance; under this tuned regime, focal loss benefits most. We report Accuracy, Macro-F1, and Weighted-F1, release evaluation artifacts, and include confusion analyses to clarify error structure. The practical takeaway is to start with a compact encoder and CE, then add lightweight calibration or thresholding when deployment requires higher macro balance.
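For readers who want the three objectives side by side, the following is a minimal per-example sketch using the standard textbook definitions. The focusing parameter `gamma=2.0` and the class weights are illustrative assumptions; the abstract does not state the values used in the experiments.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax for one example."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def cross_entropy(logits, label):
    """Plain CE: negative log-probability of the true class."""
    return -log_softmax(logits)[label]

def weighted_cross_entropy(logits, label, class_weights):
    """Class-weighted CE: scale the loss by the true class's weight.
    Assumption: weights are supplied by the caller (e.g. inverse frequency)."""
    return class_weights[label] * cross_entropy(logits, label)

def focal_loss(logits, label, gamma=2.0):
    """Focal loss: down-weight well-classified examples by (1 - p_t)^gamma.
    Assumption: gamma=2.0 is a placeholder, not the paper's setting."""
    log_p = log_softmax(logits)[label]
    p = math.exp(log_p)
    return -((1.0 - p) ** gamma) * log_p
```

Setting `gamma=0` recovers plain CE, which makes the relationship between the two objectives easy to check.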

Paper Structure

This paper contains 17 sections, 2 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Macro-F1 for all settings. Left (gray): six raw configurations (BERT/DistilBERT $\times$ {CE, WCE, FL}). Right (blue): tuned configurations with class-wise thresholding (selected on validation and frozen on test). Tuned DistilBERT+Focal attains the best Macro-F1 (77.55%); tuned CE reaches 70.73%. See Table \ref{tab:main} for exact values.
  • Figure 2: Parameter count (millions) vs. Macro-F1. Color encodes loss (CE, Focal, Class Weight); marker shape distinguishes regimes (Raw, Tuned via per-class thresholding on val); fill indicates model (hollow=BERT $\approx$110M, solid=DistilBERT $\approx$66M). Tuned points lift Macro-F1 without changing parameter count.
  • Figure 3: Nine confusion matrices in a 3$\times$3 grid: top—BERT, middle—DistilBERT, bottom—DistilBERT (tuned). Columns: CE / WCE / FL.
  • Figure 4: Per-class precision/recall/F1 across configurations.
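
The captions above report Macro-F1 and per-class F1. As a reminder of how these relate to Weighted-F1, here is a small sketch using the standard definitions; the function names are hypothetical, and classes with no true or predicted examples are assigned an F1 of 0 by assumption.

```python
def per_class_f1(y_true, y_pred, num_classes):
    """F1 for each class, treating that class as the positive label."""
    scores = []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1: every class counts equally."""
    scores = per_class_f1(y_true, y_pred, num_classes)
    return sum(scores) / num_classes

def weighted_f1(y_true, y_pred, num_classes):
    """Mean of per-class F1 weighted by each class's true support."""
    scores = per_class_f1(y_true, y_pred, num_classes)
    support = [sum(1 for t in y_true if t == c) for c in range(num_classes)]
    return sum(s * n for s, n in zip(scores, support)) / len(y_true)
```

Because Macro-F1 ignores class frequency, a model that neglects a rare class is penalized more under Macro-F1 than under Weighted-F1, which is why thresholding aimed at minority classes shows up most clearly in the macro score.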