Resource-Efficient Fine-Tuning of LLaMA-3.2-3B for Medical Chain-of-Thought Reasoning

Imran Mansha

TL;DR

This paper investigates resource-efficient fine-tuning of the LLaMA-3.2-3B Instruct model to enhance medical chain-of-thought reasoning under constrained hardware. By combining Unsloth with QLoRA, it demonstrates a reproducible, low-resource pipeline capable of adapting to medical reasoning datasets while preserving reasoning traces. Although ROUGE-L metrics did not show measurable gains, qualitative analyses indicate improved interpretability and reasoning transparency, with stable training and no catastrophic forgetting. The work provides a practical blueprint for democratizing domain-specific medical AI research on modest hardware and open-source platforms, including release of the training pipeline and models on public hubs. This approach highlights the balance between efficiency and domain specialization, offering a foundation for future, scalable medical AI development.
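To make the ROUGE-L remark concrete, here is a minimal sketch of the metric as it is commonly defined: an F-measure over the longest common subsequence (LCS) of reference and candidate tokens. The whitespace tokenization, lowercasing, and function names are assumptions for illustration; the summary does not state which ROUGE implementation or settings the paper used.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 between two strings (assumed: lowercase, whitespace tokens)."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical strings, not taken from the paper's data.
    reference = "chest pain with st elevation suggests myocardial infarction"
    paraphrase = "st elevation plus chest pain points to a heart attack"
    print(rouge_l_f1(reference, reference))   # identical text
    print(rouge_l_f1(reference, paraphrase))  # same meaning, different wording
```

Because the score rewards shared word order rather than correctness of the reasoning, a paraphrased but valid reasoning chain can score low. That is one plausible reason a lexical-overlap metric and a qualitative assessment can disagree.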

Abstract

Large Language Models (LLMs) such as GPT-4 and LLaMA have demonstrated remarkable reasoning abilities but require significant computational resources for fine-tuning. This paper presents a resource-efficient fine-tuning approach for LLaMA-3.2-3B to enhance medical chain-of-thought reasoning while operating under constrained GPU and memory settings. Using parameter-efficient tuning techniques such as LoRA and QLoRA, we adapt the base model on publicly available medical reasoning datasets. The model achieves improved reasoning coherence and factual accuracy while reducing memory usage by up to 60% compared to standard full fine-tuning. Experimental evaluation demonstrates that lightweight adaptations can retain strong reasoning capability in medical question-answering tasks. This work highlights practical strategies for deploying LLMs in low-resource research environments and provides insights into balancing efficiency and domain specialization for medical AI systems.
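The arithmetic behind parameter-efficient tuning can be sketched briefly. LoRA freezes a weight matrix of shape d_out x d_in and trains two low-rank factors with r * (d_in + d_out) parameters in total, while QLoRA additionally stores the frozen base weights in 4-bit form. The dimensions, rank, and parameter count below are illustrative assumptions, not values reported in the paper, and the sketch does not attempt to reproduce the reported 60% memory figure.

```python
def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters LoRA adds for one d_out x d_in weight matrix."""
    # A has shape (rank, d_in) and B has shape (d_out, rank).
    return rank * (d_in + d_out)


def weight_bytes(n_params, bits_per_param):
    """Raw storage for n_params weights, ignoring quantization metadata."""
    return n_params * bits_per_param / 8


if __name__ == "__main__":
    # Assumed, illustrative values (not from the paper).
    d_in = d_out = 3072
    rank = 16
    full = d_in * d_out
    lora = lora_trainable_params(d_in, d_out, rank)
    print(f"full matrix: {full} params, LoRA adapter: {lora} params")
    print(f"trainable fraction: {lora / full:.4f}")

    n_params = 3_000_000_000  # rough size of a 3B-parameter model
    print(f"16-bit weights: {weight_bytes(n_params, 16) / 1e9:.1f} GB")
    print(f" 4-bit weights: {weight_bytes(n_params, 4) / 1e9:.1f} GB")
```

Actual memory use during training also depends on activations, optimizer state for the adapters, and quantization overhead, so these numbers are only a lower bound on weight storage.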

Paper Structure

This paper contains 20 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Overview of training dynamics across multiple fine-tuning runs. Curves show training loss, learning rate schedule, gradient norm, global steps, and epochs. The loss decreased steadily while gradient norms remained stable, suggesting convergence without instability.
  • Figure 2: Detailed view of a single fine-tuning run. The training loss steadily declined, the learning rate followed a cosine decay schedule, and the gradient norm stabilized quickly. These results confirm that QLoRA fine-tuning preserved baseline model stability under resource-constrained training.
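The cosine decay schedule named in the Figure 2 caption can be sketched as follows. This is the standard textbook form, with the learning rate falling from a peak value to a minimum along half a cosine period; the peak rate, minimum rate, and step count are assumed for illustration, since the caption does not give them, and any warmup phase the paper may have used is omitted.

```python
import math


def cosine_decay_lr(step, total_steps, peak_lr, min_lr=0.0):
    """Learning rate at `step` under cosine decay from peak_lr to min_lr."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so steps past the end stay at min_lr.
    progress = min(1.0, max(0.0, step / total_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Assumed, illustrative hyperparameters (not from the paper).
    total_steps = 100
    peak_lr = 2e-4
    for step in (0, 25, 50, 75, 100):
        print(step, f"{cosine_decay_lr(step, total_steps, peak_lr):.2e}")
```

The rate changes slowly near the start and end of training and fastest in the middle, which matches the smooth curve shape the caption describes.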