Resource-Efficient Fine-Tuning of LLaMA-3.2-3B for Medical Chain-of-Thought Reasoning
Imran Mansha
TL;DR
This paper investigates resource-efficient fine-tuning of the LLaMA-3.2-3B Instruct model to enhance medical chain-of-thought reasoning under constrained hardware. By combining Unsloth with QLoRA, it demonstrates a reproducible, low-resource pipeline that adapts the model to medical reasoning datasets while preserving reasoning traces. Although ROUGE-L scores did not show measurable gains, qualitative analyses indicate improved interpretability and reasoning transparency, with stable training and no catastrophic forgetting. The work provides a practical blueprint for democratizing domain-specific medical AI research on modest hardware and open-source platforms, and it releases the training pipeline and models on public hubs. The approach highlights the balance between efficiency and domain specialization, offering a foundation for future scalable medical AI development.
Abstract
Large Language Models (LLMs) such as GPT-4 and LLaMA have demonstrated remarkable reasoning abilities but require significant computational resources for fine-tuning. This paper presents a resource-efficient fine-tuning approach for LLaMA-3.2-3B to enhance medical chain-of-thought reasoning while operating under constrained GPU and memory settings. Using parameter-efficient tuning techniques such as LoRA and QLoRA, we adapt the base model on publicly available medical reasoning datasets. The model achieves improved reasoning coherence and factual accuracy while reducing memory usage by up to 60% compared to standard full fine-tuning. Experimental evaluation demonstrates that lightweight adaptations can retain strong reasoning capability in medical question-answering tasks. This work highlights practical strategies for deploying LLMs in low-resource research environments and provides insights into balancing efficiency and domain specialization for medical AI systems.
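To make the efficiency argument concrete, the following is a minimal back-of-the-envelope sketch of why LoRA-style adapters and 4-bit quantization reduce the training footprint. All numbers in it are illustrative assumptions rather than values reported in this paper: the layer width of 3072, the adapter rank of 16, and the round figure of 3 billion base parameters are placeholders, and the helper names `lora_trainable_params` and `weight_bytes` are hypothetical. The estimate covers weight storage only and ignores activations, optimizer state, and quantization constants, so it should not be read as a derivation of the 60% figure above.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Count parameters in the two low-rank factors of one adapted matrix.

    LoRA replaces the update to a (d_out x d_in) weight with B @ A,
    where A is (rank x d_in) and B is (d_out x rank).
    """
    return rank * (d_in + d_out)


def weight_bytes(num_params: int, bits_per_param: float) -> float:
    """Bytes needed to store num_params weights at the given precision."""
    return num_params * bits_per_param / 8


# Illustrative assumptions, not values taken from the paper.
d_in = d_out = 3072          # assumed width of one square projection matrix
rank = 16                    # assumed LoRA rank
base_params = 3_000_000_000  # round figure for a "3B" model

full_layer_params = d_in * d_out
lora_layer_params = lora_trainable_params(d_in, d_out, rank)
trainable_fraction = lora_layer_params / full_layer_params

# Frozen base weights stored in 16-bit versus 4-bit precision.
fp16_gib = weight_bytes(base_params, 16) / 2**30
four_bit_gib = weight_bytes(base_params, 4) / 2**30

print(f"full layer params:  {full_layer_params:,}")
print(f"LoRA layer params:  {lora_layer_params:,}")
print(f"trainable fraction: {trainable_fraction:.4f}")
print(f"base weights, 16-bit: {fp16_gib:.2f} GiB")
print(f"base weights, 4-bit:  {four_bit_gib:.2f} GiB")
```

Under these assumptions, the adapter for one matrix trains roughly 1% of the parameters that full fine-tuning would update, and storing the frozen base weights in 4 bits takes a quarter of the space needed at 16 bits. Actual savings depend on which layers are adapted, the sequence length, and the optimizer.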
