Active Prompt Tuning Enables GPT-4o To Do Efficient Classification Of Microscopy Images
Abhiram Kandiyana, Peter R. Mouton, Yaroslav Kolinko, Lawrence O. Hall, Dmitry Goldgof
TL;DR
The paper tackles the bottleneck of ground-truth annotation in microscopy image classification by applying Active Prompt Tuning to GPT-4o, using few-shot prompts and an iterative human-in-the-loop process to build an effective prompt set without model fine-tuning. It demonstrates strong generalization across datasets from different brain regions and magnifications, achieving about 92% accuracy while reducing ground-truth annotation effort by over 90% compared with a CNN baseline. The approach also produces interpretable explanations for each classified image, which facilitates trust and allows the output to serve as potential ground truth for training other vision-language models. Overall, the method offers a scalable, efficient alternative to traditional CNN-based histology classification, with practical implications for neuroscience research and AI-assisted microscopy workflows.
Abstract
Traditional deep learning-based methods for classifying cellular features in microscopy images require time- and labor-intensive model training. Current limitations include major time commitments from domain experts for accurate ground-truth preparation and the need for a large amount of input image data. We previously proposed a solution that overcomes these challenges using OpenAI's GPT-4(V) model on a pilot dataset (Iba-1 immuno-stained tissue sections from 11 mouse brains). On the pilot dataset, this approach matched the accuracy of a baseline traditional Convolutional Neural Network (CNN)-based approach while substantially improving throughput efficiency. The present study builds upon this framework using a second, unique, and substantially larger dataset of microscopy images. Our current approach uses a newer and faster model, GPT-4o, along with improved prompts. It was evaluated on a microscopy image dataset captured at low (10x) magnification from cresyl-violet-stained sections through the cerebellum of 18 mouse brains (9 Lurcher mice, 9 wild-type controls). We used our approach to classify each image as either wild-type control or Lurcher mutant. With 6 mice in the prompt set, the approach correctly classified 11 of the 12 remaining mice (92%) with 96% higher efficiency, reduced image requirements, and lower demands on the time and effort of domain experts compared to the baseline method (a snapshot ensemble of CNN models). These results confirm that our approach is effective across multiple datasets from different brain regions and magnifications, with minimal overhead.
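To make the human-in-the-loop idea concrete, the following is a minimal Python sketch of one way a few-shot prompt set could be grown iteratively. It is not the authors' implementation: the abstract does not state the selection rule, so the rule used here (add a case to the prompt set when the model misclassifies it and the expert corrects it) is an assumption. The names `build_prompt_set`, `toy_classify`, `FEATURES`, and `LABELS` are hypothetical, and `toy_classify` is a nearest-neighbor stand-in for a real GPT-4o call. The last lines reproduce the arithmetic reported in the abstract (18 mice, 6 in the prompt set, 11 of the remaining 12 classified correctly).

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (mouse_id, expert-provided label)


def build_prompt_set(
    pool: Dict[str, str],
    classify: Callable[[List[Example], str], str],
    seed_ids: List[str],
    max_prompt_size: int,
) -> List[Example]:
    """Grow a few-shot prompt set by adding cases the model gets wrong.

    ASSUMPTION: the 'add misclassified cases' rule is illustrative only;
    it is not taken from the paper.
    """
    prompt_set = [(m, pool[m]) for m in seed_ids]
    used = set(seed_ids)
    while len(prompt_set) < max_prompt_size:
        added = False
        for mouse_id in sorted(pool):
            if mouse_id in used:
                continue
            predicted = classify(prompt_set, mouse_id)
            # Human-in-the-loop step: the expert checks the prediction.
            # Here pool[mouse_id] plays the role of the expert's label.
            if predicted != pool[mouse_id]:
                prompt_set.append((mouse_id, pool[mouse_id]))
                used.add(mouse_id)
                added = True
                break  # re-evaluate with the enlarged prompt set
        if not added:
            break  # no remaining errors in the pool
    return prompt_set


# Toy data (hypothetical): one scalar "feature" per mouse.
FEATURES = {"m1": 0.1, "m2": 0.2, "m3": 0.45, "m4": 0.55, "m5": 0.8, "m6": 0.9}
LABELS = {"m1": "control", "m2": "control", "m3": "lurcher",
          "m4": "lurcher", "m5": "lurcher", "m6": "lurcher"}


def toy_classify(prompt_set: List[Example], mouse_id: str) -> str:
    """Stand-in for a GPT-4o call: copy the label of the nearest example."""
    nearest = min(prompt_set,
                  key=lambda ex: abs(FEATURES[ex[0]] - FEATURES[mouse_id]))
    return nearest[1]


prompt_set = build_prompt_set(LABELS, toy_classify, ["m1", "m6"], 4)

# Arithmetic from the abstract: 18 mice, 6 in the prompt set, 11 correct.
held_out = 18 - 6
accuracy_pct = round(100 * 11 / held_out)
```

In this toy run the loop stops as soon as the enlarged prompt set classifies every remaining pool item correctly, so the prompt set can end up smaller than `max_prompt_size`. Replacing `toy_classify` with a real vision-language model call would require an image-capable API client, which is outside the scope of this sketch.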
