Adapting SAM with Dynamic Similarity Graphs for Few-Shot Parameter-Efficient Small Dense Object Detection: A Case Study of Chickpea Pods in Field Conditions
Xintong Jiang, Yixue Liu, Mohamed Debbagh, Yu Tian, Valerio Hoyos-Villegas, Viacheslav Adamchuk, Shangpeng Sun
TL;DR
This work tackles few-shot, pixel-level segmentation of small, dense agricultural organs under complex field conditions. It introduces Dynamic Similarity-based Graph Adaptation (DSGA), combined with Low-Rank Adaptation (LoRA), to adapt the Segment Anything Model (SAM) for both foreground and instance segmentation with minimal data. DSGA builds a dynamic adjacency graph with learnable rank-weighted neighbors and adaptive local pooling to capture global and local dependencies, while LoRA tunes only the query and value projections. The result is a parameter-efficient framework (trainable parameters amount to roughly 4.6% of SAM) that is trained in a two-stage process with adaptive prompt generation and a composite loss. Empirical results on a chickpea pod dataset show superior performance on both foreground and instance segmentation across 2–10 shots, along with interpretable visualizations (Grad-CAM, t-SNE) and strong field-counting accuracy (adjusted R^2 ≈ 0.899), highlighting practical applicability for automated agricultural monitoring and phenotyping. Limitations include resolution constraints and occlusion; future directions include multispectral data and cross-crop generalization.
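To make the LoRA part of the summary concrete, the following is a minimal sketch of a low-rank adapter wrapped around a single frozen projection, such as an attention query or value matrix. It is not the authors' implementation: the class name `LoRALinear`, the rank, the `alpha` scaling, and the initialization scheme are illustrative assumptions, and plain Python lists stand in for tensors.

```python
import random


class LoRALinear:
    """Frozen linear map W plus a trainable low-rank update (alpha / r) * B @ A.

    Hypothetical helper for illustration; not taken from the paper's code.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weights, shape d_out x d_in
        # Assumed init: A is small random, B is zero, so the adapter is a
        # no-op at the start and the pretrained behavior is preserved.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]
        self.scale = alpha / rank

    @staticmethod
    def _matvec(m, x):
        # Matrix-vector product on nested lists.
        return [sum(a * b for a, b in zip(row, x)) for row in m]

    def forward(self, x):
        base = self._matvec(self.weight, x)
        update = self._matvec(self.B, self._matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        # Only A and B are trained; W stays frozen.
        return sum(len(r) for r in self.A) + sum(len(r) for r in self.B)


# Usage: wrap a 4 x 4 identity "query" projection with a rank-2 adapter.
W_q = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
q_proj = LoRALinear(W_q, rank=2)
print(q_proj.forward([1.0, 2.0, 3.0, 4.0]), q_proj.num_trainable())
```

In a full attention block, only the query and value projections would be wrapped this way, with the key and output projections left frozen. The adapter trains r(d_in + d_out) parameters per wrapped projection instead of d_in * d_out.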
Abstract
Parameter-Efficient Fine-Tuning (PEFT) of foundation models for agricultural computer vision tasks remains challenging due to limited training data and complex field conditions. This study introduces a Dynamic Similarity-based Graph Adaptation (DSGA) module to adapt the Segment Anything Model (SAM) under extreme data constraints for precise foreground and instance segmentation of small dense objects in complex agricultural environments. Through dynamic similarity graph construction with a learnable polynomial decay-initialized weight ranking mechanism and adaptive local feature aggregation, DSGA establishes robust spatial and dynamic similarity representation with only 4.00M trainable parameters, which is 4.26% of the parameters of the original SAM. Integrating this graph-based feature adaptation with Low-Rank Adaptation (LoRA) creates a complementary optimization framework that effectively captures both local and global dependencies in image embeddings while preserving model stability and parameter efficiency. Experimental results on a challenging chickpea pod dataset demonstrated that DSGA with LoRA achieved superior performance across multiple metrics evaluated under 2, 4, 8, and 10 shots, with progressive performance gains as shot count increased. Quantitative metrics showed a 17.31% improvement in Structure-measure and a 62.36% gain in adaptive F-measure compared to the baseline SAM fine-tuning. Comprehensive ablation studies and visualization analyses through Grad-CAM and t-SNE validated the framework's effectiveness in feature discrimination. The proposed adaptation demonstrated practical utility for automated agricultural monitoring applications, achieving accurate pod counting with an adjusted R-squared of 0.8987 for images with 10 to 120 pods under challenging field conditions.
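The dynamic similarity graph described in the abstract can be sketched roughly as follows. This is a sketch under stated assumptions, not the paper's DSGA module: cosine similarity, the top-k neighbor rule, the decay form (1 - r/k)^p, the residual sum, and the function names `poly_decay_weights` and `graph_aggregate` are all hypothetical choices made for illustration. In a trained module the rank weights would be learnable parameters; here they are only initialized.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-12)


def poly_decay_weights(k, power=2.0):
    """Assumed init: rank r (0 = most similar) gets (1 - r/k)^power, normalized."""
    raw = [(1.0 - r / k) ** power for r in range(k)]
    total = sum(raw)
    return [w / total for w in raw]


def graph_aggregate(tokens, k, rank_weights):
    """For each token, mix in its k most similar tokens, weighted by rank."""
    out = []
    for i, t in enumerate(tokens):
        # The graph is "dynamic": neighbors are recomputed from the
        # current features rather than fixed by spatial position.
        sims = [(cosine(t, u), j) for j, u in enumerate(tokens) if j != i]
        sims.sort(reverse=True)
        neighbors = [j for _, j in sims[:k]]
        agg = [0.0] * len(t)
        for w, j in zip(rank_weights, neighbors):
            for d in range(len(t)):
                agg[d] += w * tokens[j][d]
        # Residual connection keeps the original embedding.
        out.append([a + b for a, b in zip(t, agg)])
    return out


# Usage: three 2-D "patch embeddings", one neighbor each.
tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
print(graph_aggregate(tokens, k=1, rank_weights=poly_decay_weights(1)))
```

Weighting neighbors by similarity rank rather than by raw similarity value keeps the aggregation scale stable across images, which is one plausible reason for a rank-based design.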
