A deep dive into benchmarking lightweight YOLO architectures for okra maturity detection
Introduction
Agricultural automation faces a critical challenge: how can we develop inspection systems that are both highly accurate and fast enough for real-time processing? In our recent study published in The Journal of Supercomputing, my colleagues and I addressed this question by systematically benchmarking four state-of-the-art YOLO nano-variant object detection models for thermal imaging applications in okra maturity grading.
Why Thermal Imaging Matters
Traditional RGB-based inspection systems struggle under variable lighting conditions—a common reality in agricultural settings ranging from night harvesting to indoor packing facilities. Thermal imaging offers a compelling alternative by capturing temperature-dependent information that remains consistent regardless of ambient lighting. For okra specifically, thermal signatures reveal physiological differences in moisture content and thermal mass between maturity stages, providing a robust basis for automated quality assessment.
However, thermal imagery presents unique computational challenges. Single-channel intensity data, reduced texture information, and temperature-dependent contrast require specialized architectural considerations that haven't been systematically evaluated in prior research.
The Research Gap
While YOLO (You Only Look Once) architectures have been extensively benchmarked on RGB datasets such as COCO and ImageNet, comprehensive evaluation on thermal agricultural imagery remained absent from the literature. This gap is particularly significant because:
- Architectural innovations designed for RGB may not transfer effectively to thermal domains
- Real-time industrial sorting requires sub-50 millisecond inference latency
- Deployment on heterogeneous computing platforms (GPU vs. CPU) demands quantitative performance analysis
Our study addresses these gaps by evaluating YOLOv5n, YOLOv8n, YOLOv11n, and YOLOv12n across multiple performance dimensions.
Methodology Highlights
We developed a dual-source thermal dataset combining passive and active thermal imaging modalities. The active thermal approach, which involves controlled preheating to 30°C, proved particularly important. Natural thermal contrast between adequately matured and overripe okra averages only 2.8°C under ambient conditions, which can be too small to separate the two classes reliably when the environment varies. Controlled preheating amplifies this contrast to 4–8°C by exploiting maturity-dependent differences in thermal mass and cooling behavior.
Our experimental design included five independent training runs, which let us check reproducibility and test whether the observed performance differences were statistically significant. This rigor is essential for drawing reliable conclusions in machine learning research.
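To make the idea of multi-run validation concrete, here is a minimal sketch of how scores from five runs of two models could be summarized and compared. It is not the analysis code from the study: the choice of Welch's t statistic is an assumption, and the score lists are placeholders rather than reported results.

```python
import math
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) for one model's runs."""
    return statistics.mean(scores), statistics.stdev(scores)


def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom
    for two independent samples with possibly unequal variances."""
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se_a + se_b)
    dof = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, dof


# Placeholder mAP values for five seeds each; NOT results from the study.
model_a = [0.660, 0.665, 0.662, 0.664, 0.663]
model_b = [0.650, 0.655, 0.652, 0.654, 0.653]

mean_a, std_a = summarize(model_a)
mean_b, std_b = summarize(model_b)
t_stat, dof = welch_t(model_a, model_b)
print(f"A: {mean_a:.4f} +/- {std_a:.4f}")
print(f"B: {mean_b:.4f} +/- {std_b:.4f}")
print(f"Welch t = {t_stat:.2f}, dof = {dof:.1f}")
```

The t statistic and degrees of freedom would then be looked up against a t distribution to obtain a p-value. With only five runs per model, reporting the spread alongside the mean matters as much as the test itself.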
Key Findings
Training Duration Dependency
One of our most significant findings relates to training efficiency. Under the resource-efficient 10-epoch protocol—reflecting rapid development scenarios—YOLOv8n achieved the highest detection accuracy (66.3% mAP@0.5–0.95), while YOLOv5n delivered comparable performance (66.1%) with superior computational efficiency.
However, extended training experiments revealed a critical insight: attention-based architectures (YOLOv11n and YOLOv12n) achieve higher peak accuracy (73.1% and 72.9% respectively) when training budgets permit 45–67 epochs. This performance ranking inversion suggests that architectural complexity translates to gains only with sufficient training iterations—a crucial consideration for practical deployment.
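For readers less familiar with the metric, mAP@0.5–0.95 averages average precision over ten intersection-over-union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only the box-matching step, assuming boxes are given as (x1, y1, x2, y2) corners. A full mAP computation also needs per-class precision-recall curves, which are omitted here, and the example boxes are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds used by the 0.5-0.95 metric.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, truth):
    """Thresholds at which this prediction would count as a true positive."""
    overlap = iou(pred, truth)
    return [t for t in IOU_THRESHOLDS if overlap >= t]


# Illustrative boxes only: a prediction shifted by 10 px in both axes.
prediction = (10, 10, 110, 110)
ground_truth = (20, 20, 120, 120)
print(f"IoU = {iou(prediction, ground_truth):.3f}")
print("Counts as a hit at:", thresholds_passed(prediction, ground_truth))
```

A detection that is only roughly aligned passes the loose thresholds and fails the strict ones, which is why mAP@0.5–0.95 rewards tight localization far more than mAP@0.5 does.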
Platform-Specific Performance
Our benchmarking across heterogeneous computing platforms revealed distinct trade-offs:
- GPU deployment (NVIDIA T4 with TensorRT): YOLOv8n achieved 1.6 ms inference latency, which corresponds to roughly 625 FPS (1000 ms ÷ 1.6 ms), well above the ≥20 FPS requirement for real-time sorting
- CPU deployment (ONNX Runtime): YOLOv5n achieved the lowest inference latency at 31.1 ms, making it the strongest choice for edge and embedded scenarios
These findings provide practical guidance for selecting architectures based on deployment constraints.
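The link between the latency figures above and the real-time requirement is simple arithmetic, sketched below. The helper names are illustrative, and the conversion assumes one image per inference call, with pre- and post-processing time ignored.

```python
def fps_from_latency(latency_ms):
    """Frames per second sustained at a given per-image latency."""
    return 1000.0 / latency_ms


def meets_budget(latency_ms, budget_ms=50.0):
    """True if latency is under the sub-50 ms (>= 20 FPS) sorting budget."""
    return latency_ms < budget_ms


# Latencies quoted in this article.
measurements = {
    "YOLOv8n (T4, TensorRT)": 1.6,
    "YOLOv5n (CPU, ONNX Runtime)": 31.1,
}

for name, latency in measurements.items():
    fps = fps_from_latency(latency)
    status = "ok" if meets_budget(latency) else "too slow"
    print(f"{name}: {latency} ms -> {fps:.1f} FPS ({status})")
```

Both configurations clear the 20 FPS bar, but the CPU path leaves far less headroom for image acquisition and actuation delays.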
Architectural Insights
Through gradient-weighted class activation mapping (Grad-CAM) analysis, we discovered that decoupled detection heads enable task-specific feature specialization. Classification branches focus selectively on thermal intensity gradients, while localization branches emphasize geometric boundaries—a design principle particularly beneficial for thermal imagery where diagnostic features are spatially distinct.
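For reference, the core Grad-CAM computation is short: each feature map is weighted by the spatial mean of its gradient, the weighted maps are summed, and negative values are clipped. The sketch below uses plain nested lists and toy numbers. In practice the activations and gradients would be captured from a chosen layer of the trained network using the deep learning framework's hooks, which is not shown here.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap.

    activations, gradients: lists of K feature maps, each an H x W
    nested list. gradients[k] holds d(score)/d(activations[k]).
    """
    height, width = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * width for _ in range(height)]
    for act, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient map.
        weight = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                heatmap[i][j] += weight * act[i][j]
    # ReLU keeps only regions that push the score up.
    return [[max(0.0, value) for value in row] for row in heatmap]


# Toy inputs: two 2x2 channels with opposite gradient signs.
toy_activations = [[[1, 2], [3, 4]], [[4, 3], [2, 1]]]
toy_gradients = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
toy_heatmap = grad_cam(toy_activations, toy_gradients)
print(toy_heatmap)
```

Running this separately on the classification and localization branches of a decoupled head is what makes it possible to compare where each branch looks.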
Ablation studies confirmed that mosaic augmentation improved detection performance by 6.2% without adding inference latency, since it is applied only during training. Balanced loss weighting outperformed imbalanced configurations in this two-class detection task.
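As a rough illustration of what mosaic augmentation does, the sketch below tiles four equally sized single-channel images into a 2x2 canvas and shifts their bounding boxes to match. This is a deliberately simplified version. Mosaic implementations in YOLO training pipelines typically also randomize the mosaic center and rescale and crop the tiles, which is omitted here, and the function name is illustrative.

```python
def mosaic_2x2(images, boxes_per_image):
    """Tile four H x W images into one 2H x 2W canvas.

    images: four nested lists of identical shape.
    boxes_per_image: four lists of (x1, y1, x2, y2) boxes, one per image.
    Returns the canvas and the boxes shifted into canvas coordinates.
    """
    height, width = len(images[0]), len(images[0][0])
    top = [images[0][r] + images[1][r] for r in range(height)]
    bottom = [images[2][r] + images[3][r] for r in range(height)]
    canvas = top + bottom

    # Offsets for top-left, top-right, bottom-left, bottom-right tiles.
    offsets = [(0, 0), (width, 0), (0, height), (width, height)]
    merged = []
    for (dx, dy), boxes in zip(offsets, boxes_per_image):
        for x1, y1, x2, y2 in boxes:
            merged.append((x1 + dx, y1 + dy, x2 + dx, y2 + dy))
    return canvas, merged


# Toy data: four 2x2 tiles filled with 0, 1, 2, 3 and one box each.
tiles = [[[k] * 2 for _ in range(2)] for k in range(4)]
boxes = [[(0, 0, 1, 1)] for _ in range(4)]
canvas, merged = mosaic_2x2(tiles, boxes)
for row in canvas:
    print(row)
print(merged)
```

Each training sample then contains objects from four source frames in varied positions, which is consistent with the accuracy gain coming at no inference cost.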
Practical Implications
Our research demonstrates that thermal-based YOLO nano-variants are ready for integration into automated sorting pipelines, achieving 75 kg/min throughput with <2% background false positives. The system operates at sustained rates exceeding industrial requirements while maintaining high detection accuracy.
For practitioners, the key takeaway is this: model selection must jointly consider deployment constraints (inference latency, memory footprint) and available training resources (time budget, computational capacity). Simple architectures like YOLOv5n and YOLOv8n excel in rapid-iteration scenarios, while attention-based variants justify their complexity only when extended training is feasible.
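The selection guidance can be written down as a small decision rule. This is a simplification for illustration, not a procedure from the paper. The 45-epoch cutoff is an assumption taken from the lower end of the 45–67 epoch range mentioned earlier, and giving training budget precedence over platform is likewise an assumption, since attention-based variants would still need to be checked against the latency budget on the target hardware.

```python
# Assumed cutoff: lower end of the extended-training range quoted above.
EXTENDED_TRAINING_EPOCHS = 45


def suggest_model(platform, max_epochs):
    """Suggest a starting architecture.

    platform: "gpu" or "cpu" (deployment target).
    max_epochs: number of training epochs the budget allows.
    """
    if platform not in ("gpu", "cpu"):
        raise ValueError("platform must be 'gpu' or 'cpu'")
    if max_epochs >= EXTENDED_TRAINING_EPOCHS:
        # Attention-based variants reached the highest peak accuracy.
        return "YOLOv11n or YOLOv12n"
    if platform == "cpu":
        # Lowest CPU latency in the benchmark.
        return "YOLOv5n"
    # Best short-schedule accuracy and lowest GPU latency.
    return "YOLOv8n"


for platform, epochs in [("gpu", 10), ("cpu", 10), ("gpu", 60)]:
    print(platform, epochs, "->", suggest_model(platform, epochs))
```

Any such rule is only a starting point, and the suggested model should be re-benchmarked on the actual deployment hardware and dataset.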
Future Directions
Several promising research directions emerge from this work:
- Dataset expansion across cultivars, seasons, and environmental conditions to enhance model robustness
- Multi-class grading spanning the full maturity spectrum
- Multispectral fusion combining thermal and RGB modalities
- Knowledge distillation from transformer-based teachers to lightweight students
- Field deployment trials in commercial facilities for long-term validation
Conclusion
This systematic benchmark establishes quantitative baselines for thermal agricultural imaging and provides practical guidance for architecture selection. The identified strategies—anchor-free detection, decoupled heads, mosaic augmentation, and balanced loss weighting—are transferable to other crops and sensing modalities, offering strong potential for advancing automated agricultural inspection systems.
Article Citation: Ganapathy, M.R., Pugazhendi, P., Periasamy, S., Nagarajan, B. (2026). Benchmarking YOLO nano-architectures for real-time thermal imaging: application to okra maturity grading on heterogeneous computing platforms. The Journal of Supercomputing, 82:97. https://doi.org/10.1007/s11227-026-08226-w
Data Availability: The public thermal dataset is available at Mendeley Data. Code is available at the project GitHub repository.