Behind the Paper

Behind the Research: Optimizing Real-Time Thermal Imaging for Agricultural Automation

I am happy to share our recent research published in The Journal of Supercomputing, which systematically evaluates lightweight YOLO nano-architectures for real-time thermal imaging applications in agriculture.

A deep dive into benchmarking lightweight YOLO architectures for okra maturity detection

Introduction

Agricultural automation faces a critical challenge: how can we develop inspection systems that are both highly accurate and fast enough for real-time processing? In our recent study published in The Journal of Supercomputing, my colleagues and I addressed this question by systematically benchmarking four state-of-the-art YOLO nano-variant object detection models for thermal imaging applications in okra maturity grading.

Why Thermal Imaging Matters

Traditional RGB-based inspection systems struggle under variable lighting conditions—a common reality in agricultural settings ranging from night harvesting to indoor packing facilities. Thermal imaging offers a compelling alternative by capturing temperature-dependent information that remains consistent regardless of ambient lighting. For okra specifically, thermal signatures reveal physiological differences in moisture content and thermal mass between maturity stages, providing a robust basis for automated quality assessment.

However, thermal imagery presents unique computational challenges. Single-channel intensity data, reduced texture information, and temperature-dependent contrast require specialized architectural considerations that haven't been systematically evaluated in prior research.

The Research Gap

While YOLO (You Only Look Once) architectures have been extensively benchmarked on RGB datasets such as COCO and ImageNet, comprehensive evaluation on thermal agricultural imagery remained absent from the literature. This gap is particularly significant because:

  1. Architectural innovations designed for RGB may not transfer effectively to thermal domains
  2. Real-time industrial sorting requires sub-50 millisecond inference latency
  3. Deployment on heterogeneous computing platforms (GPU vs. CPU) demands quantitative performance analysis

Our study addresses these gaps by evaluating YOLOv5n, YOLOv8n, YOLOv11n, and YOLOv12n across multiple performance dimensions.

Methodology Highlights

We developed a dual-source thermal dataset combining passive and active thermal imaging modalities. The active thermal approach—involving controlled preheating to 30°C—proved particularly important. Natural thermal contrast between adequately matured and overripe okra averages only 2.8°C under ambient conditions, which can be insufficient under variable environments. Controlled preheating amplifies this contrast to 4–8°C by exploiting maturity-dependent differences in thermal mass and cooling behavior.

Our experimental design incorporated rigorous statistical validation across five independent training runs, ensuring reproducibility and significance testing of observed performance differences. This methodological rigor is essential for drawing reliable conclusions in machine learning research.

Key Findings

Training Duration Dependency

One of our most significant findings relates to training efficiency. Under the resource-efficient 10-epoch protocol—reflecting rapid development scenarios—YOLOv8n achieved the highest detection accuracy (66.3% mAP@0.5–0.95), while YOLOv5n delivered comparable performance (66.1%) with superior computational efficiency.

However, extended training experiments revealed a critical insight: attention-based architectures (YOLOv11n and YOLOv12n) achieve higher peak accuracy (73.1% and 72.9% respectively) when training budgets permit 45–67 epochs. This performance ranking inversion suggests that architectural complexity translates to gains only with sufficient training iterations—a crucial consideration for practical deployment.

Platform-Specific Performance

Our benchmarking across heterogeneous computing platforms revealed distinct trade-offs:

  • GPU deployment (NVIDIA T4 with TensorRT): YOLOv8n achieved 1.6 ms inference latency, supporting throughput exceeding 625 FPS—well above the ≥20 FPS requirement for real-time sorting
  • CPU deployment (ONNX Runtime): YOLOv5n exhibited superior performance at 31.1 ms, making it optimal for edge and embedded scenarios

These findings provide practical guidance for selecting architectures based on deployment constraints.

Architectural Insights

Through gradient-weighted class activation mapping (Grad-CAM) analysis, we discovered that decoupled detection heads enable task-specific feature specialization. Classification branches focus selectively on thermal intensity gradients, while localization branches emphasize geometric boundaries—a design principle particularly beneficial for thermal imagery where diagnostic features are spatially distinct.

Ablation studies confirmed that mosaic augmentation improved detection performance by 6.2% without additional latency, while balanced loss weighting outperformed imbalanced configurations in this binary classification task.

Practical Implications

Our research demonstrates that thermal-based YOLO nano-variants are ready for integration into automated sorting pipelines, achieving 75 kg/min throughput with <2% background false positives. The system operates at sustained rates exceeding industrial requirements while maintaining high detection accuracy.

For practitioners, the key takeaway is this: model selection must jointly consider deployment constraints (inference latency, memory footprint) and available training resources (time budget, computational capacity). Simple architectures like YOLOv5n and YOLOv8n excel in rapid-iteration scenarios, while attention-based variants justify their complexity only when extended training is feasible.

Future Directions

Several promising research directions emerge from this work:

  1. Dataset expansion across cultivars, seasons, and environmental conditions to enhance model robustness
  2. Multi-class grading spanning the full maturity spectrum
  3. Multispectral fusion combining thermal and RGB modalities
  4. Knowledge distillation from transformer-based teachers to lightweight students
  5. Field deployment trials in commercial facilities for long-term validation

Conclusion

This systematic benchmark establishes quantitative baselines for thermal agricultural imaging and provides practical guidance for architecture selection. The identified strategies—anchor-free detection, decoupled heads, mosaic augmentation, and balanced loss weighting—are transferable to other crops and sensing modalities, offering strong potential for advancing automated agricultural inspection systems.


Article Citation: Ganapathy, M.R., Pugazhendi, P., Periasamy, S., Nagarajan, B. (2026). Benchmarking YOLO nano-architectures for real-time thermal imaging: application to okra maturity grading on heterogeneous computing platforms. The Journal of Supercomputing, 82:97. https://doi.org/10.1007/s11227-026-08226-w

Data Availability: The public thermal dataset is available at Mendeley Data. Code is available at the project GitHub repository.