Lightweight vision transformer and ResNet-9 models for real-time plant disease detection and pest classification with SHAP explainability
Context
The rapid advancement of digital technologies alongside increased global commitment to sustainability has intensified the need for efficient crop management solutions. Across agricultural systems, delayed detection and misclassification of plant diseases remain major contributors to reduced yields, threatening food security and undermining progress toward Sustainable Development Goals such as Zero Hunger, No Poverty, Good Health and Well-being, Climate Action, and Life on Land. Plant pests, diseases, and excessive chemical use further exacerbate these challenges. Early, automated visual detection offers a pathway to environmentally responsible and economically viable agricultural practices.
Objective
This study aims to develop and evaluate an automated, real-time plant disease classification framework using Vision Transformers (ViT) and hybrid ViT–CNN architectures, with the goal of supporting farmers and agronomists in early decision-making and sustainable crop protection.
Methods
The research employs deep learning techniques, including a ViT and a hybrid ViT–CNN model built on ResNet-9, trained and evaluated on four publicly available datasets: the Turkey Plant Pests and Diseases (TPPD) dataset (15 classes), the Namibia Maize Image Dataset (3 classes), the Banana Image Dataset (3 classes), and the Tanzania Maize Dataset (3 classes). SHapley Additive exPlanations (SHAP) were applied to generate saliency maps for interpretability. Comparative analyses assessed classification accuracy and inference speed across attention-based and hybrid architectures.
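To make the hybrid design concrete, the following is a minimal PyTorch sketch of a ResNet-9-style convolutional stem whose spatial feature positions are treated as tokens for a small transformer encoder. All widths, depths, head counts, and the 64×64 input size are illustrative assumptions, not the paper's exact configuration; positional embeddings are omitted for brevity since the convolutional features already carry spatial structure.

```python
# Minimal sketch of a hybrid ViT-CNN classifier: a ResNet-9-style stem
# extracts local features, then a small transformer encoder attends over
# them as a token sequence. Hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, pool=False):
    layers = [nn.Conv2d(in_ch, out_ch, 3, padding=1),
              nn.BatchNorm2d(out_ch),
              nn.ReLU(inplace=True)]
    if pool:
        layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class HybridViTResNet9(nn.Module):
    def __init__(self, num_classes=15, embed_dim=256, depth=2, heads=4):
        super().__init__()
        # ResNet-9-style stem with two residual stages.
        self.stem = nn.Sequential(conv_block(3, 64),
                                  conv_block(64, 128, pool=True))
        self.res1 = nn.Sequential(conv_block(128, 128), conv_block(128, 128))
        self.down = conv_block(128, embed_dim, pool=True)
        self.res2 = nn.Sequential(conv_block(embed_dim, embed_dim),
                                  conv_block(embed_dim, embed_dim))
        # Transformer encoder over the stem's spatial positions as tokens.
        enc_layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads,
                                               dim_feedforward=embed_dim * 2,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=depth)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.stem(x)
        x = self.res1(x) + x                       # residual connection
        x = self.down(x)
        x = self.res2(x) + x                       # residual connection
        tokens = x.flatten(2).transpose(1, 2)      # (B, H*W, C) token sequence
        cls = self.cls_token.expand(x.size(0), -1, -1)
        tokens = self.encoder(torch.cat([cls, tokens], dim=1))
        return self.head(tokens[:, 0])             # classify from [CLS] token

model = HybridViTResNet9(num_classes=15)
logits = model(torch.randn(2, 3, 64, 64))          # e.g. 64x64 leaf crops
print(logits.shape)                                # torch.Size([2, 15])
```

The design intuition is that the convolutional stage supplies cheap, translation-equivariant local features, while the short transformer stack adds global context only where it pays off, which is how the hybrid recovers the inference speed that a pure ViT gives up.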
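Generating SHAP saliency maps for such a model can be sketched with the shap library's GradientExplainer, which supports PyTorch models. The tensors `train_images` and `test_images` below are hypothetical placeholders, and the exact return format of `shap_values` can vary across shap versions.

```python
# Sketch of SHAP saliency generation for a trained PyTorch classifier
# `model`. `train_images` / `test_images` are hypothetical (N, 3, H, W)
# float tensors standing in for the real datasets.
import numpy as np
import shap

background = train_images[:50]                          # small background sample
explainer = shap.GradientExplainer(model, background)
sv = explainer.shap_values(test_images[:4])             # one array per class (format may vary)
sv_hwc = [np.transpose(s, (0, 2, 3, 1)) for s in sv]    # channels-last for plotting
x_hwc = np.transpose(test_images[:4].numpy(), (0, 2, 3, 1))
shap.image_plot(sv_hwc, x_hwc)                          # per-class saliency overlays
```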
Results and conclusions
The proposed model achieved strong performance with 97.4% accuracy, 96.4% precision, 97.09% recall, a 95.7% F1-score, and high agreement as measured by Cohen's Kappa, outperforming existing benchmark models. SHAP visualizations showed that the model bases its predictions on high-activation regions, edge features, color patterns, texture, shape, and contextual cues. While attention-based models improved accuracy, they reduced classification speed; integrating attention blocks with CNN layers compensated for this slowdown, achieving both high accuracy and efficient inference. To evaluate the interpretability and deployment feasibility of the proposed model, several metrics spanning faithfulness, localization quality, sparsity, latency, and energy were considered: pointing accuracy, localization IoU, centroid localization error (pixels), attribution sparsity (%), insertion AUC, deletion AUC, time per explanation (ms/image), energy consumption (J/image), and memory footprint (MB).
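Among the faithfulness metrics above, insertion and deletion AUC are commonly computed with a RISE-style protocol: pixels are revealed (or erased) in order of attributed importance while the target-class probability is tracked, and the area under the resulting curve is reported. The sketch below follows that protocol; the blur insertion baseline, zero-fill deletion baseline, and step count are assumptions and not necessarily the paper's exact setup.

```python
# Sketch of insertion/deletion AUC for scoring a saliency map against a
# classifier. Baselines and step count are illustrative assumptions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def insertion_deletion_auc(model, image, saliency, target, steps=100):
    """image: (3, H, W); saliency: (H, W) importance map; target: class index."""
    model.eval()
    h, w = saliency.shape
    order = saliency.flatten().argsort(descending=True)    # most important first
    # Insertion starts from a heavily blurred image; deletion from the original.
    blurred = F.avg_pool2d(image.unsqueeze(0), 11, stride=1, padding=5)[0]
    ins_img, del_img = blurred.clone(), image.clone()
    per_step = max(1, (h * w) // steps)
    ins_scores, del_scores = [], []
    for i in range(0, h * w, per_step):
        idx = order[i:i + per_step]
        ys, xs = idx // w, idx % w
        ins_img[:, ys, xs] = image[:, ys, xs]              # reveal important pixels
        del_img[:, ys, xs] = 0.0                           # erase important pixels
        for img, scores in ((ins_img, ins_scores), (del_img, del_scores)):
            prob = model(img.unsqueeze(0)).softmax(-1)[0, target].item()
            scores.append(prob)
    # Trapezoidal AUC over the fraction of pixels modified.
    def auc(s):
        return torch.trapz(torch.tensor(s), dx=1.0 / len(s)).item()
    return auc(ins_scores), auc(del_scores)

# A faithful explanation yields a high insertion AUC (probability rises quickly
# as important pixels appear) and a low deletion AUC (it drops quickly as they go).
```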
Significance
This research provides a transparent and high-performing deep learning solution for plant disease classification, promoting sustainable agricultural development. By reducing reliance on excessive pesticide and herbicide use, enabling early diagnosis, and improving decision-making, the model supports environmentally responsible farming practices and contributes to global efforts toward food security and ecological resilience. The evaluation is significant because it assesses the model's interpretability and real-world deployability together, measuring explanation reliability (faithfulness), spatial precision (localization quality), efficiency (latency and memory footprint), and sustainability (energy consumption), ensuring the model is not only accurate but also transparent, efficient, and practical to deploy. Details at: https://link.springer.com/article/10.1186/s12870-026-08667-8