Pradeep Kumar, Era Upadhyay*, Anoop Yadav (2026). Spatiotemporal assessment and machine learning-based prediction of PM2.5 Emissions from biomass combustion in Rural India

Traditional biomass cooking in rural India continues to generate dangerous PM2.5 pollution, posing major health risks. Our study combines year-long monitoring and machine learning to analyze how cooking practices and meteorological factors shape rural air quality in Rajasthan and Haryana.

Springer International Publishing

Spatiotemporal assessment and machine learning-based prediction of PM2.5 Emissions from biomass combustion in Rural India - Bulletin of Atmospheric Science and Technology

Background: Biomass combustion remains a dominant cooking practice in rural India, contributing significantly to indoor and ambient air pollution. Exposure to particulate matter (PM), particularly PM2.5, poses severe health risks, especially in poorly ventilated households. This study aims to analyze PM emission patterns from traditional Chulha use and to assess the influence of meteorological factors using machine learning techniques.

Methods: Particulate matter concentrations (PM₁, PM2.5, PM₄, PM₁₀) were monitored over a one-year period at two rural sites, Jhunjhunu (JJN) and Mahendragarh (Mgarh). Data were collected across three daily cooking intervals and combined with meteorological variables (temperature, humidity, wind speed, rainfall). Regression models (Linear, Random Forest, XGBoost), classification algorithms, and unsupervised learning methods (K-Means, Isolation Forest) were applied to predict, classify, and analyze pollution patterns.

Results: PM levels peaked in winter and during evening cooking hours, often exceeding the WHO 2021 air quality guidelines. Meteorological variables, particularly temperature and wind speed, showed a strong seasonal influence on pollutant dispersion. Random Forest Regression achieved the best predictive performance (R² = 0.87, RMSE = 18.3 µg/m³ in Jhunjhunu), while classification accuracy reached 98%. SHAP analysis identified PM2.5 lag, humidity, and wind chill as key predictors. Clustering revealed distinct pollution regimes, and anomaly detection successfully flagged episodic high-pollution events.

Conclusion: The integration of temporal, meteorological, and machine learning analysis offers a robust framework for understanding rural air pollution. The findings underscore the need for clean cooking interventions, targeted health risk communication, and the application of predictive tools in rural air quality management.
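Isolation Forest, named in the abstract for flagging episodic high-pollution events, scores records by how few random splits are needed to isolate them. The sketch below assumes scikit-learn and uses synthetic data only; the feature set (PM2.5, wind speed, relative humidity), the three injected "episodes", and the `contamination` value are illustrative assumptions, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic records (assumption, not study data):
# columns = PM2.5 (µg/m³), wind speed (m/s), relative humidity (%).
typical = np.column_stack([
    rng.normal(80, 15, 500),
    rng.normal(2.5, 0.5, 500),
    rng.normal(55, 10, 500),
])
# Three injected "episodes": very high PM2.5 under stagnant, humid conditions.
episodes = np.array([
    [450.0, 0.3, 90.0],
    [520.0, 0.2, 92.0],
    [480.0, 0.4, 88.0],
])
X = np.vstack([typical, episodes])

# contamination = expected share of anomalous records; 0.01 is an illustrative guess.
detector = IsolationForest(n_estimators=200, contamination=0.01, random_state=0)
labels = detector.fit_predict(X)  # -1 marks an anomaly, 1 a normal record

anomaly_idx = np.flatnonzero(labels == -1)
print("flagged rows:", anomaly_idx)
```

Raising or lowering `contamination` directly changes how many records get flagged, so in practice it would likely be tuned against known episodes rather than fixed in advance.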

This research combines environmental monitoring with data-driven analytics to provide a comprehensive understanding of biomass-related air pollution in rural settings.

Using the GRIMM D-11 Aerosol Spectrometer, we monitored PM₁, PM2.5, PM₄, and PM₁₀ concentrations over a full annual cycle during key cooking periods in rural households. The findings revealed severe seasonal and diurnal variability, with winter evenings and mornings consistently showing the highest PM2.5 concentrations—often exceeding WHO 2021 air quality guidelines by several orders of magnitude.
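Exceedance of the WHO 2021 guidelines can be expressed as the ratio of a measured concentration to the guideline level. This minimal sketch uses the WHO 2021 24-hour PM2.5 guideline of 15 µg/m³; the records and the three interval labels are hypothetical. Strictly, the 24-hour guideline applies to daily means, so comparing cooking-interval averages against it is only an indicative screen.

```python
from collections import defaultdict
from statistics import mean

# WHO 2021 air quality guideline level for 24-hour mean PM2.5 (µg/m³).
WHO_2021_PM25_24H = 15.0

# Hypothetical records: (season, cooking interval, PM2.5 in µg/m³).
# Values are invented for illustration; they are not the study's data.
records = [
    ("winter", "morning", 310.0),
    ("winter", "evening", 420.0),
    ("winter", "afternoon", 150.0),
    ("summer", "morning", 90.0),
    ("summer", "evening", 120.0),
    ("summer", "afternoon", 60.0),
]

def exceedance_by_group(rows, guideline):
    """Return {(season, interval): (mean PM2.5, mean / guideline)}."""
    groups = defaultdict(list)
    for season, interval, pm25 in rows:
        groups[(season, interval)].append(pm25)
    return {key: (mean(vals), mean(vals) / guideline) for key, vals in groups.items()}

result = exceedance_by_group(records, WHO_2021_PM25_24H)
for (season, interval), (avg, ratio) in sorted(result.items()):
    print(f"{season:>6} {interval:>9}: {avg:6.1f} µg/m³ = {ratio:4.1f}x guideline")
```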

Our study identified several critical observations:

• Winter stagnation and low wind speeds significantly intensified pollutant accumulation.
• Fine and submicron (PM₁) particles dominated pollution episodes, raising serious health concerns for indoor exposure.
• Mahendragarh consistently exhibited higher PM concentrations, while Jhunjhunu showed stronger signatures of localized combustion and dust resuspension.
• Wind direction and seasonal meteorology strongly influenced pollutant dispersion and transport pathways.

To move beyond conventional statistical analysis, we applied machine learning methods including Random Forest, XGBoost, clustering, anomaly detection, and SHAP interpretability analysis. Among the tested models, Random Forest showed the strongest predictive capability (R² up to 0.87), while classification models achieved approximately 98% accuracy in identifying pollution severity categories.
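A Random Forest regression with a lagged PM2.5 feature, scored with R² and RMSE, can be sketched with scikit-learn as below. Everything here is an assumption for illustration: the synthetic PM2.5 process, the feature names, the 80/20 chronological split, and the hyperparameters. Permutation importance is used as a simpler stand-in for SHAP; it is not the SHAP analysis reported in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(42)
n = 1000

# Synthetic drivers over one seasonal cycle (assumption, not study data).
temp = 25 + 10 * np.sin(np.linspace(0, 2 * np.pi, n)) + rng.normal(0, 1, n)  # °C
wind = np.clip(rng.normal(2.5, 0.8, n), 0.1, None)                             # m/s
humidity = np.clip(rng.normal(55, 12, n), 10, 100)                             # %

# Hypothetical PM2.5 process: persistence from the previous step plus weather effects.
pm25 = np.empty(n)
pm25[0] = 125.0
for t in range(1, n):
    pm25[t] = max(1.0, 0.6 * pm25[t - 1] + 80 - 1.5 * temp[t] - 8 * wind[t]
                  + 0.5 * humidity[t] + rng.normal(0, 8))

# Lag feature: predict PM2.5 at step t from PM2.5 at t-1 and weather at t.
feature_names = ["pm25_lag1", "temperature", "wind_speed", "humidity"]
X = np.column_stack([pm25[:-1], temp[1:], wind[1:], humidity[1:]])
y = pm25[1:]

# Chronological split (no shuffling) so the test period lies strictly after training.
split = int(0.8 * len(y))
X_train, X_test, y_train, y_test = X[:split], X[split:], y[:split], y[split:]

model = RandomForestRegressor(n_estimators=300, random_state=0)
model.fit(X_train, y_train)
pred = model.predict(X_test)

r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"R² = {r2:.2f}, RMSE = {rmse:.1f} µg/m³")

# Permutation importance: drop in test score when each feature is shuffled.
imp = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for name, score in sorted(zip(feature_names, imp.importances_mean), key=lambda p: -p[1]):
    print(f"{name:>12}: {score:.3f}")
```

With a lag feature in the inputs, a shuffled train/test split can leak neighbouring values and give optimistic scores, which is why the sketch splits chronologically. The same feature matrix could feed scikit-learn's `RandomForestClassifier` if PM2.5 were binned into severity categories.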

Importantly, SHAP analysis revealed that lagged PM2.5 concentrations, humidity, wind speed, and temperature-related variables were among the strongest predictors of pollution episodes. Unsupervised clustering further identified distinct pollution regimes associated with combustion intensity and meteorological stagnation.
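Separating pollution regimes with K-Means can be sketched as follows. This assumes scikit-learn, synthetic data built from two hypothetical regimes (stagnant winter combustion vs. ventilated summer conditions), and a fixed `n_clusters=2`; the study's features and number of clusters may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Synthetic records (assumption): [PM2.5 µg/m³, wind speed m/s, temperature °C].
stagnant = np.column_stack([rng.normal(300, 40, 200),
                            rng.normal(0.8, 0.3, 200),
                            rng.normal(12, 3, 200)])
ventilated = np.column_stack([rng.normal(70, 20, 200),
                              rng.normal(3.5, 0.7, 200),
                              rng.normal(33, 3, 200)])
X = np.vstack([stagnant, ventilated])

# Standardise so PM2.5 (hundreds of µg/m³) does not dominate the Euclidean distance.
X_scaled = StandardScaler().fit_transform(X)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

for k in range(2):
    members = X[labels == k]
    print(f"cluster {k}: n={len(members)}, mean PM2.5={members[:, 0].mean():.0f}, "
          f"mean wind={members[:, 1].mean():.1f} m/s")
```

Scaling matters here: without it, the clustering would be driven almost entirely by PM2.5 magnitude. With real data, the number of clusters would normally be chosen with a diagnostic such as the silhouette score rather than fixed at 2.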

One of the most significant findings was the consistently high PM₁/PM2.5 ratio (>0.7), emphasizing the dominant role of submicron (PM₁) particles in rural household pollution. These particles are especially concerning because they can penetrate deep into the respiratory system and contribute to long-term health risks.
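The PM₁/PM2.5 ratio is the fraction of PM2.5 mass carried by particles smaller than 1 µm. The minimal sketch below uses invented paired measurements; rejecting PM₁ > PM2.5 is a design choice (PM₁ is a size subset of PM2.5) and not part of the study's described method.

```python
# Hypothetical paired samples (µg/m³): (PM1, PM2.5). Illustrative values only.
samples = [(210.0, 280.0), (95.0, 120.0), (60.0, 95.0), (330.0, 400.0)]

THRESHOLD = 0.7  # ratio level highlighted in the post

def pm1_fraction(pm1, pm25):
    """Share of PM2.5 mass in the PM1 fraction; PM1 is a subset of PM2.5, so the ratio is <= 1."""
    if pm25 <= 0:
        raise ValueError("PM2.5 must be positive")
    if pm1 > pm25:
        raise ValueError("PM1 cannot exceed PM2.5 in a consistent measurement")
    return pm1 / pm25

ratios = [pm1_fraction(pm1, pm25) for pm1, pm25 in samples]
share_above = sum(r > THRESHOLD for r in ratios) / len(ratios)
print([round(r, 2) for r in ratios], f"{share_above:.0%} of samples above {THRESHOLD}")
```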

Our work highlights the urgent need for:
• Clean cooking interventions and improved ventilation strategies
• Winter-focused public health awareness campaigns
• Real-time predictive air quality systems for rural communities
• Regulatory attention toward PM₁ and PM₄ alongside PM2.5 and PM₁₀

By integrating spatiotemporal analysis with interpretable machine learning, this study demonstrates how data-driven tools can support rural air quality management and health-focused environmental policy in biomass-dependent regions.

We hope this work contributes toward advancing sustainable rural energy transitions and protecting vulnerable communities from the hidden burden of household air pollution.

#AirPollution #PM25 #MachineLearning #BiomassBurning #RuralIndia #EnvironmentalHealth #AirQuality #Sustainability #PublicHealth #DataScience
