Pradeep Kumar, Era Upadhyay*, Anoop Yadav (2026). Spatiotemporal assessment and machine learning-based prediction of PM2.5 Emissions from biomass combustion in Rural India

Traditional biomass cooking in rural India continues to generate dangerous PM2.5 pollution, posing major health risks. Our study combines year-long monitoring and machine learning to analyze how cooking practices and meteorological factors shape rural air quality in Rajasthan and Haryana.

Explore the Research

Springer International Publishing

Spatiotemporal assessment and machine learning-based prediction of PM2.5 Emissions from biomass combustion in Rural India - Bulletin of Atmospheric Science and Technology

Background: Biomass combustion remains a dominant cooking practice in rural India, contributing significantly to indoor and ambient air pollution. Exposure to particulate matter (PM), particularly PM2.5, poses severe health risks, especially in poorly ventilated households. This study aims to analyze PM emission patterns from traditional Chulha use and assess the influence of meteorological factors using machine learning techniques.

Methods: Particulate matter concentrations (PM₁, PM2.5, PM₄, PM₁₀) were monitored over a one-year period at two rural sites, Jhunjhunu (JJN) and Mahendragarh (Mgarh). Data were collected across three daily cooking intervals and combined with meteorological variables (temperature, humidity, wind speed, rainfall). Regression models (Linear, Random Forest, XGBoost), classification algorithms, and unsupervised learning (K-Means, Isolation Forest) were applied to predict, classify, and analyze pollution patterns.

Results: PM levels peaked during winter and evening cooking hours, often exceeding the WHO 2021 air quality guidelines. Meteorological variables, particularly temperature and wind speed, showed strong seasonal influence on pollutant dispersion. Random Forest Regression achieved the best predictive performance (R² = 0.87, RMSE = 18.3 µg/m³ in Jhunjhunu), while classification accuracy reached 98%. SHAP analysis identified PM2.5 lag, humidity, and wind chill as key predictors. Clustering revealed distinct pollution regimes, and anomaly detection successfully flagged episodic high-pollution events.

Conclusion: The integration of temporal, meteorological, and machine learning analysis offers a robust framework for understanding rural air pollution. The findings underscore the need for clean cooking interventions, targeted health risk communication, and the application of predictive tools in rural air quality management.

This research combines environmental monitoring with data-driven analytics to provide a comprehensive understanding of biomass-related air pollution in rural settings.

Using the GRIMM D-11 Aerosol Spectrometer, we monitored PM₁, PM2.5, PM₄, and PM₁₀ concentrations over a full annual cycle during key cooking periods in rural households. The findings revealed pronounced seasonal and diurnal variability, with winter evenings and mornings consistently showing the highest PM2.5 concentrations, often far above the WHO 2021 air quality guidelines.
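To make the comparison with the WHO guidelines concrete, here is a minimal Python sketch that expresses a measured concentration as a multiple of a guideline value. The guideline numbers are the WHO 2021 PM2.5 values (15 µg/m³ for the 24-hour mean, 5 µg/m³ for the annual mean). The function name and the sample readings are hypothetical and are not measurements from our study. Cooking-period readings are short-term values, so comparing them with a 24-hour guideline is indicative only.

```python
# WHO 2021 air quality guideline values for PM2.5, in µg/m³.
WHO_2021_PM25_24H = 15.0
WHO_2021_PM25_ANNUAL = 5.0


def exceedance_factor(concentration, guideline=WHO_2021_PM25_24H):
    """Return the concentration as a multiple of the guideline value."""
    if guideline <= 0:
        raise ValueError("guideline must be positive")
    return concentration / guideline


# Hypothetical cooking-period PM2.5 readings in µg/m³ (illustrative only).
sample_readings = {"morning": 180.0, "afternoon": 60.0, "evening": 300.0}

for period, value in sample_readings.items():
    factor = exceedance_factor(value)
    print(f"{period}: {factor:.1f}x the 24-hour guideline")
```

A factor above 1 means the reading is higher than the guideline value; the same function can be called with `WHO_2021_PM25_ANNUAL` when comparing a yearly average.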

Our study identified several critical observations:

• Winter stagnation and low wind speeds significantly intensified pollutant accumulation.
• Fine and submicron particles (PM2.5 and PM₁) dominated pollution episodes, highlighting serious health concerns for indoor exposure.
• Mahendragarh consistently exhibited higher PM concentrations, while Jhunjhunu showed stronger signatures of localized combustion and dust resuspension.
• Wind direction and seasonal meteorology strongly influenced pollutant dispersion and transport pathways.

To move beyond conventional statistical analysis, we implemented machine learning frameworks including Random Forest, XGBoost, clustering, anomaly detection, and SHAP interpretability analysis. Among the tested models, Random Forest demonstrated the strongest predictive capability (R² up to 0.87), while classification models achieved approximately 98% accuracy in identifying pollution severity categories.
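For readers less familiar with the two regression metrics quoted above, the following Python sketch shows how R² and RMSE are computed from observed and predicted concentrations. It uses only the standard library and does not reproduce our modelling pipeline. The function names and the four sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the inputs."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_observed = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_observed) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all observations are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical PM2.5 values in µg/m³ (illustrative only).
observed = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

print(f"RMSE = {rmse(observed, predicted):.1f} µg/m³")
print(f"R² = {r_squared(observed, predicted):.3f}")
```

RMSE is reported in concentration units, so it reads as a typical prediction error, while R² is the unitless share of variance in the observations that the model explains.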

Importantly, SHAP analysis revealed that lagged PM2.5 concentrations, humidity, wind speed, and temperature-related variables were among the strongest predictors of pollution episodes. Unsupervised clustering further identified distinct pollution regimes associated with combustion intensity and meteorological stagnation.
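A lagged PM2.5 predictor simply means that an earlier concentration is used as an input when predicting the current one. The Python sketch below shows one simple way to build such a feature from a time-ordered series. It is an assumption about the general technique, not the exact feature engineering used in the paper. The function name and the sample series are hypothetical.

```python
def add_lag_feature(series, lag=1):
    """Pair each value with the value observed `lag` steps earlier.

    Returns a list of (lagged_value, current_value) tuples. The first
    `lag` observations are dropped because they have no earlier value.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    return [(series[i - lag], series[i]) for i in range(lag, len(series))]


# Hypothetical PM2.5 readings in µg/m³ for consecutive cooking intervals.
pm25_series = [120.0, 95.0, 210.0, 130.0, 88.0]

for lagged, current in add_lag_feature(pm25_series, lag=1):
    print(f"previous interval: {lagged:6.1f}  ->  current: {current:6.1f}")
```

Each pair can then serve as one (input, target) example for a regression model, alongside meteorological variables recorded for the same interval.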

One of the most significant findings was the consistently high PM₁/PM2.5 ratio (>0.7), emphasizing the dominant role of submicron (PM₁) particles in rural household pollution. These particles are especially concerning due to their ability to penetrate deeply into the respiratory system and contribute to long-term health risks.
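The ratio itself is a simple quotient of two simultaneous mass concentrations. The Python sketch below shows the calculation and the comparison against the 0.7 level mentioned above. The function name and the sample pairs are hypothetical and are not measurements from our study.

```python
PM1_DOMINANCE_THRESHOLD = 0.7  # ratio level quoted in the text above


def pm1_fraction(pm1, pm25):
    """Return the share of PM2.5 mass that is carried by PM1."""
    if pm25 <= 0:
        raise ValueError("PM2.5 concentration must be positive")
    return pm1 / pm25


# Hypothetical simultaneous (PM1, PM2.5) readings in µg/m³.
samples = [(85.0, 110.0), (140.0, 180.0), (60.0, 95.0)]

for pm1, pm25 in samples:
    ratio = pm1_fraction(pm1, pm25)
    label = "PM1-dominated" if ratio > PM1_DOMINANCE_THRESHOLD else "coarser mix"
    print(f"PM1/PM2.5 = {ratio:.2f} ({label})")
```

Because PM₁ is a subset of PM2.5, the ratio lies between 0 and 1 for consistent measurements; values close to 1 indicate that most of the fine-particle mass sits in the smallest measured size fraction.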

Our work highlights the urgent need for:
• Clean cooking interventions and improved ventilation strategies
• Winter-focused public health awareness campaigns
• Real-time predictive air quality systems for rural communities
• Regulatory attention toward PM₁ and PM₄ alongside PM2.5 and PM₁₀

By integrating spatiotemporal analysis with interpretable machine learning, this study demonstrates how data-driven tools can support rural air quality management and health-focused environmental policy in biomass-dependent regions.

We hope this work contributes toward advancing sustainable rural energy transitions and protecting vulnerable communities from the hidden burden of household air pollution.

#AirPollution #PM25 #MachineLearning #BiomassBurning #RuralIndia #EnvironmentalHealth #AirQuality #Sustainability #PublicHealth #DataScience
