This research combines environmental monitoring with data-driven analytics to provide a comprehensive understanding of biomass-related air pollution in rural settings.
Using the GRIMM D-11 Aerosol Spectrometer, we monitored PM₁, PM2.5, PM₄, and PM₁₀ concentrations during key cooking periods in rural households over a full annual cycle. The findings revealed pronounced seasonal and diurnal variability, with winter mornings and evenings consistently showing the highest PM2.5 concentrations, often far above the WHO 2021 air quality guideline levels.
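To put a guideline exceedance into numbers, here is a minimal sketch that expresses a PM2.5 concentration as a multiple of the WHO 2021 guideline levels (15 µg/m³ for the 24-hour mean, 5 µg/m³ for the annual mean). The function name and the example concentrations are hypothetical and are not measurements from this study.

```python
import math

# WHO 2021 air quality guideline levels for PM2.5, in µg/m³
WHO_PM25_24H = 15.0
WHO_PM25_ANNUAL = 5.0


def exceedance(concentration, guideline=WHO_PM25_24H):
    """Return (ratio, orders_of_magnitude) of a concentration vs. a guideline."""
    if concentration <= 0 or guideline <= 0:
        raise ValueError("concentration and guideline must be positive")
    ratio = concentration / guideline
    return ratio, math.log10(ratio)


# Hypothetical cooking-period averages, for illustration only
for label, value in [("summer midday", 60.0), ("winter evening", 450.0)]:
    ratio, orders = exceedance(value)
    print(f"{label}: {ratio:.1f}x guideline ({orders:.2f} orders of magnitude)")
```

Reporting the base-10 logarithm alongside the ratio makes it easy to state precisely whether an exceedance is a few-fold, tenfold, or larger.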
Our study identified several critical observations:
• Winter stagnation and low wind speeds significantly intensified pollutant accumulation.
• Fine and submicron particles (PM2.5 and PM₁) dominated pollution episodes, raising serious health concerns about indoor exposure.
• Mahendragarh consistently exhibited higher PM concentrations, while Jhunjhunu showed stronger signatures of localized combustion and dust resuspension.
• Wind direction and seasonal meteorology strongly influenced pollutant dispersion and transport pathways.
To move beyond conventional statistical analysis, we applied machine learning methods including Random Forest, XGBoost, clustering, anomaly detection, and SHAP-based interpretability analysis. Among the tested regression models, Random Forest showed the strongest predictive performance (R² up to 0.87), while classification models achieved approximately 98% accuracy in identifying pollution severity categories.
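As an illustration of the regression setup, the sketch below fits a scikit-learn Random Forest to lagged PM2.5 values plus two weather covariates. It is only a sketch under stated assumptions: the data are synthetic, and the feature set, lag choices, hyperparameters, and the helper name `make_lagged` are hypothetical rather than the configuration used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Synthetic hourly series (NOT study data): persistent PM2.5 plus weather effects
n = 600
wind = rng.uniform(0.2, 4.0, n)        # wind speed, m/s
humidity = rng.uniform(30.0, 95.0, n)  # relative humidity, %
pm25 = np.empty(n)
pm25[0] = 100.0
for t in range(1, n):
    # Persistence + stagnation term (low wind raises PM) + humidity term + noise
    pm25[t] = 0.8 * pm25[t - 1] + 40.0 / wind[t] + 0.2 * humidity[t] + rng.normal(0, 5)


def make_lagged(pm, wind, humidity, lags=(1, 2, 3)):
    """Build features [pm(t-k) for k in lags, wind(t), humidity(t)] and target pm(t)."""
    max_lag = max(lags)
    cols = [pm[max_lag - k: len(pm) - k] for k in lags]
    cols += [wind[max_lag:], humidity[max_lag:]]
    return np.column_stack(cols), pm[max_lag:]


X, y = make_lagged(pm25, wind, humidity)

# Chronological split: train on the first 80%, test on the last 20%,
# so the model never sees observations from after the test period.
split = int(0.8 * len(y))
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X[:split], y[:split])

r2 = r2_score(y[split:], model.predict(X[split:]))
print(f"held-out R² on synthetic data: {r2:.2f}")
```

The chronological split matters for time series: a random shuffle would let neighbouring hours leak between the training and test sets and inflate R².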
Importantly, SHAP analysis revealed that lagged PM2.5 concentrations, humidity, wind speed, and temperature-related variables were among the strongest predictors of pollution episodes. Unsupervised clustering further identified distinct pollution regimes associated with combustion intensity and meteorological stagnation.
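To show how pollution regimes can be separated without labels, here is a minimal sketch using k-means from scikit-learn. The clustering algorithm used in the study is not stated above, so k-means is an assumption, and the two regimes are synthetic examples rather than study measurements.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Two synthetic regimes (NOT study data); columns: PM2.5 (µg/m³), wind speed (m/s)
stagnant = np.column_stack([rng.normal(400, 30, 100), rng.normal(0.5, 0.1, 100)])
ventilated = np.column_stack([rng.normal(80, 15, 100), rng.normal(3.0, 0.3, 100)])
X = np.vstack([stagnant, ventilated])

# Standardize so PM2.5 (hundreds) does not dominate wind speed (single digits)
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)

for k in range(2):
    members = X[labels == k]
    print(
        f"regime {k}: n={len(members)}, "
        f"mean PM2.5={members[:, 0].mean():.0f} µg/m³, "
        f"mean wind={members[:, 1].mean():.2f} m/s"
    )
```

Cluster indices are arbitrary, so each regime should be interpreted from its mean PM2.5 and wind speed rather than from its label number.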
One of the most significant findings was the consistently high PM₁/PM2.5 ratio (>0.7), meaning that submicron particles made up most of the fine-particle mass in rural household pollution. These particles are especially concerning because they penetrate deep into the respiratory system and contribute to long-term health risks.
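The ratio itself is simple to compute from paired readings. The sketch below flags readings where PM₁ makes up more than 70% of the PM2.5 mass; the function name and the readings are hypothetical examples, not values from the study.

```python
RATIO_THRESHOLD = 0.7  # threshold quoted in the text above


def pm1_fraction(pm1, pm25):
    """Return the PM1/PM2.5 mass ratio for one paired reading (both in µg/m³)."""
    if pm25 <= 0:
        raise ValueError("PM2.5 must be positive")
    if pm1 < 0 or pm1 > pm25:
        # PM1 is a subset of PM2.5, so it cannot exceed it in a valid reading
        raise ValueError("PM1 must lie between 0 and PM2.5")
    return pm1 / pm25


# Hypothetical (PM1, PM2.5) pairs, for illustration only
readings = [(310.0, 400.0), (45.0, 90.0), (150.0, 200.0)]
ratios = [pm1_fraction(pm1, pm25) for pm1, pm25 in readings]
submicron_dominated = [r > RATIO_THRESHOLD for r in ratios]

for (pm1, pm25), r, flag in zip(readings, ratios, submicron_dominated):
    print(f"PM1={pm1:.0f}, PM2.5={pm25:.0f}, ratio={r:.2f}, above 0.7: {flag}")
```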
Our work highlights the urgent need for:
• Clean cooking interventions and improved ventilation strategies
• Winter-focused public health awareness campaigns
• Real-time predictive air quality systems for rural communities
• Regulatory attention toward PM₁ and PM₄ alongside PM2.5 and PM₁₀
By integrating spatiotemporal analysis with interpretable machine learning, this study demonstrates how data-driven tools can support rural air quality management and health-focused environmental policy in biomass-dependent regions.
We hope this work contributes toward advancing sustainable rural energy transitions and protecting vulnerable communities from the hidden burden of household air pollution.
#AirPollution #PM25 #MachineLearning #BiomassBurning #RuralIndia #EnvironmentalHealth #AirQuality #Sustainability #PublicHealth #DataScience