Enhancing PM2.5 Air Pollution Forecasting with Novel Random Imputation Based on Hybrid RNN-Bidirectional GRU (nRI RNN-BiGRU) Model

Air pollution is a critical issue for both the environment and global public health. Accurate forecasting methods are crucial to substantially mitigate its adverse health effects. Missing data, where specific observations or values are not recorded, is common in air quality datasets.

Explore the Research

Enhancing PM2.5 Air Pollution Forecasting with Novel Random Imputation Based on Hybrid RNN-Bidirectional GRU (nRI RNN-BiGRU) Model - SN Computer Science

To address the problem of missing data in air quality datasets, we used a novel random imputation (nRI) method. The method captures the temporal dependencies of air pollution and targets values that are continuously missing completely at random (MCAR) when forecasting PM2.5 concentrations. The Central Pollution Control Board provided the data for this study. Missing data are handled in two steps. In the first step, outliers are replaced with statistically valid minimum and maximum values determined by the interquartile range (IQR). In the second step, cells that contain NaN (Not a Number) values are filled with random samples drawn from the distribution of the corresponding feature.

The proposed nRI RNN-BiGRU model outperforms traditional deep learning models in PM2.5 forecasting. It achieves an RMSE 27.8792 units lower than conventional models and improves the R² score by 0.506. The model also shows error reductions across key performance metrics, with a 16.75% decrease in MAE, a 20.07% reduction in MSE, and a 10.03% improvement in MAPE compared to CNN, among others. According to the Friedman ranking, the nRI RNN-BiGRU model consistently ranks first among the models compared. These findings underscore its effectiveness in air pollution forecasting, supporting proactive environmental protection and public health strategies.

Our findings underscore the urgency of the air pollution issue, indicating a likely increase in PM2.5 concentration levels. The potential health risks associated with fine particulate matter (PM2.5), such as respiratory infections, asthma, and heart disease, further highlight the need for effective strategies for environmental protection and public health. It is, therefore, imperative to take timely, effective measures to address this issue and safeguard public health and well-being.
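For readers less familiar with the error metrics reported above (MAE, MSE, RMSE, MAPE, and R²), the following is a minimal sketch of how they are commonly computed. The function name `regression_metrics` and the sample readings are illustrative assumptions, not code or data from the paper.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, MAPE (%) and R^2 for two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # MAPE is undefined where the observed value is zero; such points are skipped here.
    pct = [abs(e / t) for e, t in zip(errors, y_true) if t != 0]
    mape = 100.0 * sum(pct) / len(pct) if pct else float("nan")
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    ss_res = sum(e * e for e in errors)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}

# Made-up PM2.5 readings in µg/m³, for illustration only (not from the study).
observed = [100.0, 120.0, 140.0, 160.0]
predicted = [110.0, 115.0, 150.0, 155.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Lower MAE, MSE, RMSE, and MAPE indicate smaller forecast errors, while an R² closer to 1 indicates that the forecast explains more of the variance in the observations.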

Air pollution is a pervasive challenge in today’s world, degrading the quality of our air and the health of our planet. It also harms biodiversity, ecosystems, and ecosystem services such as nitrogen cycling, and it now affects both urban and rural areas.

The methodological framework evaluates and compares the performance of the proposed nRI RNN-BiGRU model against conventional deep learning models and traditional imputation techniques for forecasting PM2.5 concentrations. The methodology encompasses data preprocessing, missing data imputation strategies, model architecture design, training procedures, and performance evaluation metrics. By systematically implementing and assessing these components, the study aims to provide a robust comparative analysis of the predictive accuracy and reliability of the proposed hybrid model in the context of air pollution time-series forecasting.

Single or consecutive missing values are handled using a range of columns and random replacement: when missing data points occur in succession along a row, they are filled with values taken from the existing data. This method aims to maintain the overall distribution of the data while imputing missing values with feasible estimates.
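The two-step idea of bounding outliers with the IQR and then filling gaps with random draws from the observed values can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the 1.5 × IQR fence multiplier, the function names `iqr_bounds` and `impute_column`, and the sample series are all hypothetical.

```python
import math
import random
import statistics

def iqr_bounds(values, k=1.5):
    """Lower and upper fences from the interquartile range (k=1.5 is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def impute_column(column, rng, k=1.5):
    """Step 1: clip outliers to the IQR fences.
    Step 2: replace each NaN with a random draw from the clipped observed values."""
    observed = [v for v in column if not math.isnan(v)]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    low, high = iqr_bounds(observed, k)

    def clip(v):
        return min(max(v, low), high)

    pool = [clip(v) for v in observed]
    return [rng.choice(pool) if math.isnan(v) else clip(v) for v in column]

nan = float("nan")
# Made-up PM2.5 series with a run of consecutive gaps and one extreme outlier.
pm25 = [80.0, 95.0, nan, nan, nan, 110.0, 900.0, 105.0, nan, 90.0]
rng = random.Random(42)  # seeded so the example is reproducible
filled = impute_column(pm25, rng)
print(filled)
```

Because every filled value is sampled from the column's own observed values, the imputed series stays within the observed range and roughly preserves the feature's empirical distribution, which is the property the description above emphasizes.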

In conclusion, the nRI RNN-BiGRU method can effectively forecast PM2.5 concentration, and including PM10 and NO2 data in model training can improve prediction accuracy. Non-parametric statistical tests, including Friedman ranking and Holm’s post hoc procedure, were applied to rigorously assess and rank the performance of all deep learning models. The proposed nRI RNN-BiGRU ranked first with statistical significance (p-value < 0.05), supporting its superiority across multiple pollutants and metrics. The one-year forecast indicates that median PM2.5 levels are expected to remain between 165 and 185 µg/m³, categorizing air quality as “Very Poor”. This suggests a serious risk to public health, with adverse effects expected for the entire population, not just sensitive groups. Fine particulate matter (PM2.5) may increase the likelihood of health issues such as respiratory infections, asthma, and heart disease; forecasts of this kind can help in devising effective strategies for environmental protection and public health.
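As a rough illustration of the ranking procedure mentioned above, the sketch below computes Friedman average ranks, the Friedman chi-square statistic, and Holm's step-down decisions. The error scores and p-values are made up for demonstration and are not the paper's results; the function names are hypothetical.

```python
def average_ranks(scores):
    """scores[i][j] is the error of model j on block i (lower is better).
    Returns each model's mean rank; tied models share the average rank."""
    n_models = len(scores[0])
    totals = [0.0] * n_models
    for row in scores:
        order = sorted(range(n_models), key=lambda j: row[j])
        pos = 0
        while pos < n_models:
            end = pos
            while end + 1 < n_models and row[order[end + 1]] == row[order[pos]]:
                end += 1
            shared = (pos + end) / 2 + 1  # mean of the 1-based ranks pos+1..end+1
            for idx in order[pos:end + 1]:
                totals[idx] += shared
            pos = end + 1
    return [t / len(scores) for t in totals]

def friedman_statistic(mean_ranks, n_blocks):
    """Friedman chi-square statistic computed from mean ranks."""
    k = len(mean_ranks)
    sum_sq = sum(r * r for r in mean_ranks)
    return 12 * n_blocks / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4)

def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure; True marks a rejected null hypothesis."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, i in enumerate(order):
        if p_values[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break  # once one test fails, all larger p-values are retained
    return reject

# Made-up errors for three hypothetical models over four blocks (datasets or metrics).
scores = [[10.0, 12.0, 15.0], [9.0, 13.0, 14.0], [11.0, 11.5, 16.0], [8.0, 12.0, 13.0]]
ranks = average_ranks(scores)
stat = friedman_statistic(ranks, len(scores))
# Made-up pairwise p-values against a control model, for illustration only.
decisions = holm_reject([0.001, 0.04, 0.03])
print(ranks, stat, decisions)
```

In this toy example the first model has the lowest error in every block, so it receives a mean rank of 1. Holm's procedure then compares the sorted p-values against progressively looser thresholds and stops at the first one that fails.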

https://doi.org/10.1007/s42979-025-04167-y 
