Substantial gaps remain in developing reliable high-resolution air quality models, owing to sparse data, complex atmospheric physics and chemistry, and the limitations of current approaches. Although machine learning demonstrates predictive skill, purely data-driven methods often yield biased, physically inconsistent predictions in unobserved areas because they do not inherently preserve the dynamics of air pollutant transport. In addition, the common practice of validating models by random sampling across space and time may overestimate generalizability. Advancing hybrid techniques that integrate mechanistic atmospheric processes and employ rigorous spatiotemporal validation is key to producing consistent, robust forecasts. By encoding fluid dynamics within a deep graph learning framework, our work significantly reduces biases and makes progress toward accurate, reliable, and interpretable air quality assessment.
In this work, we developed and validated a hybrid physics-inspired deep graph learning approach that improves the physical realism of estimated air pollutant concentrations and reduces their bias. Our method is motivated by the ability of graph neural networks to represent fine-scale neighborhood dependencies, which embody the intricate spatiotemporal dynamics of air pollutants. We further guide model optimization with partial differential equations that embed domain knowledge of atmospheric transport physics and chemistry. This integration of scientific principles with flexible deep learning provides a pathway to enhancing model performance beyond purely data-driven approaches. Our results demonstrate the efficacy of this strategy for producing physically consistent and reliable air quality assessments.
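To make the idea of guiding optimization with a partial differential equation concrete, the following is a minimal sketch rather than the model described here. It assumes a one-dimensional advection-diffusion equation, dc/dt + u dc/dx - D d²c/dx² = 0, with constant wind speed `u` and diffusivity `D`, discretized by finite differences; the function names, the toy fields, and the weighting scheme are all hypothetical.

```python
def advection_diffusion_residual(c_prev, c_next, u, D, dx, dt):
    """Finite-difference residual of dc/dt + u*dc/dx - D*d2c/dx2 at interior cells."""
    residual = []
    for i in range(1, len(c_prev) - 1):
        dc_dt = (c_next[i] - c_prev[i]) / dt
        dc_dx = (c_prev[i + 1] - c_prev[i - 1]) / (2 * dx)
        d2c_dx2 = (c_prev[i + 1] - 2 * c_prev[i] + c_prev[i - 1]) / dx ** 2
        residual.append(dc_dt + u * dc_dx - D * d2c_dx2)
    return residual


def hybrid_loss(pred_prev, pred_next, observed, u, D, dx, dt, weight=1.0):
    """Mean squared data misfit at observed cells plus weighted mean squared PDE residual."""
    data = sum((pred_next[i] - y) ** 2 for i, y in observed.items()) / len(observed)
    res = advection_diffusion_residual(pred_prev, pred_next, u, D, dx, dt)
    physics = sum(r * r for r in res) / len(res)
    return data + weight * physics


# Hypothetical toy fields on five grid cells (assumed values, not study data).
c_prev = [0.0, 2.0, 4.0, 6.0, 8.0]
consistent = [-1.0, 1.0, 3.0, 5.0, 7.0]  # linear profile shifted downwind by u*dt
static = list(c_prev)                    # ignores transport entirely
observed = {2: 3.0}                      # one monitored cell: index -> measurement

loss_consistent = hybrid_loss(c_prev, consistent, observed, u=1.0, D=0.1, dx=1.0, dt=0.5)
loss_static = hybrid_loss(c_prev, static, observed, u=1.0, D=0.1, dx=1.0, dt=0.5)
```

In this toy case the transported field incurs no penalty, while the static field is penalized both for missing the observation and for violating the assumed transport equation. A real implementation would more likely compute such residuals with automatic differentiation inside the training loop.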
Architecture of deep graph hybrid network
As a large country with heterogeneous topography, diverse weather, and multiple emission sources, China was selected as the test bed for our approach. In site-based tests, our approach improved R² by an average of 11-22% compared with representative machine learning methods. Compared with state-of-the-art air quality predictions in China, our method consistently and significantly improved air quality assessments in terms of spatial scale, prediction accuracy, or both; interpretation of the results showed that the predictions better reflected the fine-scale spatiotemporal distribution of air pollutants in China.
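The site-based tests mentioned above differ from random sample splits in that held-out data come from locations the model has never seen. A minimal sketch follows, assuming that "site-based" means holding out entire monitoring sites; the record layout, the function name `site_based_split`, and the toy data are hypothetical.

```python
import random


def site_based_split(records, test_fraction=0.2, seed=0):
    """Split records so that no monitoring site appears in both train and test."""
    sites = sorted({r["site"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(sites)
    n_test = max(1, round(len(sites) * test_fraction))
    test_sites = set(sites[:n_test])
    train = [r for r in records if r["site"] not in test_sites]
    test = [r for r in records if r["site"] in test_sites]
    return train, test, test_sites


# Hypothetical toy data: 10 sites with 5 daily records each.
records = [
    {"site": f"S{s:02d}", "day": d, "pm25": 30.0 + s + d}
    for s in range(10)
    for d in range(5)
]
train, test, test_sites = site_based_split(records)
```

Because every record from a held-out site lands in the test set, the score measures how well the model extrapolates to unmonitored locations. A random split would let the model see other days from the same site during training.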
Hybrid graph deep learning improving air quality assessment
This method effectively captured fine-scale pollutant transport and significantly reduced estimation bias compared with existing studies. For PM10, we produced nationwide 1 km resolution grids with higher generalization accuracy (site R²: 0.85) than comparable models at 1-10 km resolution (CV R²: 0.82-0.83). Similarly, for PM2.5 in China, we attained a finer 1 km resolution with higher generalization accuracy (site R²: 0.87) than recent methods at 1-50 km scales (CV R²: 0.64-0.85). Our O3 and NO2 models also matched or exceeded state-of-the-art generalization metrics at 1 km resolution. By incorporating fluid dynamics and mass conservation constraints, our physics-inspired approach yields improved temporal evolution and continuity of fine-scale concentration fields compared with purely data-driven techniques. Our model produces smoothly varying spatial surfaces without abrupt changes, reflecting the transport mechanisms governing these pollutants. Based on the 1 × 1 km² grid estimates, we analyzed spatial variability over time to gain insights into regional and global transport patterns and fine-scale interactions between meteorology and air quality.
Four representative events of air pollution presented by the predicted surface grids
The 1 × 1 km² gridded estimates provided an accurate spatiotemporal representation of all air pollutants, enabling more reliable fine-scale analysis of national trends than prior studies. Our review found that few existing works have reported high-resolution predicted surfaces for all criteria pollutants and the aggregated air quality index (AQI). This work fills that gap by providing fine-scale AQI grids across China, offering valuable information in areas that lack monitoring. Nationwide, population-weighted means were much higher than raw means for NO2 (by ~9 μg/m³) and PM2.5 (by ~13.5 μg/m³), indicating that these pollutants are concentrated in populated regions. Yearly trends showed national declines from 2015 to 2018 for population-weighted PM2.5, PM10, SO2 and CO, but increases in O3. These trends aligned with ground measurements, although the measurements mostly overestimated national averages by 6-52% because monitors are concentrated in urban areas. Reanalysis products, by contrast, substantially underestimated averages because of their coarse resolution. Our daily national AQI maps provided granular spatiotemporal details, with aggregated trends consistent with measurement-based AQI (correlation 0.76-0.92). Seasonally, PM10 dominated summer AQI in deserts while O3 dominated elsewhere; in winter, PM2.5 dominated eastern regions while PM10 dominated elsewhere. In summary, our fine-scale gridded estimates enabled more reliable and finer statistics of national air pollutant trends and the air quality index than were previously available.
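The gap between population-weighted and raw means reported above follows directly from how the two averages are defined. A minimal sketch with four hypothetical grid cells illustrates the arithmetic; the concentrations and populations are invented for illustration and are not taken from the study.

```python
def raw_mean(values):
    """Unweighted average over grid cells."""
    return sum(values) / len(values)


def population_weighted_mean(values, population):
    """Average over grid cells, weighting each cell by the people living in it."""
    total = sum(population)
    if total == 0:
        raise ValueError("total population must be positive")
    return sum(v * p for v, p in zip(values, population)) / total


# Hypothetical cells: two sparsely populated clean cells, two dense polluted cells.
pm25 = [20.0, 30.0, 60.0, 70.0]       # concentration per cell, in μg/m³
population = [100, 100, 900, 900]     # residents per cell

unweighted = raw_mean(pm25)
weighted = population_weighted_mean(pm25, population)
```

Here the weighted mean (61.0) exceeds the raw mean (45.0) because most people live in the more polluted cells, which is the pattern the NO2 and PM2.5 differences indicate.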
National population-weighted AQI changes
This study introduces two innovations. First, we demonstrate that graph neural networks are well suited to representing irregular monitoring data and encoding the intricate physics of pollutant transport, overcoming limitations of conventional convolutional neural networks (CNNs). Second, integrating atmospheric knowledge constraints into a semi-supervised deep graph learning framework yields substantial gains in model accuracy and generalizability: our hybrid approach, which embeds spatiotemporal dynamics and chemical processes, consistently outperforms purely data-driven techniques. While further evaluation is warranted, these initial findings suggest that fusing scientific principles with machine learning can advance air quality modeling.
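One way to see why graphs suit irregular monitoring data is that each site can be linked to its nearest neighbors regardless of how the sites are scattered, whereas a CNN expects a regular grid. The sketch below builds such a k-nearest-neighbor graph. It is an illustration only: the graph construction used in this work is not specified here, and the planar coordinates (in km), the function name `knn_edges`, and the choice of k are assumptions.

```python
import math


def knn_edges(coords, k):
    """Return directed edges (i, j) linking each site i to its k nearest sites j."""
    edges = []
    for i, p in enumerate(coords):
        # Distances from site i to every other site, nearest first.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(coords) if j != i)
        edges.extend((i, j) for _, j in dists[:k])
    return edges


# Hypothetical monitor locations: three clustered sites and one remote site.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (10.0, 10.0)]
edges = knn_edges(coords, k=2)
```

Even the remote site at index 3 receives two neighbors, so information can still reach sparsely monitored areas during message passing. A distance threshold or distance-based edge weights would be natural refinements.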
This research marks a step toward integrating foundational atmospheric knowledge with data-driven techniques, and it points to opportunities for interdisciplinary collaboration. We hope this work inspires further advances at the intersection of environmental science and machine learning, ultimately delivering robust and interpretable models that benefit not only air quality assessment but also related fields within the atmospheric sciences.