Behind the Paper

Good Forecast, Missing Physics: Looking Inside of AI Weather Ensembles

"AI tools can make mistakes. Double-check important info." We all know that AI is fallible. But what about AI for science? How much can we trust scientific AI models? This study reveals that today's AI weather forecast models such as GenCast carry a systematic, noise-like bias at the mesoscale.

Published in Earth & Environment and Computational Sciences

May 07, 2026

Hisu Kim, Jin-Ho Yoon & Jihun Ryu

Artificial intelligence has begun performing tasks that once required expert judgment: reading medical scans, drafting legal briefs, and, increasingly, predicting the global weather. Yet each new application raises a haunting question: how would we know whether the model is getting it right?

In the world of large language models, the failure has a name: hallucination. We have learned that a confident answer from an AI can quickly dissolve into fiction. Now, a new study led by Hisu Kim and Jin-Ho Yoon at the Gwangju Institute of Science and Technology (GIST), published in npj Climate and Atmospheric Science, asks what an AI "hallucination" looks like in a weather forecast, and whether our usual scoring metrics would even detect it.

To find the answer, the research team launched a year’s worth of forecasts throughout 2021. They compared three forecast outputs: (1) IFS-HRES: The ECMWF's high-resolution deterministic model, (2) IFS-ENS: The industry-standard operational ensemble, and (3) GenCast: Google DeepMind’s state-of-the-art AI model that uses a "diffusion process" to generate its ensemble forecasts. The team tracked kinetic energy (KE) at 300 hPa, the altitude of the jet stream, where tiny errors rapidly amplify into major shifts in the weather, a phenomenon known as the butterfly effect.

The Diagnosis: A Tale of Two Scales

Figure 1: Kinetic energy spectra at 300 hPa for three forecast systems: IFS-HRES (top row), IFS-ENS (middle row), and GenCast (bottom row). The left column shows each system's spectrum at every forecast lead time, color-coded from blue (12 hours) to red (10 days); black lines mark the initial condition. The right column shows how each spectrum changes relative to that starting point. In the numerical models, as forecast lead time grows, energy drains from the smallest scales, exactly as atmospheric dissipation requires. In GenCast, energy at the smallest scales drifts upward rather than downward, the visual signature of the physics missing from the model. (Figure 2 from the paper)

By decomposing the wind fields into different spatial scales, the researchers "turned the forecasts inside out." The results revealed a fundamental divergence in how AI "sees" the atmosphere:

Success in the synoptic scales: At large scales, the realm of highs, lows, and typhoons, GenCast is remarkable. Its ensemble spread grows much like the real atmosphere, following the established laws of turbulence.
Breakdown at the smaller scales: Below roughly 400 kilometers (the mesoscale), the physical realism collapses.

In traditional numerical models, energy drains from the smallest scales as forecast lead time grows, a process called atmospheric dissipation. In GenCast, the energy at these scales drifts upward instead. Its energy spectrum stops following physical laws and settles into a featureless plateau: the mathematical signature of white noise rather than weather.
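To make the "plateau" concrete, here is a minimal sketch of how a kinetic energy spectrum and its log-log slope can be estimated. It is not the paper's method: it assumes a toy periodic grid and a one-dimensional FFT along each row instead of a spherical harmonic transform, and the function names (ke_spectrum, log_log_slope, power_law_field) and all parameter values are illustrative. A synthetic field built with a k^-3 spectrum stands in for weather-like structure, and Gaussian white noise stands in for the noise-like mesoscale.

```python
import numpy as np

def ke_spectrum(u, v):
    """Kinetic energy per zonal wavenumber, averaged over the rows of a (row, lon) grid."""
    n = u.shape[-1]
    u_hat = np.fft.rfft(u, axis=-1) / n
    v_hat = np.fft.rfft(v, axis=-1) / n
    ke = 0.5 * (np.abs(u_hat) ** 2 + np.abs(v_hat) ** 2)
    return ke.mean(axis=0)[1:]  # drop wavenumber 0 (the zonal mean)

def log_log_slope(spectrum, k_min, k_max):
    """Least-squares slope of log(E) against log(k) over [k_min, k_max]."""
    k = np.arange(1, spectrum.size + 1)
    sel = (k >= k_min) & (k <= k_max)
    slope, _intercept = np.polyfit(np.log(k[sel]), np.log(spectrum[sel]), 1)
    return slope

def power_law_field(rng, rows, n, exponent):
    """Random-phase toy field whose energy spectrum follows k**exponent."""
    k = np.arange(n // 2 + 1, dtype=float)
    amp = np.zeros_like(k)
    amp[1:] = k[1:] ** (exponent / 2.0)  # energy goes as amplitude squared
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(rows, k.size))
    return np.fft.irfft(amp * np.exp(1j * phase), n=n, axis=-1)

rng = np.random.default_rng(0)
rows, n = 64, 512  # hypothetical grid size

# Toy "weather-like" winds with a steep k^-3 spectrum
steep = ke_spectrum(power_law_field(rng, rows, n, -3.0),
                    power_law_field(rng, rows, n, -3.0))
# Toy white-noise winds
flat = ke_spectrum(rng.standard_normal((rows, n)),
                   rng.standard_normal((rows, n)))

steep_slope = log_log_slope(steep, 10, 200)
flat_slope = log_log_slope(flat, 10, 200)
print(f"power-law field slope: {steep_slope:.2f}")
print(f"white-noise slope:     {flat_slope:.2f}")
```

The steep field recovers a slope close to -3, while white noise gives a slope close to zero, which is the flat spectrum the article describes.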

The "Noise" Fingerprint

Figure 2: Rotational and divergent components of kinetic energy (KE) spectra. Solid lines indicate rotational components and dashed lines indicate divergent components of KE for (a) IFS-HRES, (b) the first ensemble member of IFS-ENS, and (c) GenCast. Black lines for each model show the decomposed spectra of the initial condition. Gray dashed lines with slopes of -3 and -5/3 are shown as reference turbulence power laws. (Figure 4 from the paper)

A second test confirmed the suspicion. Using Helmholtz decomposition, the team split the wind into its rotational and divergent parts. In the real atmosphere, these remain strictly separated at certain scales. In GenCast, they collapsed to equal magnitude, exactly what happens in pure, random noise.
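The idea behind this test can be sketched on a toy problem. The example below assumes a flat, doubly periodic grid and performs the Helmholtz split with a 2D FFT; the paper works with global fields, so this is an illustration of the principle, not a reproduction of the authors' code. The names helmholtz_periodic and mean_ke are hypothetical. In Fourier space, the part of the wind vector parallel to the wavevector is the divergent component, and the remainder is rotational.

```python
import numpy as np

def helmholtz_periodic(u, v, dx=1.0, dy=1.0):
    """Split (u, v) on a doubly periodic grid into rotational and divergent parts."""
    ny, nx = u.shape
    kx = np.fft.fftfreq(nx, d=dx)
    ky = np.fft.fftfreq(ny, d=dy)
    KX, KY = np.meshgrid(kx, ky)   # both shaped (ny, nx)
    k2 = KX ** 2 + KY ** 2
    k2[0, 0] = 1.0                 # avoid 0/0; the domain mean carries no divergence

    u_hat, v_hat = np.fft.fft2(u), np.fft.fft2(v)
    # Projection of the wind vector onto the wavevector gives the divergent part
    along_k = (KX * u_hat + KY * v_hat) / k2
    u_div = np.fft.ifft2(KX * along_k).real
    v_div = np.fft.ifft2(KY * along_k).real
    return (u - u_div, v - v_div), (u_div, v_div)

def mean_ke(u, v):
    """Domain-mean kinetic energy per unit mass."""
    return 0.5 * np.mean(u ** 2 + v ** 2)

n = 129  # odd size, so there is no Nyquist mode to treat specially
x = 2.0 * np.pi * np.arange(n) / n
X, Y = np.meshgrid(x, x)
dx = x[1] - x[0]

# Purely rotational toy flow from the streamfunction psi = sin(x) cos(2y)
u_rot_only = 2.0 * np.sin(X) * np.sin(2.0 * Y)   # u = -d(psi)/dy
v_rot_only = np.cos(X) * np.cos(2.0 * Y)         # v =  d(psi)/dx
_, div_part = helmholtz_periodic(u_rot_only, v_rot_only, dx, dx)
div_fraction = mean_ke(*div_part) / mean_ke(u_rot_only, v_rot_only)

# Independent white noise in both wind components
rng = np.random.default_rng(0)
u_noise = rng.standard_normal((n, n))
v_noise = rng.standard_normal((n, n))
rot_n, div_n = helmholtz_periodic(u_noise, v_noise)
noise_ratio = mean_ke(*rot_n) / mean_ke(*div_n)

print(f"divergent KE fraction, rotational flow: {div_fraction:.1e}")
print(f"rotational / divergent KE, white noise: {noise_ratio:.2f}")
```

For the streamfunction flow, essentially all of the energy is rotational. For white noise, the two components carry nearly equal energy, which is the convergence the team observed at the mesoscale in GenCast.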

"The moment we saw those two components converge at the mesoscale, we stopped thinking about turbulence," says Hisu Kim, the study's lead author. "We started wondering whether what we were looking at was the diffusion noise itself (the very mechanism that built the ensemble), leaving its fingerprint in the forecast."

This wasn't just a GenCast quirk. The team found the same "flat mesoscale" in four different versions of GenCast and in AIFS-ENS, the European Centre’s own AI ensemble model. It appears to be a systemic "fingerprint" of current noise-utilizing ensemble forecasting methods.

Physical Consequence: Smearing the Jet Stream

Figure 3: The magnitude of the kinetic energy gradient at 300 hPa, for (a) ERA5 reanalysis, (b) IFS-HRES, (c) IFS-ENS, and (d) GenCast forecasts. Red filaments mark the sharp edges of the jet stream. Panels (a) to (c) show a clear filament structure of the jet stream in the midlatitudes, while a noise-like pattern covers the GenCast result in (d). (Figure 8 from the paper)

The practical result of these artifacts is visible in the KE gradient, essentially a "sharpness filter" for the wind. In the reanalysis and the traditional models, the jet stream has crisp boundaries that mark the edge of high-speed winds. GenCast, however, produces a "static" texture reminiscent of an old TV screen and fails to produce the sharp, filamentary structures required to model the jet stream core accurately.
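A rough illustration of the "sharpness filter" follows. It assumes a flat Cartesian grid with uniform spacing, an idealized Gaussian jet, and an arbitrary noise amplitude of 5 m/s; none of these values come from the paper, and ke_gradient_magnitude is a hypothetical helper name. The point is only to show why grid-scale noise in the wind field turns a thin filament of KE gradient into speckle that covers the whole map.

```python
import numpy as np

def ke_gradient_magnitude(u, v, dx=1.0, dy=1.0):
    """Magnitude of the horizontal gradient of kinetic energy on a (y, x) grid."""
    ke = 0.5 * (u ** 2 + v ** 2)
    dke_dy, dke_dx = np.gradient(ke, dy, dx)  # axis 0 is y, axis 1 is x
    return np.hypot(dke_dx, dke_dy)

ny, nx = 128, 256                 # hypothetical grid size
y = np.linspace(0.0, 1.0, ny)     # nondimensional meridional coordinate
dy = y[1] - y[0]
dx = dy                           # assume square grid cells

# Idealized zonal jet: 50 m/s at the core, Gaussian in y, uniform in x
jet_profile = 50.0 * np.exp(-((y - 0.5) / 0.08) ** 2)
u_clean = np.repeat(jet_profile[:, None], nx, axis=1)
v_clean = np.zeros_like(u_clean)

# The same jet with grid-scale white noise added to both wind components
rng = np.random.default_rng(0)
u_noisy = u_clean + 5.0 * rng.standard_normal(u_clean.shape)
v_noisy = v_clean + 5.0 * rng.standard_normal(v_clean.shape)

grad_clean = ke_gradient_magnitude(u_clean, v_clean, dx, dy)
grad_noisy = ke_gradient_magnitude(u_noisy, v_noisy, dx, dy)

print("median KE gradient, clean jet:", np.median(grad_clean))
print("median KE gradient, noisy jet:", np.median(grad_noisy))
```

In the clean case the gradient is concentrated on the two flanks of the jet and is close to zero elsewhere, so the median is tiny. With noise added, large gradients appear at almost every grid point and the median rises by orders of magnitude.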

A Physical Conscience for AI

Does this mean AI forecasts are unreliable? Not at all. GenCast’s performance on conventional metrics is a genuine advance and often superior to traditional models. But as corresponding author Jin-Ho Yoon notes, "The diversity we see inside today’s AI ensembles is limited by statistical properties rather than grounded in physical law." The study, titled "A Spectral Test of the Butterfly Effect and Physical Consistency in the Diffusion-Based GenCast’s Ensembles," serves as a diagnosis for the next generation of AI development.

If large language models taught us that fluency is not truth, this work suggests a parallel for weather AI: at the smallest scales, the shape of variance can outrun the physics beneath it. The road toward trustworthy scientific AI runs through tests like this, which ask a model not just whether it predicts the weather, but whether it is "speaking the language of the atmosphere."
