When Big Data Isn’t Enough: The Challenge of Rain-Snow Partitioning

Our study started with the simple assumption that machine learning would consistently outclass traditional modeling approaches for predicting rain and snow. That assumption proved wrong, which surprised our project team and has implications for the future of rain-snow partitioning research.

Nature Communications published our new article today: Machine learning shows a limit to rain-snow partitioning accuracy when using near-surface meteorology.

Like many other critical Earth system variables, precipitation phase has few direct observations. This means we often do not know whether it is raining or snowing in a given location, a challenge that is particularly acute in mountain regions at air temperatures near freezing.

The reason for this is threefold. One, most continuous observations of precipitation phase in the US come from present weather sensors at airports, which are generally situated in low-lying valley areas. Two, the lack of observations means scientists, forecasters, and operations professionals often rely on ancillary surface meteorological measurements, such as air temperature and relative humidity, to determine precipitation phase. And three, all traditional methods that use surface meteorology to partition precipitation into rain, snow, and other phases struggle between approximately 0°C and 4°C.

To help remedy these issues, our group launched a participatory science project five years ago called Mountain Rain or Snow with funding from the NASA Citizen Science for Earth Systems Program and the Nevada NASA Established Program to Stimulate Competitive Research. The premise was straightforward: engage volunteers to crowdsource visual reports of rain, snow, and mixed precipitation in major US mountain ranges with a simple smartphone app. Our hope was to improve modeled estimates of rain-snow partitioning through the creation of regionally optimized rain-snow temperature thresholds. 

A Mountain Rain or Snow volunteer submits a snowfall observation using the smartphone app. The app automatically geotags and timestamps each crowdsourced report. Credit: Jennings.

Yet no matter how much data we gathered from different mountain ranges (nearly 100,000 observations and counting), we kept running into the same problem. While surface meteorological measurements accurately predict rain at warm temperatures and snow at cold temperatures, they cannot reliably partition solid and liquid precipitation at temperatures near and slightly above freezing. Not even regionally optimized methods could eliminate this performance dip.

With our growing dataset in mind, we decided to move on from traditional rain-snow partitioning methods and instead deploy machine learning techniques. To supplement the Mountain Rain or Snow data, we identified an additional dataset of precipitation phase with nearly 18 million observations to use in our research. That gave us two novel datasets to work with, one with crowdsourced visual observations and one with synoptic weather reports.

On the surface, machine learning is well suited to precipitation phase partitioning. The models can learn emergent, non-linear patterns in the data without prescribed relationships. We had two big, rich datasets to use for training and validation. And the machine learning models could incorporate the full panel of surface meteorological data in each dataset, while most traditional methods only use one or two variables at a time.

We trained three machine learning models in our study: the tree-based methods random forest and XGBoost, and a multilayer perceptron, a type of artificial neural network (ANN). We compared these to a set of traditional methods that we used as benchmarks: air, wet bulb, and dew point temperature thresholds and a statistical model. We trained and tuned the machine learning models on training splits of the crowdsourced and synoptic datasets. We then evaluated performance on the testing splits for both the benchmarks and the machine learning methods to ensure like-to-like comparisons.
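As a rough illustration of the like-to-like evaluation described above (not the code used in the study), the sketch below tunes a single air temperature threshold on a training split and scores it on a held-out testing split. The data are synthetic, and the function names, the 80/20 split, and the toy rain-snow relationship are all assumptions made for the example.

```python
import random

def make_synthetic_obs(n, seed=0):
    """Generate hypothetical (air_temp_c, phase) pairs. NOT real observations."""
    rng = random.Random(seed)
    obs = []
    for _ in range(n):
        t = rng.uniform(-10.0, 10.0)
        # Assumed toy relationship: snow probability falls linearly
        # from 1 at -1 °C to 0 at 3 °C.
        p_snow = min(1.0, max(0.0, (3.0 - t) / 4.0))
        phase = "snow" if rng.random() < p_snow else "rain"
        obs.append((t, phase))
    return obs

def predict(air_temp_c, threshold_c):
    """Threshold benchmark: snow at or below the threshold, rain above it."""
    return "snow" if air_temp_c <= threshold_c else "rain"

def accuracy(obs, threshold_c):
    """Fraction of observations whose phase matches the threshold prediction."""
    hits = sum(predict(t, threshold_c) == phase for t, phase in obs)
    return hits / len(obs)

def fit_threshold(train, candidates):
    """Pick the candidate threshold with the highest training accuracy."""
    return max(candidates, key=lambda th: accuracy(train, th))

# Shuffle once, then split 80/20 into training and testing sets.
obs = make_synthetic_obs(5000, seed=42)
random.Random(1).shuffle(obs)
split = int(0.8 * len(obs))
train, test = obs[:split], obs[split:]

# Candidate thresholds from -2.0 °C to 4.0 °C in 0.1 °C steps.
candidates = [x / 10 for x in range(-20, 41)]
best = fit_threshold(train, candidates)
test_acc = accuracy(test, best)
print(f"tuned threshold: {best:.1f} °C, test accuracy: {test_acc:.3f}")
```

The key design point is that the testing split never influences the tuned threshold, so any method fitted on the training split can be compared against it on the same held-out observations.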

Study workflow
The workflow of our recently published study. We start with complete datasets (a), which include observations of rain (R), snow (S), and mixed precipitation (M) along with air (Ta), wet bulb (Tw), and dew point (Td) temperature, relative humidity (RH), and pressure (P). We split these datasets into training (b) and testing (c). We tune the hyperparameters (d) and fit (e) the random forest (RF), XGBoost (XG), and artificial neural network (ANN) machine learning models (f) using the training data. We then apply the machine learning models and the benchmarks (g, Table 1) to the testing data to validate their precipitation phase predictions (h). Credit: Jennings et al. (2025).

To our surprise, machine learning models provided only negligible performance increases. The best benchmark, a wet bulb temperature threshold of 0.5°C, had an accuracy of 93.1% in the synoptic dataset. Random forest, the most accurate machine learning method, had an accuracy of 93.7%, a mere 0.6 percentage point improvement.
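For readers unfamiliar with threshold benchmarks, here is a minimal sketch of a wet bulb temperature threshold at 0.5°C. It is not the study's implementation: the wet bulb estimate uses Stull's (2011) empirical approximation, which assumes near sea-level pressure and so is only a rough stand-in at mountain elevations, and the function names are hypothetical.

```python
import math

def wet_bulb_stull(air_temp_c, rh_pct):
    """Approximate wet bulb temperature (°C) from air temperature (°C) and
    relative humidity (%) using Stull's (2011) empirical fit.

    Assumption: near sea-level pressure; the fit is intended for roughly
    5-99% relative humidity and -20 to 50 °C.
    """
    t, rh = air_temp_c, rh_pct
    return (
        t * math.atan(0.151977 * math.sqrt(rh + 8.313659))
        + math.atan(t + rh)
        - math.atan(rh - 1.676331)
        + 0.00391838 * rh ** 1.5 * math.atan(0.023101 * rh)
        - 4.686035
    )

# Threshold value reported in the article for the best benchmark.
WET_BULB_THRESHOLD_C = 0.5

def partition_phase(air_temp_c, rh_pct, threshold_c=WET_BULB_THRESHOLD_C):
    """Label precipitation as snow when wet bulb temperature is at or below
    the threshold, otherwise rain (mixed precipitation is ignored here)."""
    tw = wet_bulb_stull(air_temp_c, rh_pct)
    return "snow" if tw <= threshold_c else "rain"

# Same air temperature, different humidity, different predicted phase.
print(partition_phase(2.0, 50.0))  # dry air: wet bulb is below freezing
print(partition_phase(2.0, 95.0))  # humid air: wet bulb is close to air temperature
```

The example shows why wet bulb temperature can outperform air temperature alone: at 2°C, dry air cools falling hydrometeors enough to favor snow, while nearly saturated air does not.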

When looking at accuracy by air temperature, we found larger relative improvements thanks to the machine learning models, but they still struggled to accurately predict rain and snow between 0°C and 4°C. Just like the benchmarks, random forest, XGBoost, and the ANN could not accurately predict rain falling at subfreezing temperatures nor could they capture snowfall at warmer-than-expected temperatures. They also performed abysmally at predicting mixed precipitation.

So, what gives? Why did the machine learning models not advance the state of rain-snow partitioning performance? We found the answer to be relatively straightforward. Meteorological conditions for rain and snow are nearly identical at air temperatures near freezing. This is an information limit: when rain and snow occur under the same surface meteorology, even the more complex machine learning techniques have no emergent patterns to find.
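This kind of limit can be stated with a standard result from statistical classification (a textbook bound, not a formula taken from the paper). Assume only two phases, rain (R) and snow (S), with prior probabilities $\pi_R$ and $\pi_S$, and let $p_R(x)$ and $p_S(x)$ be the probability densities of the surface meteorological predictors $x$ for each phase. Then the best accuracy any classifier can reach using $x$ alone is:

```latex
A^{*} = \int \max\bigl(\pi_R\, p_R(x),\; \pi_S\, p_S(x)\bigr)\, dx
      = 1 - \int \min\bigl(\pi_R\, p_R(x),\; \pi_S\, p_S(x)\bigr)\, dx
```

The second form follows because $\max(a, b) + \min(a, b) = a + b$ and the two weighted densities together integrate to 1. The integral of the minimum is the overlap between the rain and snow distributions, so the more they overlap, the lower the ceiling on accuracy, regardless of model complexity.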

What’s more, we found that the degree of overlap between the rain and snow distributions was a strong predictor of rain-snow partitioning accuracy. The more the distributions overlapped at a given air temperature, the worse the accuracy.

This figure from the newly published paper shows how the overlap of rain-snow air temperature distribution relates to precipitation phase partitioning method accuracy. Panel (a) displays the average partitioning accuracy for all benchmark and machine learning methods and panel (b) shows the overlap of the rain and snow distributions, both by air temperature. Panel (c) then relates the overlap to average accuracy, revealing that the more the distributions overlap, the less accurate the methods become. Credit: Jennings et al. (2025).
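One common way to quantify distribution overlap is the overlapping coefficient, the summed bin-by-bin minimum of two normalized histograms. The sketch below applies it to hypothetical relative humidity values for rain and snow reports within a single air temperature bin. The choice of variable, the bin settings, and the sample values are assumptions for illustration, and the paper's exact overlap metric may differ.

```python
def histogram_fractions(values, lo, hi, n_bins):
    """Return the fraction of values falling in each of n_bins equal-width
    bins spanning [lo, hi]. Values outside the range are ignored."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        if lo <= v <= hi:
            # Clamp so that v == hi lands in the last bin.
            idx = min(int((v - lo) / width), n_bins - 1)
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * n_bins

def overlap_coefficient(a, b, lo, hi, n_bins=10):
    """Overlapping coefficient of two samples: 0 means fully separated
    histograms, 1 means identical histograms."""
    fa = histogram_fractions(a, lo, hi, n_bins)
    fb = histogram_fractions(b, lo, hi, n_bins)
    return sum(min(x, y) for x, y in zip(fa, fb))

# Hypothetical relative humidity (%) values for one air temperature bin.
rain_rh = [85, 92, 95, 97]
snow_rh = [65, 75, 85, 95]
partial = overlap_coefficient(rain_rh, snow_rh, lo=0, hi=100)
print(f"overlap: {partial:.2f}")
```

Repeating this calculation for each air temperature bin gives an overlap curve that can be compared against per-bin accuracy.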

The upshot of this work is that rain-snow partitioning performance is unlikely to improve much using surface meteorological data alone. We encourage other researchers to integrate novel observations through data assimilation, or to develop machine learning approaches that incorporate other spatiotemporal dimensions of data rather than only surface meteorology from the time and location of interest.

To learn more about this work, please view the paper online at Nature Communications.

Jennings, K.S., Collins, M., Hatchett, B.J., Heggli, A., Hur, N., Tonino, S., Nolin, A.W., Yu, G., Zhang, W., and Arienzo, M.M. Machine learning shows a limit to rain-snow partitioning accuracy when using near-surface meteorology. Nat Commun 16, 2929 (2025). https://doi.org/10.1038/s41467-025-58234-2

If you would like to contribute observations of rain, snow, and mixed precipitation to our growing crowdsourced dataset, please consider joining Mountain Rain or Snow by visiting our project website at RainOrSnow.org

Heavy snow at a ski area
Heavy snowfall, even at temperatures near freezing, is a boon to ski areas and water resources. Help our team track precipitation phase in mountain regions by joining Mountain Rain or Snow today. Credit: Jennings
