Tackling air pollution with large datasets

Accessible and transparent research data are key to solving global challenges. Where can we as scientists improve?

Published in Sustainability

Feb 14, 2019

Christopher Oberschelp

Researcher, ETH Zurich

"What knowledge is lacking to act effectively against a surge in global air pollution?" - that is the question that kicked off a research project a while back and, for me personally, led to a long-lasting struggle with the accessibility and usability of data. With more and more research being done, it is also becoming more and more challenging to obtain the "right" data for a project. In the field of global air pollution, vast amounts of data are available. Finding the relevant data, selecting it, and then integrating it were the main hurdles that had to be overcome. Here, I want to share three insights from the process of developing our own dataset on air pollution from global coal power generation.

1. Document data quality - really!

Finding data is often difficult. For our air pollution project, that was not the case. Numerous data sources along the whole coal power supply chain had to be combined, so data selection became the true challenge. Because the raw data overlapped, it was important to know the quality of each data source. For example, uncertainties for pollutant measurements were often lower than for modeled pollutant releases, so it made sense to prioritize the less uncertain data. Unfortunately, documentation of dataset quality was missing in several cases. A full uncertainty assessment may be outside the scope of many studies, but I realized that even a qualitative indication of data quality can be extremely helpful when comparing or combining data from different sources.
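To illustrate how even a qualitative quality label helps when sources overlap, here is a minimal Python sketch. It is not the code from our project: the record fields, the labels "measured", "modeled" and "unknown", and their ranking are assumptions made purely for illustration. For each plant and pollutant, it keeps the record with the best documented quality.

```python
from dataclasses import dataclass

# Hypothetical qualitative labels, ranked from most to least trusted.
# The ordering is an assumption for this sketch, not a general rule.
QUALITY_RANK = {"measured": 0, "modeled": 1, "unknown": 2}


@dataclass(frozen=True)
class Record:
    plant_id: str    # hypothetical plant identifier
    pollutant: str   # e.g. "SO2"
    value: float     # emission value in whatever unit the sources share
    source: str      # name of the dataset the value came from
    quality: str     # qualitative label documented by the data provider


def rank(record):
    """Return the rank of a record; undocumented labels count as 'unknown'."""
    return QUALITY_RANK.get(record.quality, QUALITY_RANK["unknown"])


def select_best(records):
    """Keep one record per (plant, pollutant): the one with the best label.

    If two records share the same rank, the first one encountered is kept.
    """
    best = {}
    for rec in records:
        key = (rec.plant_id, rec.pollutant)
        if key not in best or rank(rec) < rank(best[key]):
            best[key] = rec
    return best
```

Without any quality label, every record would fall into the "unknown" class and the choice between overlapping sources would be arbitrary, which is exactly the situation that missing documentation creates.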

2. Provide unnecessary details

During the project, my colleagues and I had to find creative ways to deal with data gaps. Only once our basic air pollution model was set up did it become fully clear how important each data point was for its outcomes. For example, a central parameter was fuel consumption, which allowed us to fill gaps where electricity generation was not reported. Fuel consumption data, in turn, was often unavailable as well, but it could be back-calculated from reported carbon dioxide emissions by means of the carbon balance. This is one case where a single data point per power plant dramatically improved the quality of our results in an unforeseen way. Benefiting from this kind of proxy data is only possible when researchers publish the detailed data they have collected - even when its usefulness for others and its direct relevance for their own research outcomes may not be immediately clear.
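The carbon balance mentioned above can be sketched in a few lines of Python. This is a simplified illustration, not the model code from our project: the function name, the carbon mass fraction of the fuel and the oxidation factor are assumed inputs that would have to come from fuel-specific data. The idea is that the carbon leaving the stack as carbon dioxide must have entered the plant with the fuel.

```python
# Molar masses in g/mol.
M_C = 12.011
M_CO2 = 44.009  # 12.011 + 2 * 15.999


def fuel_from_co2(co2_mass, carbon_fraction, oxidation_factor=1.0):
    """Estimate the fuel mass burned from the reported CO2 mass.

    co2_mass         -- reported CO2 emissions (any mass unit)
    carbon_fraction  -- assumed carbon mass fraction of the fuel (0 to 1)
    oxidation_factor -- assumed share of fuel carbon oxidized to CO2 (0 to 1)

    The result has the same mass unit as co2_mass.
    """
    if not 0 < carbon_fraction <= 1:
        raise ValueError("carbon_fraction must be in (0, 1]")
    if not 0 < oxidation_factor <= 1:
        raise ValueError("oxidation_factor must be in (0, 1]")
    # Mass of carbon contained in the emitted CO2.
    carbon_emitted = co2_mass * M_C / M_CO2
    # Fuel mass that contains this much oxidized carbon.
    return carbon_emitted / (carbon_fraction * oxidation_factor)
```

A call such as fuel_from_co2(co2_mass, carbon_fraction=0.6) would then give a fuel consumption estimate for a plant that reports only its carbon dioxide emissions; the value 0.6 is a placeholder, since the carbon content differs between coal types.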

3. Share ugly code

When reading articles about power plant models, I struggled to translate model descriptions into code despite clear and transparent documentation, while a few code snippets here and there sped up the model development a lot. And when my colleagues and I had to describe our own model code, we realized how difficult it is to strike a compromise between level of detail and understandability. A good general solution could be to provide the model source code as extended documentation for a paper - even when the code is written in a pragmatic way without the help of a professional programmer. At least in the field of sustainability research, this is still the exception rather than the rule. In my experience, though, the benefits of sharing code are numerous: research becomes more transparent, feedback can help to improve the models, and duplicated work can be avoided. Why not give it a try?

Finally, I want to encourage you, the reader, to share your views. Do you agree or disagree with these suggestions? Have you had similar or different experiences? Based on what we learned during our project, my colleagues and I try to provide a large amount of data in the supporting information of our paper, and we make more data and code available externally. You are invited to have a look and tell us what to improve. Constructive criticism is what makes us learn most.

Title photo: kamilpetran/ iStock
