Geocoding more than 9000 disasters across the globe

A short story of how the GDIS dataset came about.
Published in Research Data

In late 2015 I began my PhD project investigating how rapid-onset disasters might impact ongoing armed conflicts, as part of a larger project on the security implications of climate change. Five years and a PhD thesis later, the GDIS dataset on geocoded disaster locations is finally available.

The idea for the dataset emerged early in the PhD project. Knowing that the International Disaster Database (EM-DAT) provided a list of all recorded disasters of a certain magnitude across the world, I planned a quantitative assessment of how disasters might influence various conflict dimensions and actors in ongoing armed conflicts. However, I quickly realized that even though the two phenomena co-occur in time and space in many countries, countries are large and conflict actors often operate in delimited areas within them. I therefore concluded that, in order to proceed, I needed to know more about where within each country the disasters occurred.

It became clear that a dataset of subnational disaster locations would be of interest beyond our immediate research project, and together with my supervisor and project leader, I decided to geocode the natural hazard-related disaster events listed in EM-DAT that had occurred after 1960. For the vast majority of the disasters, one or several locations were mentioned in a text column in EM-DAT. To identify and assign geographic information to these places, we relied on the Database of Global Administrative Areas (GADM), which provides maps and spatial data for all countries and their subdivisions.

The matching of locations was first done with automated scripts in R, but because the matching was based on text (the names of the places that had been affected by the disaster), substantial manual coding was also necessary. In addition to the many instances where spelling (and even language) differed across data sources, some locations were very specific (like a city neighborhood or village), while others were more diffuse (for example a mountain range or a cultural or ethnic area). With excellent research assistance, we manually went through all observations that did not match automatically in order to establish whether we could credibly place each location within an administrative boundary, and which boundary that should be. With a candidate list of 11 000 disasters and 47 000 locations this was a time-consuming task, and in the end we identified 39 953 locations for 9 924 disasters.
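To make the two-stage workflow above more concrete, here is a minimal sketch in Python. It is not the GDIS code (the original scripts were written in R and are not reproduced in this post), and everything in it is an assumption made for illustration: the function names `normalize` and `match_locations`, the similarity cutoff of 0.85, and the tiny sample lists. The idea is simply to match place names automatically where spelling is identical or close, and to set everything else aside for a human coder.

```python
import difflib
import unicodedata


def normalize(name: str) -> str:
    """Lowercase a place name and strip accents and punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(cleaned.split())


def match_locations(reported, admin_units, cutoff=0.85):
    """Split reported names into automatic matches and a manual-review list.

    `cutoff` is an illustrative similarity threshold, not a value from GDIS.
    """
    lookup = {normalize(unit): unit for unit in admin_units}
    matched, needs_review = {}, []
    for place in reported:
        key = normalize(place)
        if key in lookup:
            # Exact match once accents and case are ignored.
            matched[place] = lookup[key]
            continue
        # Otherwise accept the single closest name, if it is similar enough.
        close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        if close:
            matched[place] = lookup[close[0]]
        else:
            needs_review.append(place)
    return matched, needs_review


# Hypothetical sample data, not taken from EM-DAT or GADM.
admin_units = ["Québec", "Ontario", "British Columbia"]
reported = ["Quebec", "Ontaro", "Rocky Mountains"]
matched, needs_review = match_locations(reported, admin_units)
```

In this toy example the accent variant and the misspelling are matched automatically, while the mountain range, which does not correspond to any single administrative unit, ends up in `needs_review`. A real pipeline would also need to handle names in other languages and places that span several units, which is where the manual coding comes in.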

The geographic information on disaster locations provided by GDIS enables connecting the disasters to virtually any other geographic data source. We hope that our data descriptor in Scientific Data and the dissemination of the dataset through NASA’s Socioeconomic Data and Applications Center (SEDAC) will reach beyond our own immediate research communities, and that the data will be widely used and push new frontiers of research.

Our paper in Scientific Data is available here.

 

References

Guha-Sapir, D., Below, R. & Hoyois, P. EM-DAT: International disaster database. Centre for Research on the Epidemiology of Disasters (CRED) (2014).

GADM. Database of Global Administrative Areas https://gadm.org/data.html (2018).

Rosvold, E. L. & Buhaug, H. Geocoded disaster (GDIS) dataset, 1960-2018. Socioeconomic Data and Applications Center (SEDAC) https://doi.org/10.7927/zz3b-8y61 (2021).
