AidData launches new Geospatial Global Chinese Development Finance Dataset

The dataset covers 9,000+ projects worth $830 billion and accompanies the publication of an article in one of Nature’s journals, Scientific Data.

Today, a team of AidData researchers launched a new dataset containing geospatial features of 9,405 projects across 148 low- and middle-income countries supported by Chinese grant and loan commitments worth more than $830 billion. The release of AidData’s Geospatial Global Chinese Development Finance Dataset, Version 3.0 (“Geo-GCDF v3”) accompanies the simultaneous publication of an article in one of Nature’s prestigious online journals, Scientific Data.

The dataset is a companion to AidData’s Global Chinese Development Finance Dataset, Version 3.0 (“GCDF v3”) released in November 2023. “Our new dataset provides unprecedented insight into the precise locations of Chinese-financed projects around the world,” said Seth Goodman, AidData Research Scientist and lead author of the Scientific Data article. “By making our data freely and publicly available, researchers worldwide will be better able to measure the localized effects of China’s overseas development projects across a range of sectors and issues, including agricultural productivity, household welfare, economic growth, nutrition, infant mortality, environmental degradation, corruption, gender equality, civic engagement, and violent conflict, to name just a few.”

Nature’s Scientific Data is a peer-reviewed, open-access journal for descriptions of scientifically important datasets and research that advances the sharing and reuse of scientific data. It is highly influential, with some 6.4 million articles downloaded in 2023 alone.

The new Geo-GCDF v3 dataset includes precise spatial definitions of 6,266 projects representing the exact physical features of roads, railways, transmission lines, buildings, and other infrastructure. Precisely geocoded projects can include the routes tracing the paths of roads, railways, and transmission lines, or the outlines and footprints of buildings associated with dams, bridges, mining operations, hospitals, stadiums, and more.

“The combination of Geo-GCDF v3 and GCDF v3 is a powerful one,” said AidData’s Executive Director Bradley C. Parks, “because it creates opportunities to address new questions and revisit old ones about the intended and unintended impacts of China’s grant- and loan-financed projects in the developing world. For example, in previous versions of the GCDF dataset, all of the provinces or districts that intersected with the route of a road project may have been identified, but not the precise route between the road’s start point and end point. This measurement imprecision limited opportunities for rigorous impact evaluation, so we invested considerable effort to build a geospatial version of GCDF v3 that provides spatial information on more than 9,000 projects which have physical footprints or involve specific locations, by extracting point, polygon, and line vector data.”
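The following minimal Python sketch shows why a traced route matters more than a road's start and end points. It is not AidData's code. It flags which locations fall within a buffer distance of a road line. It assumes planar coordinates already expressed in kilometres and uses made-up village positions. A real analysis would work in a projected coordinate reference system and would typically rely on a GIS library rather than hand-written geometry.

```python
import math

# Hypothetical planar coordinates in kilometres (x, y); not real project data.
Point = tuple[float, float]


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment, clamping to the endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_route(p: Point, route: list[Point]) -> float:
    """Shortest distance from p to a polyline given as ordered vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(route, route[1:]))


def within_buffer(places: dict[str, Point], route: list[Point], km: float) -> list[str]:
    """Names of places lying within `km` of the route, sorted alphabetically."""
    return sorted(n for n, p in places.items() if distance_to_route(p, route) <= km)


# Hypothetical road traced through three vertices, and four villages.
traced_route = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
endpoints_only = [(0.0, 0.0), (10.0, 10.0)]
villages = {"A": (5.0, 1.0), "B": (5.0, 8.0), "C": (12.0, 5.0), "D": (-3.0, 0.0)}

print(within_buffer(villages, traced_route, 2.5))
print(within_buffer(villages, endpoints_only, 2.5))
```

With the traced route, villages A and C fall inside a 2.5 km buffer. Approximating the same road by a straight line between its endpoints would instead select only village B. That is the kind of measurement imprecision the quote describes.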

Five AidData scientists and analysts—Goodman, Sheng Zhang, Ammar Malik, Parks, and Jacob Hall—wrote the paper, with a total of 24 AidData faculty, staff and research assistants spending approximately 2,400 hours assembling the dataset. The methodology, dataset, and the code used to construct the dataset have been made publicly available through GitHub to facilitate replication and future applications.

How’d they do it?

“The initial step of our data collection process involves identifying the subset of Chinese grant- and loan-financed projects that (a) support the construction, rehabilitation, upgrading, maintenance, expansion, or preservation of physical assets with identifiable geographical features, and/or (b) support activities which take place at specific locations with identifiable geographical features,” said Ammar Malik, a Senior Research Scientist and AidData’s Director of Tracking Underreported Financial Flows. 

The purpose of this step is to identify the ultimate geographical destinations of Chinese aid and credit. Examples of (a) include roads, railways, airports, seaports, power plants, electricity transmission lines, industrial parks, schools, hospitals, stadiums, and museums. Examples of (b) might include medical teams stationed at a given hospital or equipment provided to park rangers for patrols. Projects with no geospatial information available through project documentation or other sources, or without specific locational destinations, are excluded from the geospatial data collection process.
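The two-part screening rule described above can be expressed as a simple filter. This is a hypothetical sketch. The `Project` fields below are invented for illustration and do not reflect the GCDF v3 schema. The sketch reads "and/or" as an inclusive or.

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical fields for illustration only; not the GCDF v3 schema.
    project_id: str
    builds_physical_asset: bool   # criterion (a): construction, upgrading, etc. of a physical asset
    site_specific_activity: bool  # criterion (b): activity at a specific location
    location_info_found: bool     # documentation or other sources identify a location


def eligible_for_geocoding(p: Project) -> bool:
    """A project qualifies if it meets (a) and/or (b) AND a location can be identified."""
    meets_criteria = p.builds_physical_asset or p.site_specific_activity
    return meets_criteria and p.location_info_found


def select_for_geocoding(projects: list[Project]) -> list[str]:
    return [p.project_id for p in projects if eligible_for_geocoding(p)]


# Hypothetical examples mirroring the cases in the text.
sample = [
    Project("road", True, False, True),               # (a) with a known route
    Project("medical-team", False, True, True),       # (b) stationed at a hospital
    Project("budget-support", False, False, True),    # no specific destination
    Project("undocumented-road", True, False, False), # no geospatial information found
]
print(select_for_geocoding(sample))
```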

“To compile project locations,” said Sheng Zhang, Research Analyst, “we leverage geospatial features defined by OpenStreetMap (OSM), which is a free, editable geographic database of the world built by community collaborators. In addition to utilizing existing features from the extensive data available in OSM, we contribute updates or additions for features that reflect project activities when we are able to do so.”

The new dataset is also included in AidData’s free geospatial data platform, GeoQuery, a project led by Goodman and Hall. “The data in Geo-GCDF v3 can be integrated with hundreds of additional geospatial variables (e.g., land cover, nighttime lights, population density), which allows for time series analysis of trends like deforestation and economic activity,” said Jacob Hall, Data Analyst. “Prior to GeoQuery, similar kinds of analysis would have required extensive GIS knowledge, data processing, and computational resources to prepare the associated datasets from satellite imagery and other sources.”

The Geo-GCDF v3 dataset is compatible with most software and tools that support standard geospatial data formats. Desktop GIS software, such as the open-source QGIS platform or ESRI’s ArcGIS Pro, supports a broad range of mapping, analysis, and other applications. Web-based platforms, such as Mapbox and ArcGIS Online, are also useful for sharing and visualizing outputs.
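Here is one example of working with a standard format. The sketch tallies geometry types (points, lines, polygons) in a GeoJSON FeatureCollection using only the Python standard library. It assumes the data have been exported to GeoJSON. The sample features and their `name` properties are hypothetical, not records from Geo-GCDF v3.

```python
import json
from collections import Counter


def geometry_type_counts(geojson_text: str) -> Counter:
    """Count features by geometry type in a GeoJSON FeatureCollection."""
    fc = json.loads(geojson_text)
    if fc.get("type") != "FeatureCollection":
        raise ValueError("expected a GeoJSON FeatureCollection")
    counts: Counter = Counter()
    for feature in fc["features"]:
        geom = feature.get("geometry")  # GeoJSON allows a null geometry
        counts[geom["type"] if geom else "None"] += 1
    return counts


# Hypothetical sample: one line (road), one polygon (building footprint), one point.
SAMPLE = """{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"name": "hypothetical road"},
   "geometry": {"type": "LineString", "coordinates": [[30.0, 10.0], [30.5, 10.2]]}},
  {"type": "Feature", "properties": {"name": "hypothetical hospital"},
   "geometry": {"type": "Polygon",
                "coordinates": [[[30.1, 10.1], [30.2, 10.1], [30.2, 10.2], [30.1, 10.1]]]}},
  {"type": "Feature", "properties": {"name": "hypothetical clinic"},
   "geometry": {"type": "Point", "coordinates": [30.3, 10.3]}}
]}"""

print(geometry_type_counts(SAMPLE))
```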

This was originally posted on AidData's The First Tranche blog. Blog credit: Alex Wooley. Image credit: Sarina Patterson. Link: https://www.aiddata.org/blog/aiddata-launches-geospatial-global-chinese-development-finance-dataset
