So2Sat POP - A Curated Benchmark Data Set for Population Estimation from Space on a Continental Scale

Published in Research Data

About the Article:

The recently published article, "So2Sat POP - A Curated Benchmark Data Set for Population Estimation from Space on a Continental Scale," provides a comprehensive data set for population estimation in 98 European cities. The cities cover 28 European Union (EU) member states and the four EFTA countries, and together they represent a wide range of topography, demography, and architectural styles. Because the data set is ready to use, researchers no longer need to collect and process new data in order to develop and validate their methods. It combines digital elevation models (DEM), local climate zones (LCZ), land use (LU), and nighttime lights (VIIRS) with multi-spectral Sentinel-2 imagery (SEN2) and data from the OpenStreetMap initiative (OSM). This combination of data sources has not previously been explored in population estimation. We expect the data set to be a valuable resource for the research community in developing more sophisticated population estimation approaches.

About the Methodology:

Figure 1 shows, step by step, how all source data were preprocessed to produce the input data for each city. All input data were cropped to city borders established by our own algorithm.

Figure 1: Step-by-step preprocessing of all the input data sources to prepare the corresponding input data for each city.
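
As a rough illustration of the cropping step, the sketch below clips a raster layer to a city boundary given as a boolean mask. The function name `crop_to_city`, the mask representation, and the fill value are assumptions made for this example; the boundary algorithm and the data formats used in the article are not described in this post.

```python
import numpy as np


def crop_to_city(layer: np.ndarray, city_mask: np.ndarray, fill: float = 0.0) -> np.ndarray:
    """Clip a 2-D raster layer to the bounding box of a boolean city mask.

    Hypothetical helper: the mask stands in for the city border.
    Pixels inside the bounding box but outside the city are set to `fill`.
    """
    if layer.shape != city_mask.shape:
        raise ValueError("layer and city_mask must have the same shape")
    rows = np.flatnonzero(city_mask.any(axis=1))
    cols = np.flatnonzero(city_mask.any(axis=0))
    if rows.size == 0:
        raise ValueError("city_mask contains no city pixels")
    r0, r1 = rows[0], rows[-1] + 1
    c0, c1 = cols[0], cols[-1] + 1
    # astype returns a copy, so the source layer is left untouched.
    window = layer[r0:r1, c0:c1].astype(float)
    window[~city_mask[r0:r1, c0:c1]] = fill
    return window


# Toy 5 x 5 elevation layer and an irregular city footprint.
dem = np.arange(25, dtype=float).reshape(5, 5)
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 2:4] = True
mask[1, 2] = False
cropped = crop_to_city(dem, mask)
```

Applying the same mask to every layer keeps all sources aligned on one grid, which is what makes stacking them per city possible later on.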

The input data produced in the first step were then used to construct the patches. Figure 2 shows samples from the odd-numbered classes of our data set, along with the corresponding patch set, population class, and population count. Lower classes correspond to sparsely populated areas, and their patches consist largely of bare ground, water, and green fields. As the class number increases, patches show built-up regions ranging from sparse low-rise to dense high-rise. In other words, patches progress from rural to urban areas as the class number rises.

Figure 2: Sample patches from the odd-numbered classes of our data set. Lower classes depict sparsely populated regions, while higher classes depict densely populated regions.
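
To make the patch construction concrete, here is a minimal sketch that tiles a city raster into non-overlapping square patches and assigns each patch a population class from its count. The patch size and the logarithmic binning rule are assumptions chosen for illustration only; this post does not state the patch dimensions or how the class boundaries are defined.

```python
from typing import List

import numpy as np


def make_patches(raster: np.ndarray, size: int) -> List[np.ndarray]:
    """Tile a 2-D raster into non-overlapping size x size patches.

    Rows and columns that do not fill a whole patch are dropped.
    """
    n_rows, n_cols = raster.shape[0] // size, raster.shape[1] // size
    return [
        raster[r * size:(r + 1) * size, c * size:(c + 1) * size]
        for r in range(n_rows)
        for c in range(n_cols)
    ]


def population_class(count: int) -> int:
    """Hypothetical binning: class 0 for empty patches, else floor(log2(count)) + 1."""
    if count < 0:
        raise ValueError("population count must be non-negative")
    # For count >= 1, bit_length() equals floor(log2(count)) + 1; for 0 it is 0.
    return int(count).bit_length()


raster = np.arange(36).reshape(6, 6)
patches = make_patches(raster, 3)
classes = [population_class(c) for c in (0, 1, 3, 4, 1000)]
```

With a rule like this, each step up in class roughly doubles the population count, so low classes collect near-empty rural patches and high classes collect dense urban ones.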

To demonstrate the potential of our data set, we trained a Random Forest model on our data set, using features extracted from the input data, to estimate population. The preliminary findings suggest that the So2Sat POP data set is a viable basis for developing effective machine learning methods.
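
A baseline of this kind can be sketched with scikit-learn's `RandomForestRegressor`. Everything about the data here is a placeholder: the feature matrix is random noise standing in for per-patch features, the target is synthetic, and the hyperparameters and the train/test split are assumptions rather than the settings used in the article.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder features: 500 patches x 8 values (e.g. per-patch summaries of each layer).
X = rng.random((500, 8))
# Synthetic population counts tied to two features; NOT real So2Sat POP values.
y = np.round(1000 * X[:, 0] + 200 * X[:, 1] + rng.normal(0, 10, 500)).clip(min=0)

# Hold out 20% of the patches for evaluation (an assumed split).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_test)
mae = mean_absolute_error(y_test, pred)
```

Evaluating on held-out patches, as above, is what keeps the error estimate honest; for a benchmark spanning many cities, holding out whole cities would be a stricter variant of the same idea.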
