Where and when are there satellite data gaps due to cloud coverage, and what are the implications for forest monitoring?

Near-real-time deforestation monitoring systems rely mostly on optical observations, which can be obscured by clouds. We derived datasets for each 30 × 30 m pixel over the tropics to discover where and when cloud-induced data gaps exist in dense optical time series, and what they imply for early change detection.
We are in a new era of satellite remote sensing. Never before have we had so many freely available datasets of medium spatial resolution (i.e. 20-30 m) that provide a high density of optical observations.

Currently, multiple satellite datasets are available that, if combined, can provide an observation of any point on Earth almost daily. These include data from multiple Landsat missions and from Copernicus Sentinel-2. In addition, state-of-the-art methodologies to monitor deforestation rely heavily on optical satellite observations.

The tropics are of particular interest: on one hand they host the areas of most rapid change, and on the other they are where cloud coverage is most persistent. In fact, according to Pacheco et al.¹, five of the six active deforestation fronts are located in the tropics. This raises two questions: where and when are there data gaps due to cloud coverage in optical time series? And what are the implications for monitoring deforestation in these rapidly changing areas?

Therefore, we evaluated the spatial and temporal availability of cloud-free data from the combined time series of Landsat 7, 8, and Sentinel-2, the most common sensors used to monitor deforestation.

Three datasets were created: a) count, representing the number of cloud-free observations per pixel per year; b) maximum, representing the maximum waiting period, in days, to get a cloud-free observation; and c) date, representing the final date of that waiting period. The a) count dataset addresses the spatial distribution of cloud-free data, or the where, and the b) maximum and c) date datasets address the temporal availability of cloud-free data, or the when.
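As a rough illustration of these three definitions, the sketch below computes them for a single pixel from a list of cloud-free observation dates. It is not the code used for the published datasets: the function name is hypothetical, and it assumes that the start and end of the year bound the waiting periods, which the text above does not specify.

```python
from datetime import date


def summarize_clear_observations(clear_dates, year):
    """Return (count, max_wait_days, wait_end_day_of_year) for one pixel.

    clear_dates: dates with a cloud-free observation (any sensor).
    """
    start, end = date(year, 1, 1), date(year, 12, 31)
    # Keep only the observations that fall inside the requested year.
    in_year = sorted(d for d in clear_dates if start <= d <= end)
    count = len(in_year)

    # Assumption: 1 January and 31 December close the first and last gap.
    boundaries = [start] + in_year + [end]
    max_wait, wait_end = 0, start
    for previous, current in zip(boundaries, boundaries[1:]):
        wait = (current - previous).days
        if wait > max_wait:
            max_wait, wait_end = wait, current

    # Report the end of the longest wait as a day of year (1 to 366).
    return count, max_wait, wait_end.timetuple().tm_yday


observations = [date(2019, 1, 10), date(2019, 3, 1),
                date(2019, 3, 6), date(2019, 9, 1)]
print(summarize_clear_observations(observations, 2019))
```

Here the longest wait runs from 6 March to 1 September, so the pixel would carry a count of 4, a maximum of 179 days, and a date equal to the day of year of 1 September.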

These datasets were created over a domain of 59.4 million km² in the tropics, covering 168 countries. Five consecutive years of satellite data from the Landsat and Sentinel-2 missions, from 2017 to 2021, were analyzed in Google Earth Engine (GEE). The three animations below give a first glance of the datasets created. The unit of the spatial dataset, a) count, is the number of cloud-free observations. The units of the b) maximum and c) date datasets are days and Julian date, respectively.

Spatial availability of cloud-free pixels

Spatial distribution of cloud-free data
Number of cloud-free observations from Landsat 7, 8 and Sentinel-2. This represents the a) count dataset.

Temporal availability of cloud-free pixels

Maximum waiting period in days
Maximum waiting period in days per year to get cloud-free data combining Landsat 7, 8 and Sentinel-2.  This represents the b) maximum dataset.
Date of waiting period
Timing at the quarterly level of the final date for the maximum waiting period to get a cloud-free observation from combined Landsat 7, 8 and Sentinel-2. This represents the c) date dataset. 
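The quarterly grouping mentioned in this caption can be sketched as below. This assumes that the "Julian date" stored in the c) date dataset is a day of year (1 to 366), which is an interpretation rather than something stated in the text; the function names are illustrative.

```python
from datetime import date, timedelta


def day_of_year_to_date(year, day_of_year):
    """Convert a day of year (1 = 1 January) to a calendar date."""
    return date(year, 1, 1) + timedelta(days=day_of_year - 1)


def quarter_of(d):
    """Return the calendar quarter (1 to 4) that a date falls in."""
    return (d.month - 1) // 3 + 1


wait_end = day_of_year_to_date(2019, 244)
print(wait_end, "falls in quarter", quarter_of(wait_end))
```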

Implications for forest monitoring

The datasets created effectively show where and when there are data gaps due to cloud coverage. Over tropical forests, a total area of 2 million km² in 2017 and 1.5 million km² in 2018 had no cloud-free observations. This is about the area of Mexico. In addition, the median availability of data over tropical forests for those two years is about 12 cloud-free observations per year.
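To relate such areas to the 30 m pixels of the datasets, the conversion between pixel counts and area can be sketched as follows. This is a back-of-the-envelope illustration that assumes every pixel covers exactly 30 × 30 m and ignores projection distortion; the function names are hypothetical and not part of the original analysis.

```python
def pixels_to_area_km2(pixel_count, pixel_size_m=30):
    """Area in km² covered by pixel_count square pixels."""
    return pixel_count * pixel_size_m ** 2 / 1e6


def area_km2_to_pixels(area_km2, pixel_size_m=30):
    """Approximate number of square pixels needed to cover area_km2."""
    return round(area_km2 * 1e6 / pixel_size_m ** 2)


# One million 30 m pixels cover 900 km².
print(pixels_to_area_km2(1_000_000))
# 2 million km² corresponds to roughly 2.2 billion 30 m pixels.
print(area_km2_to_pixels(2_000_000))
```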

Meanwhile, for the years 2019 to 2021, when more Sentinel-2 data are available, the median rises to about 40 cloud-free observations per year.

The temporal datasets, analyzed over key deforestation fronts, provide unique insight into when there are data gaps in these critical areas. The figure below combines the maximum waiting period and date datasets to show when cloud-free data are and are not available in these rapidly changing areas. The Asia and Oceania deforestation fronts are the regions with the lowest availability of cloud-free observations and the longest waiting periods year-round. For example, the average waiting time to get cloud-free data is about 120 days for the Oceania deforestation front and about 100 days for the Asia deforestation front.

This figure showcases the timing of the maximum number of days without a cloud-free observation for each deforestation front. For example, in November along the African deforestation front, the average location will have been waiting for around 100 days for a clear optical image. 

These datasets are novel because they are derived from multiple data sources, at nominal spatial resolution (30 m) and high temporal resolution, over five years and using all available images. They are also unique in that they can be used to pinpoint where and when there are data gaps that can be filled with additional satellite observations, including Synthetic Aperture Radar (SAR) or high-resolution commercial datasets.

Data availability

All datasets are freely available in GEE. The following code examples in the Earth Engine code editor provide easy access to and visualization of the data. For the spatial cloud-free distribution datasets: https://code.earthengine.google.com/de193db53d27cf1c2ab061e40de8f6bd and for the temporal datasets: https://code.earthengine.google.com/fda2bb4b06f08b6020a541dafe9e2e3d

Due to the large size of the data (~1 TB), a subset is accessible and described at Zenodo (https://doi.org/10.5281/zenodo.7714192). In addition, a public web interface for rapid exploration and visualization of the data at the country level is available below:

Application to visualize data

To ensure reproducibility, and to allow the results to be recreated for other regions and/or years, the example code for creating and accessing all the spatial and temporal datasets is available in Zenodo (https://doi.org/10.5281/zenodo.7761963).

While the data are currently available in GEE, in the near future they will also be available on additional platforms. Stay tuned!


¹ Pacheco, P. Deforestation fronts: Drivers and responses in a changing world (2021). Available at Worldwidelife.org - Deforestation fronts report.
