Keeping data alive for the long term - a case study

This post relates to the paper 'Re-curating a 90-year ecological data collection from an arid zone enclosure according to FAIR principles.' DOI: 10.1038/s41597-026-07556-x
Like

Share this post

Choose a social network to share with, or copy the URL to share elsewhere

This is a representation of how your post may appear on social media. The actual post will vary between social networks

As an ecologist I have always been concerned about understanding processes: why do things happen? Why do some species survive and some not? Why do some animals or plants respond well to a disturbance and some not? How do things work? These questions and more have been fundamental for ecologists and natural historians over centuries. All of them depend on data, data about an object of study, and data about the factors that may affect it. As we go further into a digital world, the need to have access to digital records has increased, along with our ability to use those data. Basic to this is probity: the ability to see the data that was used in an interpretation of events. Does someone else agree with our understanding? A decision made from this understanding? Ensuring long-term data are available and accessible is not, however, a simple matter.

A data story, from creation to publication

The case study described in this paper is an example of what may be required elsewhere. We describe the process involved in ensuring a 90-year collection of long-term data continues to be available for machine discovery. The site on which the data were collected is nationally and internationally renowned for its longevity. These are not data that are retrospectively 'discovered' from an ice core, or pollen analysis from a bog, or collected by instruments such as electronic monitoring stations, but these are data that were:

  1. collected at a remote location in the semi-arid zone in Australia established in 1926 by staff of the Adelaide University (henceforth 'the field site'),
  2. physically collected by researchers and students annually,
  3. recorded on clip boards in the field, annotated, transcribed, and stored in filing cabinets,
  4. digitised in the 1990s into 'new' database systems on desktop computers, and
  5. uploaded in 2014 into a newly established open-access domain repository (the repository of Australia's Terrestrial Ecosystem Research Network, TERN).

This had involved a cast of hundreds across the years.

Making the data usable - the challenge

The data were stored safely in 2014, but were they usable? Not really.

To update the data we had to delve into the records, obtain missing records, and ensure the data were described in a way that would allow machine discoverability.

Time was of the essence due to the recent death of the last custodian, without whom there would have been no data to update. First, we had to explain the need to the people and the institution (Adelaide University) associated with the field site. Then we assembled a part-time multi-disciplinary team in TERN, consisting of specialists in both data and in environmental science. The lead author had data science expertise and familiarity with the site and provided a bridge between the field site custodians and repository staff, and took primary responsibility for data set quality. Data and environmental scientists of varying hues participated in cleaning the data, uploading new data, ensuring standardised vocabularies and metadata were adopted, and the data published in a discoverable and citable manner. This process took more than two years, exacerbated by the lack of the custodian who could explain the data in detail. 

Towards the future

From the work done, it is clear it is entirely possible to update old records to adhere with modern standards, but only if there is sufficient effort and commitment made by the repository and a dataset 'custodian' is involved. The fact that the site was very well known in the environmental science community was one reason this attempt was made. Much of the effort expended was in understanding the methodology and rationale that created the original, 2014, datasets.

We expect that re-curation for reusability of any dataset—be it a long time series or short-term measurements intended to be kept for the long term—will need to be repeated periodically, so feel that this study is an object lesson in what may be required.

Recommendations

  1. Think carefully about the usefulness of the data. When submitting data for preservation and reuse, one must think very carefully about the re-users. What do they need to know? What detail, if missing, will make the data unusable or not trustable? If data are not properly described and formatted, they may as well not exist.
  2. Consistency. For long term value, a time series data collection needs to be consistent, and here we are talking decades. Few scientists are keen on repeating other's work, as they get few rewards for doing so, so this must be explicitly valued.
  3. Include data entry in the project. A field trip to a remote, exotic location may be a great experience, and the data may be well collected, but without high-quality digital records (backed up, maybe even by paper!) you may as well not have collected the data.
TGB Osborn workflow
TGB Osborn VR data re-curation workflow

Please sign in or register for FREE

If you are a registered user on Research Communities by Springer Nature, please sign in

Follow the Topic

Research Data
Research Communities > Community > Research Data
Environmental Sciences
Physical Sciences > Earth and Environmental Sciences > Environmental Sciences
Sustainability
Life Sciences > Biological Sciences > Ecology > Ecological Modelling > Sustainability

Related Collections

With Collections, you can get published faster and increase your visibility.

Genomics in freshwater and marine science

This Scientific Data collection of articles focuses on transcriptomic datasets and genome assemblies from freshwater and marine taxa.

Publishing Model: Open Access

Deadline: Jul 23, 2026

Computer vision in plant science and agriculture

This Scientific Data Collection invites Data Descriptors documenting the generation, curation, and validation of datasets that underpin computer vision applications across plant biology, crop science, and agricultural systems.

Publishing Model: Open Access

Deadline: Jul 10, 2026