Behind the Paper

Keeping data alive for the long term - a case study

This post relates to the paper 'Re-curating a 90-year ecological data collection from an arid zone enclosure according to FAIR principles.' DOI: 10.1038/s41597-026-07556-x

Published in Earth & Environment, Ecology & Evolution, and Research Data

Jun 26, 2026

Alison Specht

Ecosystem Research Analyst, TERN, University of Queensland

Liked by Yijia Li

Explore the Research

As an ecologist I have always been concerned about understanding processes: why do things happen? Why do some species survive and some not? Why do some animals or plants respond well to a disturbance and some not? How do things work? These questions and more have been fundamental for ecologists and natural historians over centuries. All of them depend on data, data about an object of study, and data about the factors that may affect it. As we go further into a digital world, the need to have access to digital records has increased, along with our ability to use those data. Basic to this is probity: the ability to see the data that was used in an interpretation of events. Does someone else agree with our understanding? A decision made from this understanding? Ensuring long-term data are available and accessible is not, however, a simple matter.

A data story, from creation to publication

The case study described in this paper is an example of what may be required elsewhere. We describe the process involved in ensuring a 90-year collection of long-term data continues to be available for machine discovery. The site on which the data were collected is nationally and internationally renowned for its longevity. These are not data that are retrospectively 'discovered' from an ice core, or pollen analysis from a bog, or collected by instruments such as electronic monitoring stations, but these are data that were:

collected at a remote location in the semi-arid zone in Australia established in 1926 by staff of the Adelaide University (henceforth 'the field site'),
physically collected by researchers and students annually,
recorded on clip boards in the field, annotated, transcribed, and stored in filing cabinets,
digitised in the 1990s into 'new' database systems on desktop computers, and
uploaded in 2014 into a newly established open-access domain repository (the repository of Australia's Terrestrial Ecosystem Research Network, TERN).

This had involved a cast of hundreds across the years.

Making the data usable - the challenge

The data were stored safely in 2014, but were they usable? Not really.

To update the data we had to delve into the records, obtain missing records, and ensure the data were described in a way that would allow machine discoverability.

Time was of the essence due to the recent death of the last custodian, without whom there would have been no data to update. First, we had to explain the need to the people and the institution (Adelaide University) associated with the field site. Then we assembled a part-time multi-disciplinary team in TERN, consisting of specialists in both data and in environmental science. The lead author had data science expertise and familiarity with the site and provided a bridge between the field site custodians and repository staff, and took primary responsibility for data set quality. Data and environmental scientists of varying hues participated in cleaning the data, uploading new data, ensuring standardised vocabularies and metadata were adopted, and the data published in a discoverable and citable manner. This process took more than two years, exacerbated by the lack of the custodian who could explain the data in detail.

Towards the future

From the work done, it is clear it is entirely possible to update old records to adhere with modern standards, but only if there is sufficient effort and commitment made by the repository and a dataset 'custodian' is involved. The fact that the site was very well known in the environmental science community was one reason this attempt was made. Much of the effort expended was in understanding the methodology and rationale that created the original, 2014, datasets.

We expect that re-curation for reusability of any dataset—be it a long time series or short-term measurements intended to be kept for the long term—will need to be repeated periodically, so feel that this study is an object lesson in what may be required.

Recommendations

Think carefully about the usefulness of the data. When submitting data for preservation and reuse, one must think very carefully about the re-users. What do they need to know? What detail, if missing, will make the data unusable or not trustable? If data are not properly described and formatted, they may as well not exist.
Consistency. For long term value, a time series data collection needs to be consistent, and here we are talking decades. Few scientists are keen on repeating other's work, as they get few rewards for doing so, so this must be explicitly valued.
Include data entry in the project. A field trip to a remote, exotic location may be a great experience, and the data may be well collected, but without high-quality digital records (backed up, maybe even by paper!) you may as well not have collected the data.

TGB Osborn workflow — TGB Osborn VR data re-curation workflow

Alison Specht (She/Her)

Ecosystem Research Analyst, TERN, University of Queensland

Please sign in or register for FREE

If you are a registered user on Research Communities by Springer Nature, please sign in

Follow the Topic

Research Data

Research Communities > Community > Research Data

Environmental Sciences

Physical Sciences > Earth and Environmental Sciences > Environmental Sciences

Sustainability

Life Sciences > Biological Sciences > Ecology > Ecological Modelling > Sustainability

Scientific Data

Scientific Data

A peer-reviewed, open-access journal for descriptions of datasets, and research that advances the sharing and reuse of scientific data.

More about the journal

Related Collections

With Collections, you can get published faster and increase your visibility.

Computer vision in plant science and agriculture

This Scientific Data Collection invites Data Descriptors documenting the generation, curation, and validation of datasets that underpin computer vision applications across plant biology, crop science, and agricultural systems.

Publishing Model: Open Access

Deadline: Oct 10, 2026

Explore this Collection

Wearable and Computer Vision Data for Health and Behaviour Research

This Scientific Data collection of articles focuses on data from wearable and non-wearable devices, including data from devices that monitor health and computer vision data.

Publishing Model: Open Access

Deadline: Aug 08, 2026

Explore this Collection

Latest Content

The Equation-Free Journey: From Multiscale Computation to Neural Operators

Behind the Paper, From the Editors

JMCR: Clinical Reasoning From Case Reports

Highlights of the BMC Series – June 2026

Behind the Paper

Robust evidence for theta-band rhythmicity in behavior across two dense-sampling datasets

Behind the Paper

Iron for two: a fungal siderophore promotes the growth of its algal symbiont

Cookies

We use cookies to ensure the functionality of our website, to personalize content and advertising, to provide social media features, and to analyze our traffic. If you allow us to do so, we also inform our social media, advertising and analysis partners about your use of our website. You can decide for yourself which categories you want to deny or allow. Please note that based on your settings not all functionalities of the site are available.

Further information can be found in our privacy policy.

Keeping data alive for the long term - a case study

Share this post

Share with...

...or copy the link