Keeping data alive for the long term - a case study
Published in Earth & Environment, Ecology & Evolution, and Research Data
As an ecologist I have always been concerned about understanding processes: why do things happen? Why do some species survive and some not? Why do some animals or plants respond well to a disturbance and some not? How do things work? These questions and more have been fundamental for ecologists and natural historians over centuries. All of them depend on data, data about an object of study, and data about the factors that may affect it. As we go further into a digital world, the need to have access to digital records has increased, along with our ability to use those data. Basic to this is probity: the ability to see the data that was used in an interpretation of events. Does someone else agree with our understanding? A decision made from this understanding? Ensuring long-term data are available and accessible is not, however, a simple matter.
A data story, from creation to publication
The case study described in this paper is an example of what may be required elsewhere. We describe the process involved in ensuring a 90-year collection of long-term data continues to be available for machine discovery. The site on which the data were collected is nationally and internationally renowned for its longevity. These are not data that are retrospectively 'discovered' from an ice core, or pollen analysis from a bog, or collected by instruments such as electronic monitoring stations, but these are data that were:
- collected at a remote location in the semi-arid zone in Australia established in 1926 by staff of the Adelaide University (henceforth 'the field site'),
- physically collected by researchers and students annually,
- recorded on clip boards in the field, annotated, transcribed, and stored in filing cabinets,
- digitised in the 1990s into 'new' database systems on desktop computers, and
- uploaded in 2014 into a newly established open-access domain repository (the repository of Australia's Terrestrial Ecosystem Research Network, TERN).
This had involved a cast of hundreds across the years.
Making the data usable - the challenge
The data were stored safely in 2014, but were they usable? Not really.
To update the data we had to delve into the records, obtain missing records, and ensure the data were described in a way that would allow machine discoverability.
Time was of the essence due to the recent death of the last custodian, without whom there would have been no data to update. First, we had to explain the need to the people and the institution (Adelaide University) associated with the field site. Then we assembled a part-time multi-disciplinary team in TERN, consisting of specialists in both data and in environmental science. The lead author had data science expertise and familiarity with the site and provided a bridge between the field site custodians and repository staff, and took primary responsibility for data set quality. Data and environmental scientists of varying hues participated in cleaning the data, uploading new data, ensuring standardised vocabularies and metadata were adopted, and the data published in a discoverable and citable manner. This process took more than two years, exacerbated by the lack of the custodian who could explain the data in detail.
Towards the future
From the work done, it is clear it is entirely possible to update old records to adhere with modern standards, but only if there is sufficient effort and commitment made by the repository and a dataset 'custodian' is involved. The fact that the site was very well known in the environmental science community was one reason this attempt was made. Much of the effort expended was in understanding the methodology and rationale that created the original, 2014, datasets.
We expect that re-curation for reusability of any dataset—be it a long time series or short-term measurements intended to be kept for the long term—will need to be repeated periodically, so feel that this study is an object lesson in what may be required.
Recommendations
- Think carefully about the usefulness of the data. When submitting data for preservation and reuse, one must think very carefully about the re-users. What do they need to know? What detail, if missing, will make the data unusable or not trustable? If data are not properly described and formatted, they may as well not exist.
- Consistency. For long term value, a time series data collection needs to be consistent, and here we are talking decades. Few scientists are keen on repeating other's work, as they get few rewards for doing so, so this must be explicitly valued.
- Include data entry in the project. A field trip to a remote, exotic location may be a great experience, and the data may be well collected, but without high-quality digital records (backed up, maybe even by paper!) you may as well not have collected the data.
Follow the Topic
-
Scientific Data
A peer-reviewed, open-access journal for descriptions of datasets, and research that advances the sharing and reuse of scientific data.
Related Collections
With Collections, you can get published faster and increase your visibility.
Genomics in freshwater and marine science
Publishing Model: Open Access
Deadline: Jul 23, 2026
Computer vision in plant science and agriculture
Publishing Model: Open Access
Deadline: Jul 10, 2026
Please sign in or register for FREE
If you are a registered user on Research Communities by Springer Nature, please sign in