When we added an integration with the figshare repository to seven Springer Nature journals in 2022, we aimed to make data deposition easier for authors and thereby improve how the data behind articles are shared. You could think of this as a hypothesis and an intervention in the existing journal workflow, but what are the appropriate measures of success?
The big picture is that we are looking to solve challenges such as a lack of data available from published papers (exemplified by the ongoing issue of data ‘available on request’). Our approach has been to make the better options, i.e. repositories, more accessible to authors. However, measuring the impact of interventions on published papers can take a long time, so a range of metrics was needed over the course of the project.
Early on, we relied on straightforward usage figures as a sign of author appetite for a tool like this. Strong performance in these metrics (15% of submissions and 10% of accepts using the integration to date) has driven rollout to 37 Nature journals so far, including Nature and Nature Communications. Now that we have around a year's worth of published articles since the pilot began in April 2022, we can examine impact in a more meaningful way than the uptake figures allow.
One method we are using to measure this is to categorise each article’s Data Availability Statement (DAS), e.g. ‘available on request’, ‘data in a repository’, etc., and look at sharing patterns before and after implementation. This is also something we can do for a range of interventions, including policy changes and editorial checks. While comparing two time periods is nowhere near as robust as a controlled experiment, it does allow us to look at changing patterns of author behaviour.
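For illustration, here is a minimal sketch of how this kind of DAS categorisation could be automated with simple keyword rules. The category labels and patterns below are assumptions made for the example only, not necessarily the rules used in our analysis.

```python
import re

# Hypothetical keyword patterns for DAS categorisation (illustrative only).
DAS_PATTERNS = {
    "available on request": r"available (from the corresponding author )?(up)?on (reasonable )?request",
    "data in a repository": r"\b(figshare|zenodo|dryad|genbank|geo|pdb|repository)\b",
    "data in manuscript/SI": r"supplementary information|within the (article|paper|manuscript)",
    "no data": r"no datasets? were generated",
}

def categorise_das(statement: str) -> set[str]:
    """Return the (possibly multiple) sharing categories found in a DAS."""
    text = statement.lower()
    return {label for label, pattern in DAS_PATTERNS.items()
            if re.search(pattern, text)}

# Example: one statement can match more than one category.
das = ("Sequencing data are deposited in GenBank; all other data are "
       "available from the corresponding author on reasonable request.")
print(categorise_das(das))
# e.g. {'data in a repository', 'available on request'} (set order may vary)
```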
The good news, as I reported at this year’s ALPSP Conference session on Open Science metrics (recording available here), is that across the pilot Nature journals the integration period correlates with a significantly higher rate of data sharing via repositories (62% during the implementation period compared with 50% in the final year before implementation). Sharing in generalist repositories specifically is also higher, and we have seen a reversal of previously rising trends of sharing in the manuscript/Supplementary Information and in other locations such as lab websites.
(Note: the pilot and pre-pilot periods are kept separate, with a three-month gap, January–March 2022.)
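To make the comparison concrete, the sketch below shows one way a difference in proportions like this could be checked for statistical significance. The article counts are hypothetical, chosen only to reproduce the 62% and 50% figures above.

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical article counts for illustration only.
repo_sharing = [620, 400]   # articles sharing via a repository (pilot, pre-pilot)
totals       = [1000, 800]  # published articles with a DAS in each period

stat, p_value = proportions_ztest(count=repo_sharing, nobs=totals)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
```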
There has been less sharing of data ‘on request’ during the integration period overall, but this was already clearly declining before the integration, very likely as a result of the editorial checks introduced in 2018/19 that challenge this method of sharing. Note that the classification is non-exclusive: any one article may have multiple data sharing types, as is fairly common in the Nature Portfolio, where ‘available on request’ is now rarely used on its own.
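Because the categories are non-exclusive, per-category rates are calculated against the total number of articles and can sum to more than 100%. A toy example, using made-up articles:

```python
from collections import Counter

# Toy multi-label data: each article carries a set of sharing categories.
articles = [
    {"data in a repository"},
    {"data in a repository", "available on request"},
    {"data in manuscript/SI", "data in a repository"},
    {"available on request"},
]

counts = Counter(label for labels in articles for label in labels)
for label, n in counts.items():
    print(f"{label}: {n / len(articles):.0%} of articles")
# Rates total more than 100% because one article can appear in several categories.
```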
But we have also seen clear variation between journals: some have very high repository sharing rates and almost no data available on request, while others have seen less benefit. It is clear that adding figshare is not a panacea for data sharing, but a tool that can benefit certain authors and journals in certain ways. It is also important to note that the workflow has been carefully constructed to avoid data types being deposited in figshare when there is a clear community expectation for them to be in specialist data repositories. We are keen to build in further lessons, and to use this integration as a platform for further development and, potentially, more advanced data solutions.
The data underlying this analysis are available in figshare at: https://dx.doi.org/10.6084/m9.figshare.24848058