Data sharing: forced altruism?
Genuine altruism isn’t exactly rife within the biological sciences or anywhere else (even charitable giving can be cast as guilt avoidance). Yet biological scientists have for some time been encouraged or even required to share their data, for no real reward.
The arguments for data sharing (further exploitation, community oversight) are valid, but the effort required and the risk of ceding intellectual property to competitors and others continue to ensure that it is widely seen as an imposition. Particularly irksome for some is the idea that others might unilaterally benefit: one reviewer of a standards paper of which I was an author asserted, in so many words, that bioinformaticians (major data consumers) are parasites. Technically, this is more or less correct. How can that relationship be changed into a mutualism, and how can policy engender that? And more generally, how can scientists be encouraged not just to share, but to do a great job of sharing: annotating and structuring their data using community standards and databases to maximise impact and reach?
A recent RIN meeting on data sharing behaviour and policy heard from three people working to encourage and support data sharing: Andrew Young (Director of Research, JMU) stressed that data sharing should be a normal part of a research plan and that universities have a duty to encourage and support (while noting the creative tension between those encouraging the sharing of university-owned IP and those tasked to maximally exploit it); Carole Goble (School of Computer Science, University of Manchester) demonstrated that quietly-sophisticated access rights management helps (the ‘good fences’ principle), but that the sociological and structural ills run deep; and Kevin Ashley (Director, DCC) described both the inertia and the skills gap in the general science populace, advocating a step change in training and support levels.
All agreed that encouragement trumps compulsion (or should) and that the probably-still-coming Research Excellence Framework presents an excellent opportunity to more explicitly reward data sharing and honour related trades such as curation. All also agreed that current funding programmes do little to ensure durable community-wide support. In summary (peer-rivalry and technical issues aside), credit is the key. How very Darwinian. But shared data were already recognised as an output under the now-defunct RAE (well, strictly speaking the 2008 guidance talks about databases, but let’s not quibble). And while robust, accessible tools and databases that support Digital Object Identifiers (for identifying data sets) and ORCIDs (for identifying individuals) are necessary, they are not sufficient.
Simply put: frequency of reuse is the critical metric for engendering ‘good’ rather than grudging (or absent) sharing. Better-written papers (are assumed to) have greater impact; better-annotated data sets should similarly benefit, because good annotation increases discoverability and clarity, and supports more diverse reuse. The challenge then is for authors and referees to ensure correct and visible attribution of data sources in papers, and for faculties and funders to measure that reuse, so that when the bioinformaticians come browsing, all stand to benefit.