Earlier this year, Lee Dirks, Cheri Ekholm, and I attended Phil Bourne’s Beyond the PDF workshop at the University of California, San Diego. This workshop advanced the premise that scholarly communication can and should evolve from static, disparate representations of data and knowledge, as embodied in the typical PDF research paper of today, to rich, integrated content that grows and changes as we learn more. In the few months since this event, there’s been a great deal of activity: Martin Fenner and Mark Hahnel are working on WordPress for Scientists, Peter Sefton has launched the Scholarly HTML effort, and our team here at Microsoft Research is hard at work on some great new features for the next release of the Article Authoring Add-in for Word (the beta release of version 3.0 is due later this year).
But even more important than developing tools or formats is persuading the current generation of researchers to challenge the status quo. Scholars do see the value (to varying degrees) in sharing a greater level of detail about their research as part of the scholarly communication process, recognizing that such communications enhance the scientific record and accelerate discovery. But active researchers are also players in a system that rewards the traditional research paper format to the exclusion of all else.
A major thread in this conversation stresses the enormous potential of shared research data for facilitating experimental reproducibility and validation. Much ink has been spilled in the past few years on the “data deluge” and the promise of new scientific advances that are not based on the traditional hypothesis-experiment-analysis-conclusion paradigm, but instead begin with previously unseen patterns, anomalies, or correlations in the existing wealth of collected data, which then serve as the catalyst for new investigations and experiments.
Clearly, the sharing of scientific research data holds great promise for the scientific discoveries of the future. And yet the system of academic research achievement does not yet recognize or reward researchers for sharing their data. Change is afoot, however, and the next generation will look back on this decade as one of profound transformation in which parts of the scientific research process are recorded and in how researchers are rewarded.
In the meantime, organizations like BioMed Central are stepping up to recognize the researchers who are in the vanguard of this movement. Microsoft Research has contributed to BioMed Central’s Research Awards for several years, and we have been proud sponsors of the Open Data Award since its inception in 2010. This year’s award recognized biologist Tommi Nyman from Finland for the article “How common is ecological speciation in plant-feeding insects? A ‘Higher’ Nematinae perspective,” published in the open-access journal BMC Evolutionary Biology.
Dr. Nyman and his colleagues published three additional data files with their article:
- The collection data for their specimens and taxonomic and ecological background information
- The sequence data used in phylogeny reconstruction and resultant phylogenetic trees
- The data file and run parameters for Bayesian Evolutionary Analysis Sampling Trees (BEAST)
The data are well labeled and readily understandable by other scientists. Moreover, the authors showed great transparency in their work, particularly in their first additional data file, which fully documents how they sampled their insects. This level of openness is not commonly seen, and it demonstrates real leadership. Their article serves as an outstanding example of how evolutionary biology research should be presented and its data published so that other scientists can validate and build on the work.
Microsoft Research is honored to be an ongoing sponsor of the Open Data Award, and we are thrilled to be able to play a role in encouraging the Open Data movement in this way.
—Alex D. Wade, Director for Scholarly Communication, Microsoft Research Connections