FAIR Data and Semantic Publishing

Explicit, Executable, Reusable, and Automatically-Disseminated Scientific Publications in the form of FAIR Data

Projections suggest that the delay between scientific discovery and the dissemination and implementation of the knowledge embodied in that discovery will soon vanish. At that point, all knowledge resulting from an investigation will be instantly interpreted and disseminated, influencing other researchers' experiments, and their results, immediately and transparently. This clearly requires that research results be of extremely high quality and reliability, and that research processes – from hypothesis to publication – become tightly integrated into the Web. Though the technologies necessary to achieve this kind of “Web Science” do not yet exist, our recently published studies of automated in silico investigation demonstrate that we are enticingly close, and a path toward next-generation Web Science is now clear.

FAIR Data - Findable, Accessible, Interoperable, and Reusable - is a global initiative to forever change how we capture and publish scholarly data and knowledge. "Data" here is a catch-all term covering all forms of scholarly output: datasets, services, workflows, publications, and other "research objects". These must all be identified, described, and published using formal knowledge-representation frameworks, so that machines are empowered to discover, integrate, and even execute or interpret these research objects unaided by humans.
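To make this concrete, the following is a minimal sketch of what such a machine-readable description of a research object might look like, expressed as JSON-LD using schema.org vocabulary terms. The identifiers, property choices, and values are illustrative assumptions, not a prescribed FAIR metadata profile.

```python
import json

# Hypothetical FAIR-style metadata record for a dataset, expressed as JSON-LD.
# The vocabulary (schema.org) is real, but every identifier and value below
# is an illustrative placeholder, not a mandated FAIR schema.
dataset_record = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    # Findable: a globally unique, resolvable identifier
    "@id": "https://doi.org/10.1234/example-dataset",
    "name": "Example gene-expression dataset",
    # Accessible: retrievable via a standard, open protocol (HTTPS)
    "distribution": {
        "@type": "DataDownload",
        "contentUrl": "https://example.org/data/expression.csv",
        "encodingFormat": "text/csv",
    },
    # Interoperable: formal, shared vocabulary terms rather than free text
    "measurementTechnique": "RNA-Seq",
    # Reusable: explicit licence and provenance
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Person", "name": "A. Researcher"},
}

print(json.dumps(dataset_record, indent=2))
```

Because each of the four FAIR qualities maps to explicit, typed properties, a crawler can locate the record, resolve the download, and check the licence without any human in the loop.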

We propose to dramatically alter the way high-throughput in silico research is done. We synthesize and evaluate Web Science frameworks, investigating the technologies and knowledge infrastructures necessary in this novel environment to ensure a rigorous scientific process, including debate, accuracy, transparency, reproducibility, and peer review. Web Science simplifies in silico research for bench scientists by providing an ecosystem of expert knowledge and analytical strategies that can be accurately and automatically assembled. More broadly, it facilitates scientific discourse by enabling researchers to easily see their data through another's eyes, explicitly compare disparate hypotheses to precisely identify differences in opinion, automatically evaluate those hypotheses over novel datasets to investigate their validity, and integrate the resulting knowledge directly into the community knowledge pool in the form of “executable publications”. Finally, it enhances scientific rigor, particularly for high-throughput experiments, by helping to eliminate bias, and by improving the documentation and reproducibility of published results.