Data sharing in structural biology: Advances and challenges

M. Grabowski, I. G. Shabalin, P. J. Porebski, M. J. Domagalski, H. Zheng, D. R. Cooper, B. S. Venkataramany, P. E. Bourne, W. Minor

Research output: Chapter in Book/Report/Conference proceeding › Chapter › peer-review

Abstract

The recently revealed reproducibility crisis in biomedical research and the emergence of "Big Data" have heightened recognition of the value of data sharing and open science principles. The preservation and public availability of primary data and metadata are necessary conditions for scientific results obtained by one research group to be verified and reproduced by others. In recent years, many public funding agencies throughout the world have adopted open access policies for data resulting from government-sponsored research. Some publishers have done the same, insisting that data pertaining to publications be made publicly available. Simultaneously, new technological solutions enabling effective and efficient data sharing have emerged and evolved. The field of structural biology has embraced data sharing from the outset. Early milestones include the establishment of the Cambridge Structural Database (CSD) in 1965 as a repository of small-molecule structural data and of the Protein Data Bank (PDB) in 1971 for macromolecular structures. However, the raw diffraction images collected for X-ray crystallography, the dominant method of macromolecular structure determination, have usually not been shared. In the past, these data were routinely discarded after structure deposition or even immediately after data reduction; regrettably, such practices can sometimes still be encountered today. Until recently, protein crystallographers had no easy way to share primary data (raw diffraction images) with the wider research community. This changed with the appearance of several specialized public repositories for X-ray diffraction data, such as the Integrated Resource for Reproducibility in Macromolecular Crystallography (IRRMC) and the SBGrid Data Bank, in addition to general-purpose data repositories.

Collectively, the existing public repositories of raw data for macromolecular crystallography have amassed over 7,000 datasets for about 4,000 macromolecular structures. However, the primary data for the vast majority of crystallographic structures deposited in the PDB are not available, and even if these data exist somewhere, they will likely never be located. Although the stream of depositions to the specialized repositories is increasing, X-ray diffraction data are still shared for only a small fraction of the structures currently being determined. Nevertheless, these repositories have already proven their worth by enabling the identification of problems in data collection protocols and the correction of problematic crystal structures, thereby bolstering the reproducibility of biomedical experiments.

Original language: English
Title of host publication: Data Sharing
Subtitle of host publication: Recent Progress and Remaining Challenges
Publisher: Nova Science Publishers, Inc.
Pages: 29-68
Number of pages: 40
ISBN (Electronic): 9781536146783
ISBN (Print): 9781536146776
State: Published, 28 Dec 2018
Externally published: Yes

Keywords

  • Data sharing
  • Protein crystallography
  • Structural biology
  • X-ray diffraction data
