TY - JOUR
T1 - ukbtools
T2 - An R package to manage and query UK Biobank data
AU - Hanscombe, Ken B.
AU - Coleman, Jonathan R.I.
AU - Traylor, Matthew
AU - Lewis, Cathryn M.
N1 - Publisher Copyright: © 2019 Hanscombe et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2019/5
Y1 - 2019/5
AB - Introduction: The UK Biobank (UKB) is a resource that includes detailed health-related data on about 500,000 individuals and is available to the research community. However, several obstacles limit immediate analysis of the data: data files vary in format, may be very large, and have numerical codes for column names. Results: ukbtools removes all the upfront data wrangling required to get a single dataset for statistical analysis. All associated data files are merged into a single dataset with descriptive column names. The package also provides tools to assist in quality control by exploring the primary demographics of subsets of participants; to query disease diagnoses for one or more individuals and estimate disease frequency relative to a reference variable; and to retrieve genetic metadata. Conclusion: Having a dataset with meaningful variable names, a set of UKB-specific exploratory data analysis tools, disease query functions, and a set of helper functions to explore and write genetic metadata to file will rapidly enable UKB users to undertake their research.
UR - http://www.scopus.com/inward/record.url?scp=85066428964&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0214311
DO - 10.1371/journal.pone.0214311
M3 - Article
C2 - 31150407
AN - SCOPUS:85066428964
SN - 1932-6203
VL - 14
JO - PLoS ONE
JF - PLoS ONE
IS - 5
M1 - e0214311
ER -