TY - JOUR
T1 - Prioritizing cancer hazard assessments for IARC Monographs using an integrated approach of database fusion and text mining
AU - Barupal, Dinesh Kumar
AU - Schubauer-Berigan, Mary K.
AU - Korenjak, Michael
AU - Zavadil, Jiri
AU - Guyton, Kathryn Z.
N1 - Publisher Copyright: © 2021 The Author(s)
PY - 2021/11
Y1 - 2021/11
N2 - Background: Systematic evaluation of literature data on the cancer hazards of human exposures is an essential process underlying cancer prevention strategies. The scope and volume of evidence for suspected carcinogens can range from very few to thousands of publications, requiring a complex, systematically planned, and critical procedure to nominate, prioritize and evaluate carcinogenic agents. To aid in this process, database fusion, cheminformatics and text mining techniques can be combined into an integrated approach to inform agent prioritization, selection, and grouping. Results: We have applied these techniques to agents recommended for the IARC Monographs evaluations during 2020–2024. An integration of PubMed filters to cover cancer epidemiology, key characteristics of carcinogens, chemical lists from 34 databases relevant for cancer research, chemical structure grouping and a literature data-based clustering was applied in an innovative approach to 119 agents recommended by an advisory group for future IARC Monographs evaluations. The approach also facilitates a rational grouping of these agents and aids in understanding the volume and complexity of relevant information, as well as important gaps in coverage of the available studies on cancer etiology and carcinogenesis. Conclusion: A new data-science approach has been applied to diverse agents recommended for cancer hazard assessments, and its applications for the IARC Monographs are demonstrated. The prioritization approach has been made available at www.cancer.idsl.me for ranking cancer agents.
KW - Chemoinformatics
KW - Database fusion
KW - Hazard identification
KW - IARC Monographs
KW - Text mining
UR - http://www.scopus.com/inward/record.url?scp=85105436953&partnerID=8YFLogxK
U2 - 10.1016/j.envint.2021.106624
DO - 10.1016/j.envint.2021.106624
M3 - Article
C2 - 33984576
AN - SCOPUS:85105436953
SN - 0160-4120
VL - 156
JO - Environment International
JF - Environment International
M1 - 106624
ER -