
Abstract

Anonymized electronic health records (EHR) are often used for biomedical research. One persistent concern with this type of research is the risk of re-identification of patients from their purportedly anonymized data. Here, we use the EHR of 731,850 de-identified patients to demonstrate that the average patient is unique from all others 98.4% of the time simply by examining what laboratory tests have been ordered for them. By the time a patient has visited the hospital on two separate days, they are unique in 72.3% of cases. We further present a computational study to identify how accurately the records from a single day of care can be used to re-identify patients from a set of 99 other patients. We show that, given a single visit's laboratory orders (even without result values) for a patient, we can re-identify the patient at least 25% of the time. Furthermore, we can place this patient among the top 10 most similar patients 47% of the time. Finally, we present a proof-of-concept technique using a variational autoencoder to encode laboratory results into a lower-dimensional latent space. We demonstrate that releasing latent-space encoded laboratory orders significantly improves privacy compared to releasing raw laboratory orders (<5% re-identification), while preserving information contained within the laboratory orders (AUC of >0.9 for recreating encoded values). Our findings have potential consequences for the public release of anonymized laboratory tests to the biomedical research community. We note that our findings do not imply that laboratory tests alone are personally identifiable. In the attack scenario presented here, re-identification would require a threat actor to possess an external source of laboratory values that are linked to personal identifiers at the start.
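To make the two measurements in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It computes (a) the fraction of patients whose set of ordered tests is shared with no other patient and (b) the rank of the true patient when candidates are sorted by similarity to a single visit's orders. The Jaccard similarity, the function names, the patient identifiers, and the test names are all assumptions made for illustration; the abstract does not state which similarity measure was used.

```python
from collections import Counter


def unique_fraction(orders):
    """Fraction of patients whose set of ordered tests matches no other patient's."""
    counts = Counter(frozenset(tests) for tests in orders.values())
    unique = sum(1 for tests in orders.values() if counts[frozenset(tests)] == 1)
    return unique / len(orders)


def jaccard(a, b):
    """Jaccard similarity of two test sets (an assumed measure, not from the paper)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def rank_of_true_patient(visit_tests, true_id, candidates):
    """1-based rank of the true patient among candidates sorted by similarity."""
    ranked = sorted(
        candidates,
        key=lambda pid: jaccard(visit_tests, candidates[pid]),
        reverse=True,
    )
    return ranked.index(true_id) + 1


# Hypothetical toy data: patient id -> set of laboratory tests ever ordered.
orders = {
    "p1": {"CBC", "BMP"},
    "p2": {"CBC", "BMP"},
    "p3": {"CBC", "HbA1c", "Lipid panel"},
    "p4": {"TSH"},
}

fraction = unique_fraction(orders)
rank = rank_of_true_patient({"CBC", "HbA1c"}, "p3", orders)
```

In this toy setting, a rank of 1 corresponds to an exact re-identification, and a rank of 10 or lower corresponds to placing the patient among the top 10 most similar candidates.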

Original language: English
Pages (from-to): 415-426
Number of pages: 12
Journal: Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing
Volume: 24
Issue number: 2019
State: Published - 2019
Event: 24th Pacific Symposium on Biocomputing, PSB 2019 - Kohala Coast, United States
Duration: 3 Jan 2019 to 7 Jan 2019

Keywords

  • Anonymization
  • Data privacy
  • Electronic health records
  • Patient re-identification
  • Variational autoencoder

Title

Evaluation of patient re-identification using laboratory test orders and mitigation via latent space variables
