Enabling genomic-phenomic association discovery without sacrificing anonymity.

Heatherly RD, Loukides G, Denny JC, Haines JL, Roden DM, Malin BA
PLoS One. 2013 8 (2): e53875

PMID: 23405076 · PMCID: PMC3566194 · DOI:10.1371/journal.pone.0053875

Health information technologies facilitate the collection of massive quantities of patient-level data. A growing body of research demonstrates that such information can support novel, large-scale biomedical investigations at a fraction of the cost of traditional prospective studies. While healthcare organizations are being encouraged to share these data in a de-identified form, there is hesitation over concerns that it will allow corresponding patients to be re-identified. Currently proposed technologies to anonymize clinical data may make unrealistic assumptions with respect to the capabilities of a recipient to ascertain a patient's identity. We show that more pragmatic assumptions enable the design of anonymization algorithms that permit the dissemination of detailed clinical profiles with provable guarantees of protection. We demonstrate this strategy with a dataset of over one million medical records and show that 192 genotype-phenotype associations can be discovered with fidelity equivalent to non-anonymized clinical data.

MeSH Terms (10)

Algorithms · Databases, Factual · Genetic Association Studies · Genetic Privacy · Genome, Human · Genome-Wide Association Study · Genomics · Genomics · Humans · Medical Records Systems, Computerized
