Reducing patient re-identification risk for laboratory results within research datasets.

Atreya RV, Smith JC, McCoy AB, Malin B, Miller RA
J Am Med Inform Assoc. 2013;20(1):95-101

PMID: 22822040 · PMCID: PMC3555327 · DOI:10.1136/amiajnl-2012-001026

OBJECTIVE - To try to lower patient re-identification risks for biomedical research databases containing laboratory test results while also minimizing changes in clinical data interpretation.

MATERIALS AND METHODS - In our threat model, an attacker obtains 5-7 laboratory results from one patient and uses them as a search key to find the corresponding record in a de-identified biomedical research database. The models were tested on the existing Vanderbilt TIME database of 8.5 million Safe Harbor de-identified laboratory results from 61 280 patients. The uniqueness of unaltered laboratory results in the dataset was examined, and then two data perturbation models were applied: simple random offsets and an expert-derived clinical meaning-preserving model. A rank-based re-identification algorithm was used to mimic an attack. The re-identification risk and the retention of clinical meaning were assessed for the perturbed laboratory results produced by each model.
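
The threat model and the simple random-offset perturbation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout, the relative uniformly distributed offset, the distance score, and all function names (perturb_random_offsets, rank_candidates, reidentification_rate) are hypothetical, and the expert-derived model is not reproduced. A patient counts as re-identified only when their own released record is the unique top-ranked match for their unaltered search key.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical record layout: patient_id -> {test_name: numeric result}.
Records = Dict[str, Dict[str, float]]


def perturb_random_offsets(records: Records, max_frac: float, seed: int = 0) -> Records:
    """Multiply every result by (1 + u), with u ~ Uniform(-max_frac, +max_frac).

    Assumption: a relative, uniformly distributed offset is used for
    illustration; it is not necessarily the offset scheme of the study.
    """
    rng = random.Random(seed)
    return {
        pid: {test: value * (1 + rng.uniform(-max_frac, max_frac))
              for test, value in tests.items()}
        for pid, tests in records.items()
    }


def rank_candidates(released: Records, key: Dict[str, float]) -> List[Tuple[float, str]]:
    """Rank released records by similarity to the attacker's search key.

    Hypothetical score: sum of relative differences over the key tests
    (lower is a better match). Records lacking any key test are skipped.
    """
    scored = []
    for pid, tests in released.items():
        if not all(t in tests for t in key):
            continue
        score = sum(abs(tests[t] - v) / max(abs(v), 1e-9) for t, v in key.items())
        scored.append((score, pid))
    scored.sort()
    return scored


def reidentification_rate(original: Records, released: Records, key_tests: List[str]) -> float:
    """Fraction of patients whose true record is the unique best match
    when the attacker searches with that patient's unaltered results."""
    hits = total = 0
    for pid, tests in original.items():
        if not all(t in tests for t in key_tests):
            continue
        key = {t: tests[t] for t in key_tests}
        ranking = rank_candidates(released, key)
        total += 1
        # A tie for first place is conservatively not counted as a hit.
        unique_best = len(ranking) == 1 or (len(ranking) > 1 and ranking[1][0] > ranking[0][0])
        if ranking and ranking[0][1] == pid and unique_best:
            hits += 1
    return hits / total if total else 0.0
```

In this toy setup, max_frac=0.0 releases the data unchanged, so every patient whose results differ from all other patients' results is re-identified (rate 1.0); raising max_frac widens the offsets, which is one parameter for trading privacy protection against fidelity of the released values.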

RESULTS - Differences in re-identification rates between the algorithms were small despite substantial divergence in altered clinical meaning. The expert algorithm preserved the clinical meaning of laboratory results better than simple perturbation: it altered the clinical meaning of up to 4% of test results, compared with up to 26% for simple perturbation.
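
One simple way to operationalize "affecting clinical meaning" is to flag each result as low, normal, or high against a reference range and count how often perturbation changes the flag. The sketch below assumes exactly that; the three-category scheme, the ranges in REF_RANGES, and the function names interpret and meaning_change_rate are hypothetical illustrations (not clinical guidance), and the study's expert-derived notion of clinical meaning may be more nuanced.

```python
from typing import Dict, Tuple

# Hypothetical record layout: patient_id -> {test_name: numeric result}.
Records = Dict[str, Dict[str, float]]

# Illustrative reference ranges only (hypothetical values, not clinical guidance).
REF_RANGES: Dict[str, Tuple[float, float]] = {
    "Na": (135.0, 145.0),
    "K": (3.5, 5.0),
    "Glu": (70.0, 99.0),
}


def interpret(test: str, value: float) -> str:
    """Map a numeric result to a coarse low/normal/high interpretation."""
    low, high = REF_RANGES[test]
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


def meaning_change_rate(original: Records, released: Records) -> float:
    """Fraction of results whose interpretation differs after perturbation.

    Tests without a reference range in REF_RANGES are ignored.
    """
    changed = total = 0
    for pid, tests in original.items():
        for test, value in tests.items():
            if test not in REF_RANGES:
                continue
            total += 1
            if interpret(test, value) != interpret(test, released[pid][test]):
                changed += 1
    return changed / total if total else 0.0
```

For instance, moving a potassium result from 4.9 to 5.2 crosses the illustrative upper limit of 5.0 and counts as a changed interpretation, whereas moving sodium from 140 to 141 does not, so that pair of results yields a change rate of 0.5.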

DISCUSSION AND CONCLUSION - With growing impetus for sharing clinical data for research, and in view of healthcare-related federal privacy regulation, methods to mitigate risks of re-identification are important. A practical, expert-derived perturbation algorithm that demonstrated potential utility was developed. Similar approaches might enable administrators to select data protection scheme parameters that meet their preferences in the trade-off between the protection of privacy and the retention of clinical meaning of shared data.

MeSH Terms (11)

Algorithms; Biomedical Research; Clinical Laboratory Information Systems; Computer Security; Confidentiality; Electronic Health Records; Feasibility Studies; Humans; Information Dissemination; Medical Record Linkage; United States
