Submission

introduction
title
"Home is where the health is": Geo-Sharing
short description
Our open-source, privacy-preserving tools turn sensitive patient location data into sharable results while remaining mindful of HIPAA concerns.
Submission Details
Submission Category
Data sharing
Abstract / Overview

Patient location data is highly sensitive because of privacy concerns, yet highly valuable for research because of neighborhood-level social determinants of health (SDOH), such as poverty levels, and important proximity calculations, such as distance to healthcare resources. Patient addresses typically go unshared to comply with the privacy protections mandated by the Health Insurance Portability and Accountability Act of 1996 (HIPAA). The addresses themselves are not informative to biomedical research; rather, they act as a link to important contextual information about a patient’s living environment, including SDOH. We have developed a new framework for processing addresses using "in-house" tools and privacy-preserving methods for data sharing.

Team

Our team consists of researchers from the Institute for Pharmaceutical Outcomes and Policy (IPOP) at the University of Kentucky (UK). Our team collaborates on open-source geospatial initiatives and believes open data and transparency are the future of research. Dr. Harris is an early-stage investigator and Director of Clinical Research Analytics for IPOP. Dr. Harris was born in rural KY and is interested in researching the impact of location on health and equitable access to resources. Mr. Anthony is a data manager and software developer who works remotely for IPOP from Florida. Dr. Delcher is an epidemiologist and Director of IPOP; he oversees the institute’s research direction. IPOP assists in managing a HIPAA-compliant data center with data from our university’s health system. Sharing HIPAA-protected data sets with external partners requires legal agreements, while other less-sensitive, derived data sets are contributed to open repositories. We believe sensitive HIPAA-protected data may be converted into less-sensitive de-identified data in a manner that maintains research utility. Our healthcare system previously had no geospatial capabilities due to privacy concerns; our results are now added to our warehouse for everyone’s use.

Potential Impact

"Home is where the health is" underscores the importance of location and environment in one’s health; in fact, evidence exists that one’s zip code may be a better predictor of health than one’s genetic code (https://tinyurl.com/ytktae92). Despite playing a major role, patient location is difficult to obtain for research purposes and nearly impossible to share due to HIPAA privacy protections. Our project’s goal is to provide tools that balance privacy and utility. We developed these tools between 08/2019 and 05/2022; they act as local infrastructure to support geospatial science and to contribute data to our local healthcare warehouse and to the public. Our first tool is bench4gis (see Supporting Information 1), which tests and benchmarks the performance of geocoders. For privacy, geocoding usually must happen “in-house” in a secure domain. bench4gis measures the performance of geocoders by using open big data as a referential source of truth; the distances between the reference points and the computed points are aggregated and summarized. dp-OMOP (see Supporting Information 2) has database functions that connect differential privacy (DP) algorithms and the OMOP common data model. DP attempts to minimize the analytic impact of adding or removing a single record, which reduces risk to patients because it obscures whether their record is included in the data set. dp-OMOP includes benchmarking functions that report the trade-off between privacy and accuracy. Our last tool is geoPIPE (Geospatial Pipeline for Enhancing Open Data for Substance Use Disorders Research), which derives important location-based data from open data sets (see Supporting Information 3 – to be presented at AMIA’s Annual Symposium). We used geoPIPE to expand overdose death records from Cook County, IL: it geocodes records, calculates distances to nearby points of interest (e.g., pharmacies), and calculates contextual information such as land use and park classifications.
Most importantly, we returned our expansion back to our open repository. Over time, our team has gradually moved toward open data as a sharing practice and toward open-sourcing our software as common practice. We encourage researchers to adopt open data practices. Our collaborators at the Cook County Medical Examiner’s Office were open data pioneers. In early 2022, we began discussions with other offices to encourage replication; having Cook County as a leader helps convince people that safe data sharing is achievable.
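
The geocoder benchmarking idea described above (distances between reference points and computed points, aggregated and summarized) can be illustrated with a minimal Python sketch. This is not the bench4gis code; the function names `haversine_m` and `summarize_errors`, the use of great-circle distance, and the choice of summary statistics are assumptions made only for illustration.

```python
import math
import statistics

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))


def summarize_errors(reference, computed):
    """Aggregate distances between reference and geocoded (lat, lon) pairs.

    Both arguments are equal-length lists of (lat, lon) tuples, where
    reference[i] is the trusted location of the i-th address and
    computed[i] is the geocoder's output for the same address.
    """
    if not reference or len(reference) != len(computed):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [haversine_m(r[0], r[1], c[0], c[1])
              for r, c in zip(reference, computed)]
    return {
        "n": len(errors),
        "mean_m": statistics.mean(errors),
        "median_m": statistics.median(errors),
        "max_m": max(errors),
    }


if __name__ == "__main__":
    # Hypothetical coordinates, not real patient or benchmark data.
    reference = [(38.0406, -84.5037), (41.8781, -87.6298)]
    computed = [(38.0410, -84.5030), (41.8790, -87.6300)]
    print(summarize_errors(reference, computed))
```

Running the same summary for several geocoders against one reference set gives a simple side-by-side comparison of positional error without any address leaving the secure environment.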

Replicability

Our tools depend purely upon open-source software that is freely available to anyone. PostgreSQL is a freely available database management system (DBMS) and is popular in open-source projects; it is currently the third most popular DBMS in the world (https://www.datanyze.com/market-share/databases--272). PostGIS is an open-source extension for PostgreSQL that provides many foundational geospatial features. Our in-house developed software operates on top of PostgreSQL and PostGIS and relies upon SQL and Python, which are both popular in the informatics and data science communities. Our bench4gis and geoPIPE tools use publicly available open data sets, and our code is published and available online (see Supporting Information 1 and 2). Our plugin for differential privacy in healthcare depends upon the OMOP common data model popularized by biomedical research collaborations. Because our tools are all open-source, replication is as easy as downloading each tool and running it. We are also working on packaging all our software components together in one downloadable virtual container (Docker), which will expedite adoption and lower replication barriers.
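
For readers unfamiliar with differential privacy, the privacy/accuracy trade-off can be illustrated with a small Python sketch of the Laplace mechanism applied to a count query. This is a generic textbook technique chosen for illustration; it is not the dp-OMOP API, and the names `dp_count` and `mean_abs_error`, as well as the epsilon values, are assumptions.

```python
import random


def dp_count(true_count, epsilon, rng):
    """Return a noisy count using the Laplace mechanism.

    A count query changes by at most 1 when a single record is added or
    removed (sensitivity 1), so the Laplace noise scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rate = epsilon  # rate = 1 / scale, with scale = 1 / epsilon
    # The difference of two i.i.d. exponential draws is Laplace-distributed.
    noise = rng.expovariate(rate) - rng.expovariate(rate)
    return true_count + noise


def mean_abs_error(true_count, epsilon, trials, rng):
    """Estimate the average absolute error introduced at a given epsilon."""
    total = 0.0
    for _ in range(trials):
        total += abs(dp_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(2022)  # fixed seed so runs are repeatable
    # Hypothetical count, e.g. patients living in one census tract.
    for eps in (0.1, 0.5, 1.0):
        err = mean_abs_error(true_count=120, epsilon=eps, trials=5000, rng=rng)
        print(f"epsilon={eps}: mean absolute error ~ {err:.2f}")
```

Smaller epsilon values give stronger privacy but larger expected error, which is the kind of trade-off a benchmarking report would surface.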

Potential for Community Engagement and Outreach

Data sharing unlocks data from silos with limited reach and potentially pushes it into research arenas capable of using it for the greater good. Geospatial analyses of patient addresses reveal pivotal contextual SDOH and open the door to testing regional differences observable in studies. We believe the future of research depends upon open data adhering to FAIR principles. We firmly believe that open-source software is the key to reproducible and equitable science, where tools are publicly available and do not act as a barrier to the adoption of research ideas. Commercial products naturally serve their role in the technology and science domains, but cost may prevent equal access to research. Our software is layered on top of existing open-source software and, in return, we release our work as open-source so others may benefit or join us in collaboration. Our software facilitates data sharing by converting sensitive information into sharable information. Our geoPIPE tool, as discussed earlier, was used to process death records from Cook County, IL; we contributed our software, our geospatial data, and our results back to our open-source repository, which has enabled research papers, grant proposals, and new collaborations.

Supporting Information (Optional)
Supporting Documentation 01
https://pubmed.ncbi.nlm.nih.gov/32185372/
Supporting Documentation 02
https://pubmed.ncbi.nlm.nih.gov/35253022/
