A Perspectives piece I wrote, “Unlocking the Power of Health Data,” was published this week by iHealthBeat. In it I argue for patient-controlled sharing of rich data, as opposed to the HIPAA-regulated stripping of identifiers meant to eliminate the risk to patient privacy when data is shared for research and other purposes. Googler Larry Page and Josh Stevens of Keas have recently argued in favor of broader uses of health data, but HIPAA keeps coming up in those conversations. Most connected patients seem comfortable with the idea of sharing health data, and as more of us get connected, this sentiment is likely to spread. [Update 3/23/2016: Since iHealthBeat is no more, I’m posting the full text here.]
In this day and age, with fewer and fewer exceptions, each of us has a digital footprint as a patient. Part of that footprint is intentional, created directly by us and by the trusted others with whom we interact in the real world and online, through electronic health records, personal health records, personal tracker data, blogs, tweets, and so on.
Each of us also leaves a trail of additional information as a by-product of our online existence: our digital exhaust, which is out there to be mined. All of these data sources, individually or, more likely, when aggregated with those of others, may turn out to be usable as information, or perhaps even knowledge.
That is the promise of EHRs, and one key policy argument behind the federal government incentive program promoting their adoption: that health data writ large — big data — when properly analyzed will yield medical insights not otherwise accessible to us; that evidence-based medicine will be advanced immeasurably and that the dissemination of best practices will be tremendously accelerated.
One key bottleneck on the information highway to the future is the layering of privacy laws that regulate the sharing of personal health data (protected health information, or PHI, in HIPAA-speak).
Google co-founder Larry Page, in a recent TED talk, promoted the notion that health data should be shared for the common good. “Wouldn’t it be amazing to have anonymous medical records available to all research doctors?” Page asked. He added that such sharing of health data would save hundreds of thousands of lives.
Yes, Larry, it would be amazing, but many folks out there are concerned that even de-identified (anonymized) data may be re-identified, and there are numerous examples of this being done. On the one hand, those examples are finite in number, which may suggest that the re-identification problem can be fixed if we just put our heads together. On the other hand, the amount of information publicly available online grows every day. So even if we solved the problem today, information that is de-identified under the HIPAA safe harbor rule today might well be re-identified tomorrow.
The safe harbor rule requires that 18 categories of identifiers be stripped from a record for it to be considered de-identified. Number 18 is a catch-all: any other unique identifying number, characteristic, or code. On top of that, the covered entity releasing the data must have no actual knowledge that what remains could be used to identify the patient. In the world of big data, that’s not a very useful safe harbor.
The other approach is statistical de-identification, using a methodology attested to by an expert. It sounds very scientific, but these methodologies may be “cracked” over time.
More significantly, records de-identified using either method become less useful to researchers. The fewer the data points in an individual patient record, the less it can tell us — and the less knowledge we are likely to gain about disease, injury, and their prevention and treatment.
Let me suggest a third path: patient donation of information, de-identified only so much as each individual patient desires.
Under HIPAA, each patient has the right to have his or her electronic health record sent to the patient or to any third party the patient designates. Third-party repositories may be architected to permit views of records by researchers, by other patients, by clinicians, or by anyone else, and patients may instruct such repositories to share only as much as each individual patient desires. The restrictions may apply to the population of readers (researchers, other patients, etc.) or to the identifiers shared with the clinical data (name, age, gender, address, etc.). The data collected and cleaned in this fashion are far richer than a strictly de-identified data set.
Such a repository may also link the lightly de-identified (or not-at-all de-identified) patient data with that patient’s digital exhaust to present an even richer data set: health records cross-referenced with eating, travel, leisure, and other habits could yield greater insights than the health data alone.
I have discussed patient data donation before, and the first objection I heard came from a data scientist who worried that the volume of patient records collected this way would be too small to yield meaningful insights. That may be true at first, but I believe that over time patients will come to prefer setting their own limits on data sharing over being stuck with the one-size-fits-none approach available under HIPAA. In addition, the data made available through these repositories will be more valuable for research than de-identified data precisely because more identifiers remain attached.
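To make the idea of patient-set limits a bit more concrete, here is a minimal Python sketch of how such a repository might filter a record before showing it to a particular reader. It assumes each patient stores a sharing policy with two parts, mirroring the two kinds of restriction described above: which populations of readers may see the record at all, and which identifiers may travel with the clinical data. The names `SharingPolicy`, `view_record`, the reader roles, and the list of identifier fields are all hypothetical illustrations, not drawn from HIPAA or from any existing repository.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical identifier fields a patient may choose to release with clinical data.
IDENTIFIER_FIELDS = {"name", "age", "gender", "address"}


@dataclass
class SharingPolicy:
    """One patient's (hypothetical) sharing preferences."""
    # Reader populations allowed to see the record, e.g. {"researcher", "patient"}.
    allowed_readers: set = field(default_factory=set)
    # Identifiers the patient agrees to release; a subset of IDENTIFIER_FIELDS.
    released_identifiers: set = field(default_factory=set)


def view_record(record: dict, policy: SharingPolicy, reader_role: str) -> Optional[dict]:
    """Return the view of `record` that `reader_role` may see, or None if barred."""
    # Restriction 1: the reader's population must be one the patient allowed.
    if reader_role not in policy.allowed_readers:
        return None
    # Restriction 2: clinical fields pass through; identifier fields pass
    # only if the patient chose to release them.
    return {
        key: value
        for key, value in record.items()
        if key not in IDENTIFIER_FIELDS or key in policy.released_identifiers
    }
```

For example, a patient who lets researchers see her record along with her age and gender, but not her name or address, would have a policy like `SharingPolicy(allowed_readers={"researcher"}, released_identifiers={"age", "gender"})`; a researcher's view keeps the clinical fields plus age and gender, while any reader role not on the list gets nothing. A real repository would also need authentication, consent records, and audit trails; this sketch shows only the filtering step.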
In a perfect world, freely sharing personally identifiable health information would not be problematic. In the real world, of course, it is. Revealing information is revealing vulnerability, and vulnerable populations experience discrimination — in health care, in employment, in housing and in other accommodations. That is why we tend to favor privacy regulation as the counterbalance to more mobile data. These are, in fact, the two faces of the HITECH Act: promoting the proliferation of EHRs while at the same time promulgating protections and stricter controls on the sharing of the information in those EHRs.
Ultimately, however, the protections prove to be inadequate, and the secondary use of the health data in these records is impeded by the privacy rules.
Recently published surveys on attitudes regarding privacy show that more than 95% of patients who are active social media users would agree to share their records without having them de-identified to HIPAA standards in order to help other patients — even though more than two-thirds of those same patients anticipate that they may suffer negative consequences as a result of the more open sharing of this clinical information. This represents a paradigm shift that is a product of our connected age, and it is a mindset that we should recognize — and use — for the greater good.
Information silos have been blamed for preventable harm in the past. It is clear that silos are still causing harm, and it is equally clear that we have tools available to us that will improve health at both the individual and population levels. Let’s use them.