The Need For A Smart Approach To Big Health Care Data


by David Newman, Carolina Herrera, Amanda Frost and Stephen Parente

Today, academic medicine and health policy research resemble the automobile industry of the early 20th century — a large number of small shops developing unique products at high cost with no one achieving significant economies of scale or scope. Academics, medical centers, and innovators often work independently or in small groups, with unconnected health datasets that provide incomplete pictures of the health statuses and health care practices of Americans.

Health care data needs a “Henry Ford” moment to move from a realm of unconnected and unwieldy data to a world of connected and matched data with common support for licensing, legal services, and computing infrastructure. Physicians, researchers, and policymakers should be able to access linked databases of medical records, claims, vital statistics, surveys, and other demographic data. To do this, the health care community must bring disparate health data together, maintain the highest standards of security to protect confidential and sensitive data, and deal with the myriad legal issues associated with data acquisition, licensing, record matching, and the Health Insurance Portability and Accountability Act of 1996 (HIPAA).

Just as the Model-T revolutionized car production and, by extension, transit, the creation of smart health data enclaves will revolutionize care delivery, health policy, and health care research. We propose to facilitate these enclaves through a governance structure known as a digital rights manager (DRM). The concept of a DRM is common in the entertainment industry (the American Society of Composers, Authors and Publishers, or ASCAP, is one example) and in the legal industry. If successful, DRMs would be a vital component of a data-enhanced health care industry.

Giving birth to change. The data-enhanced health care industry is coming, but it needs a midwife. There has been explosive growth in the use of electronic medical records, electronic prescribing, and digital imaging by health care providers. Outside the physician’s office, disease registries, medical associations, insurers, government agencies, and laboratories have also been gathering digital pieces of information on the health status, care regimes, and health care costs of Americans. However, few, if any, of these data have been integrated, and most remain siloed within provider groups, health plans, or government offices.

In the past, technical and cost issues have restricted efforts to share and integrate health care datasets. However, advances in technology permit a bold vision of a new infrastructure involving shared access to big data, computing power, and analytic tools. The resources exist to access and analyze extremely large health datasets in the secure, HIPAA-compliant computing environments of data enclaves. A data enclave is a “secure computing environment, firewalled from outside intrusion, accessible only by authorized users, that allows for remote access to microdata where the inflow and outflow are controlled and monitored by either experienced confidentiality officers or by algorithms, whereby users have access to analytic tools and only those data they are licensed to use.”
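The definition above mentions outflow that is monitored by algorithms. One common form of such a check is small-cell suppression, in which counts below a minimum size are withheld before results leave the enclave. The following is a minimal sketch under stated assumptions, not a description of any particular enclave: the function name `suppress_small_cells` and the threshold `MIN_CELL_SIZE` are hypothetical, and a real enclave would set its own disclosure rules.

```python
from collections import Counter

# Hypothetical threshold chosen for illustration; each enclave would
# define its own minimum cell size in its disclosure policy.
MIN_CELL_SIZE = 11


def suppress_small_cells(rows, group_key, min_cell_size=MIN_CELL_SIZE):
    """Count rows per group and withhold counts below the threshold.

    Returns a dict mapping each group to its count, or to None when the
    count is too small to release.
    """
    counts = Counter(row[group_key] for row in rows)
    return {
        group: (n if n >= min_cell_size else None)
        for group, n in counts.items()
    }
```

A user requesting a table of patient counts by state would receive the counts for well-populated states and a withheld value for any state with fewer patients than the threshold.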

The governance issues. What remains to be resolved is how to legally and administratively bring the data together:

  1. how multiple stakeholders will provide data under standard contribution agreements;
  2. how to link extremely large and multi-year files, match records across datasets, and provide statistical de-identification where necessary; and
  3. how to license these data to multiple researchers under standard use agreements.

As stated above, we suggest that these tasks be solved by a digital rights manager.
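The record-matching task in the list above can be illustrated with a small sketch. It assumes exact matching on a keyed hash of normalized identifiers, which is only one of several possible linkage techniques and is not by itself statistical de-identification. The field names (`first`, `last`, `dob`, `paid`, `diagnosis`) and the functions `link_token` and `link_records` are hypothetical, introduced here for illustration.

```python
import hashlib
import hmac


def link_token(secret_key, first, last, dob):
    """Derive a keyed hash from normalized identifiers.

    The same person yields the same token in every dataset, while the
    token cannot be recomputed without the secret key.
    """
    normalized = "|".join(part.strip().lower() for part in (first, last, dob))
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def link_records(claims, registry, secret_key):
    """Join claims to registry entries on the token, dropping direct identifiers."""
    index = {
        link_token(secret_key, r["first"], r["last"], r["dob"]): r
        for r in registry
    }
    linked = []
    for claim in claims:
        token = link_token(secret_key, claim["first"], claim["last"], claim["dob"])
        match = index.get(token)
        if match is not None:
            # Only the token and analytic fields are carried forward.
            linked.append(
                {"token": token, "paid": claim["paid"], "diagnosis": match["diagnosis"]}
            )
    return linked
```

In this sketch the secret key would be held by the party doing the linking, so that researchers see tokens rather than names or birth dates. Exact matching misses records with typographical differences, which is one reason production linkage often relies on probabilistic methods instead.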

How the DRM will work. Data owners, such as provider groups, are reasonably concerned about unfettered access to data. Therefore, the DRM’s most important job will be to provide a low-cost, reliable, and technically and legally protective environment in which data owners are comfortable placing their data. The DRM will negotiate data contribution agreements with each data owner, and the DRM will grant access to data users consistent with these agreements. Thus, by allowing potential data contributors to deal with a single responsible party, a DRM will reduce their burden and give them greater incentive to participate and share data.
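The idea of granting access "consistent with these agreements" can be sketched as a simple lookup. This is an illustrative model under stated assumptions, not a specification of how a DRM would operate: the classes `ContributionAgreement` and `UseRequest`, the function `grant_access`, and the notion of a list of permitted purposes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContributionAgreement:
    """Terms a data owner negotiated with the DRM (hypothetical model)."""
    dataset: str
    permitted_purposes: frozenset


@dataclass(frozen=True)
class UseRequest:
    """A data user's request to work with one dataset for one purpose."""
    user: str
    dataset: str
    purpose: str


def grant_access(request, agreements):
    """Allow a request only if an agreement covers the dataset and purpose."""
    agreement = agreements.get(request.dataset)
    if agreement is None:
        # No contribution agreement on file: deny by default.
        return False
    return request.purpose in agreement.permitted_purposes
```

The deny-by-default branch reflects the data owners' concern about unfettered access: a dataset with no agreement on file is never released.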

The DRM will also be responsible for fulfilling all the legal requirements (under HIPAA, state law, or otherwise) that apply to uses of the data. The DRM will also negotiate software-licensing agreements and arrange for commonly required intermediate value-added services such as encrypted provider or individual identifiers or statistical de-identification. To do so, the DRM will require specialized expertise in HIPAA and statistical de-identification, as well as an enhanced institutional review board with an understanding of big data risks and opportunities.

Under this governance structure, health data owners who want to generate useful insights from their health data can do so securely. Their data, when shared, will be secure, their confidential information will remain protected, and they will not be burdened with the administrative expenses of distributing, licensing, or overseeing their data. In essence, all of these tasks can be efficiently contracted out to a common technology platform entity so as to reduce the burden on data owners, thereby making it more likely that they will share their data. Thus, together, the DRM and the data enclave can transform health data into smart data (Figure 1).


Great benefits and manageable risks. The potential benefits of smart health data are great, but data must be actionable. To that end, the previously outlined governance structure removes barriers and creates new opportunities. For patients, the enclave will be an opportunity to receive better care from evidence-based practice and personalized medicine. For physicians, more complete and accurate patient information will enable the delivery of better care. For health policy researchers and policymakers, linked data will allow for a better understanding of trends and the impacts of policy initiatives. As a result, the enclave offers an efficient setting in which to engage in comparative effectiveness and cost-effectiveness research.

Some may question the wisdom of hosting so much data. We believe that smart data enclaves will mitigate the risks to patients and providers. As a country, we are missing an opportunity to maximize the gains from the already expended effort to create EHRs and from nearly two decades of HIPAA-compliant health data use. Entire generations of medical professionals and researchers are unfamiliar with administrative claims and registry data because of the absence of cost-reducing shared infrastructure. The question should not be whether we should have a smart health data world, but how soon we can make it happen.

