Big Data is the technology buzzword of the day, and while it is certainly true that the modern healthcare facility generates huge volumes of data, it is debatable whether that aggregation of data should be treated as “Big Data”. Although mandated electronic health records do generate massive amounts of data, there is significant debate over whether the majority of that data is utilized to its fullest potential. In fact, industry experts suggest that neither the volume nor the velocity of the data in a typical healthcare system today is high enough to make effective use of modern big data tools (via Intermountain Healthcare).
Recent data integration analysis with the nation’s Military Health System has shown that only a small fraction of the tables in their newly procured EHR system (perhaps only a few hundred tables out of many thousands) would be even remotely relevant to their current analytical use cases. Whether that data would be relevant at some point in the future is debatable. It can be assumed that as new use cases emerge, some of the currently unused data would eventually become relevant. One could imagine, for example, that future genomic use cases would require a big data capability. However, the point is that the vast majority of the collected data isn’t relevant to their current clinical analytic use cases. It can be generally inferred that most clinical business intelligence use cases do not require a real “big data” capability or toolset.
The case can be made that the overwhelming majority of clinical analytics and reporting needs in a typical health system can be accomplished without a true big data capability. This isn’t what most early adopters want to hear, but the cold hard truth is that we haven’t even come close to the limits of what clinical intelligence can accomplish with traditional relational databases. The fact is that there are some pretty significant barriers to the widespread adoption of big data capabilities in healthcare today. For now, these barriers make extending traditional databases the more effective and valuable option. Extending legacy RDBMS capabilities to accommodate unstructured data should always remain a plausible alternative to implementing a full-scale big data operation, and it should be evaluated as a viable course of action.
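As a rough illustration of what extending a relational database to hold unstructured data can look like, here is a minimal sketch in Python using the standard library's sqlite3 module. The table names, column names, and sample notes are hypothetical and invented purely for illustration; a production system would more likely rely on the full-text indexing features of its own RDBMS than on a plain LIKE scan.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with structured patient rows and free-text notes.

    All table names, column names, and sample rows are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients (
            patient_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            birth_year INTEGER
        );
        CREATE TABLE clinical_notes (
            note_id    INTEGER PRIMARY KEY,
            patient_id INTEGER NOT NULL REFERENCES patients(patient_id),
            note_text  TEXT NOT NULL  -- unstructured free text lives beside structured data
        );
        """
    )
    conn.executemany(
        "INSERT INTO patients (patient_id, name, birth_year) VALUES (?, ?, ?)",
        [(1, "Patient A", 1950), (2, "Patient B", 1982), (3, "Patient C", 1967)],
    )
    conn.executemany(
        "INSERT INTO clinical_notes (patient_id, note_text) VALUES (?, ?)",
        [
            (1, "Started metformin 500 mg twice daily."),
            (2, "Reports seasonal allergies; no medication changes."),
            (3, "Continues Metformin; A1c improved since last visit."),
        ],
    )
    return conn


def patients_mentioning(conn, term):
    """Return (patient_id, name) for patients whose notes contain the term.

    Uses a parameterized LIKE query, which SQLite matches case-insensitively
    for ASCII text by default.
    """
    query = """
        SELECT DISTINCT p.patient_id, p.name
        FROM patients AS p
        JOIN clinical_notes AS n ON n.patient_id = p.patient_id
        WHERE n.note_text LIKE ?
        ORDER BY p.patient_id
    """
    return conn.execute(query, (f"%{term}%",)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for patient_id, name in patients_mentioning(demo, "metformin"):
        print(patient_id, name)
```

The point of the sketch is only that free-text notes can sit next to structured patient records and be queried with ordinary SQL joins, which is the skill set most BI staff already have.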
Perhaps the most significant barrier to the adoption of big data in healthcare is expertise: the data science human capital required to implement the capability correctly. This is where the real value is extracted from big data style deployments. Nurse informaticists and other hospital staff who typically perform BI functions are more familiar with traditional relational databases and SQL programming. They often do not have the technical expertise required to extract similar value from a big data toolset. Generally speaking, there is a fairly steep learning curve to working with big data.
In most cases, organizations will need Ph.D.-level resources with very narrowly focused expertise. These folks are not usually found roaming the halls of a typical health system. True data scientists are in high demand across a number of industry sectors: finance, security and intelligence, and SEO companies, to name but a few. Highly skilled data scientists with specific domain expertise in healthcare data are rare. They are usually only accessible to research institutions or to very large health systems with deep pockets. This appears to be changing somewhat as industry-leading big data toolsets mature. Many of the newer platforms have started to embrace some form of SQL extension as the preferred method of querying. Over time, this should lower the technical barrier to entry for the healthcare sector.
The second most significant barrier to big data adoption in healthcare is, in all likelihood, security. HIPAA compliance is mandatory, and the penalties for security breaches in the health sector are steep. For this reason, securing patient data remains a top priority. Unfortunately, most health systems are just not that well equipped to manage that priority efficiently or effectively, and there are no real commercial offerings that can be bolted on to a big data system to manage data security in an integrated manner. Security is most often an afterthought, something a third-party application is expected to take care of. Until this is addressed in a comprehensive manner, the security of patient data cannot be assumed. Most big data tools run on (or have been derived from) open source technologies, and data security implementations across them are very inconsistent.
This is in no way meant to deter big data adoption, only to highlight that much of the same clinical intelligence can be harvested using existing (and far less expensive) traditional data management technologies. Big data is coming to healthcare. It will eventually provide real value and ultimately help to improve clinical outcomes. However, while the big data platform vendors work on their barrier-to-entry issues, there is no shame in exploiting traditional structured databases, or even in using big data techniques to link unstructured data in a Hadoop cluster to structured patient data in your traditional Microsoft SQL Server.