Big data is an overused term that many people interpret differently. Yet a common definition that has recently gained widespread acceptance is “Big data processes large data sets which cannot be handled by traditional databases or relational databases.” While some of the problems of how to extract value out of data remain the same between traditional and big data methods, the mechanics of how large data is processed, transformed and stored are different. Big data is gaining more widespread adoption and enabling population health scenarios. You may be interested in a blog I wrote on big data and healthcare last year, which captures more details around big data scenarios in population health: The Future of Big Data in Healthcare.
In this post, we interview Neal Singh from Caradigm to learn more about this topic. Neal has spent more than 22 years working in enterprise business applications and previously served as the general manager, global development, for Microsoft Dynamics AX. In this role, he was responsible for leading a globally distributed enterprise resource planning (ERP) engineering organization for Microsoft Dynamics AX. Neal joined Microsoft in 2001 as director of Development and has held many R&D leadership positions, including product unit manager, director of Engineering Services and general manager. Neal has a degree in electrical engineering from India and a master of business administration from Phillips University.
Do you think Big Data could have an impact on population health? If so, how?
Population Health requires a better understanding of your organization’s patient population, utilization, costs, quality, chronic conditions etc., across multiple systems. These systems extend beyond the EMR and outside your enterprise. Given the multiple systems involved, it is important to normalize and aggregate clinical, operational and financial data from systems across the community (including EMRs, billing systems, payers, pharmacy systems, labs, and HIEs) and deliver it in a timely manner, ideally within point-of-care workflows. Most of the data sets in the industry tend to be structured. We are now beginning to see early use of unstructured data like provider notes. In addition, we are seeing more use of consumer, genomic, demographic and social data (e.g. fitness devices, purchasing history, Twitter, Facebook information) integrated into scenarios for population health. Big Data can have a significant impact on managing populations at an aggregate as well as an individual level. Here are some scenarios:
- A true longitudinal patient record – As a provider you can see patient data across all systems that cover the full continuum of care
- Predictive models and machine learning: These concepts are excellent for building statistical models (e.g. Risk Stratification) and are especially effective with large data sets
- Understand patient motivational factors – e.g. with clinical data you know what the patient was prescribed, but with the addition of claims data you can see if the patient actually filled the prescription
- Natural Language Processing (NLP): NLP models can be used for analytics of unstructured data (observations, family history etc. from provider notes) and surface them at the point of care
- Consumer purchasing data: If you were granted access to consumer purchasing data, you could see a diabetes patient visiting McDonald's three times a week and have appropriate conversations on diet management.
- Social data: You could perform disease outbreak monitoring and sentiment mining via personal and news tweets
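To make the prescription-fill scenario from the list above concrete, here is a minimal sketch of the idea. It assumes clinical and claims data have already been normalized to a shared patient identifier and drug name. The record layout, field names, sample values, and the `unfilled_prescriptions` helper are all hypothetical and are not taken from any particular product or data standard.

```python
from collections import defaultdict

# Hypothetical, made-up records for illustration only.
# Clinical data: what each patient was prescribed.
prescriptions = [
    {"patient_id": "p1", "drug": "metformin"},
    {"patient_id": "p1", "drug": "lisinopril"},
    {"patient_id": "p2", "drug": "metformin"},
]

# Claims data: what the pharmacy actually billed for.
claims = [
    {"patient_id": "p1", "drug": "metformin"},
]


def unfilled_prescriptions(prescriptions, claims):
    """Return {patient_id: [drugs prescribed with no matching pharmacy claim]}."""
    # Set of (patient, drug) pairs that appear in the claims data.
    filled = {(c["patient_id"], c["drug"]) for c in claims}
    gaps = defaultdict(list)
    for rx in prescriptions:
        if (rx["patient_id"], rx["drug"]) not in filled:
            gaps[rx["patient_id"]].append(rx["drug"])
    return dict(gaps)


gaps = unfilled_prescriptions(prescriptions, claims)
for patient_id, drugs in gaps.items():
    print(patient_id, "has not filled:", ", ".join(drugs))
```

In practice the hard part is likely to be the normalization this sketch assumes, namely matching patients and medications across systems, rather than the comparison itself.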
Does Big Data actually show positive results in real life, or is it still more academic and research oriented?
In population health, big data is transitioning from a research-oriented mode to a mainstream agenda item in healthcare. At the same time, population health is still in the early adoption phases of Big Data. Only a very few organizations are dealing with petabytes of data, which typically falls into the realm of Big Data.
Are there startups or large companies adopting Big Data in their work? When do you think this started?
Adoption of Big Data is picking up but is still nascent. It started about a couple of years ago.
What main issues do you think Big Data can solve in population health that other technologies might find it hard to solve?
Big Data will solve the problem of scale and lower the cost of processing large data sets. It will allow data to be processed faster, with closer to real-time availability. Also, given the speed of processing, Big Data will enable more scenarios such as correlating data sets, indexing data, and natural language processing.
How do you see the future of Big Data in the Digital Health industry?
It will become more mainstream during the next 5-10 years. We will also start seeing more adoption by both vendors and enterprises. In addition, we will start seeing richer analytics as correlations and computations can run across larger and more diverse data sets. The diversity of data will expand to include genomic data sets, device-generated data, and social data. We will also see greater adoption of publicly available data sets and anonymized data sets for benchmarking.
Big data consumption will also come with challenges. The obvious ones include security, privacy and data governance. In addition, EMRs are already considered burdensome by providers and take time away from delivering care. So Big Data does not equate to delivering more information to providers. Rather, the goal is to perform all the analytics offline and deliver the minimal actionable data needed to meaningfully impact patient outcomes.
What recommendations would you like to give those interested in investing in Big Data for population health, or starting a startup in the field?
- Start by identifying a few key scenario goals for your organization and then determine what data you want to acquire. Otherwise it is too easy to get overwhelmed with data acquisition and get lost in a sea of information.
- Big Data talent is hard to find. So start small with key scenarios that will set you up for success.
- Use existing technologies. There are excellent open source technologies (Hadoop, OpenNLP, R) that have matured over the last few years.
- Use existing infrastructure. For example, both Azure and AWS are good big data and cloud vendors. Building your own infrastructure is a losing proposition.
- Keep security, privacy and data governance in your sights.