Big data analytics in healthcare: promise and potential

Objective: To describe the promise and potential of big data analytics in healthcare.
Methods: The paper describes the nascent field of big data analytics in healthcare, discusses the benefits, outlines an architectural framework and methodology, describes examples reported in the literature, briefly discusses the challenges, and offers conclusions.
Results: The paper provides a broad overview of big data analytics for healthcare researchers and practitioners.
Conclusions: Big data analytics in healthcare is evolving into a promising field for providing insight from very large data sets and improving outcomes while reducing costs. Its potential is great; however, there remain challenges to overcome.
Keywords: Big data, Analytics, Hadoop, Healthcare, Framework, Methodology

* Correspondence: raghupathi@fordham.edu
Graduate School of Business, Fordham University, 113 W. 60th Street, 10023 New York, NY, USA
Full list of author information is available at the end of the article
© 2014 Raghupathi and Raghupathi; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
Raghupathi and Raghupathi Health Information Science and Systems 2014, 2:3
http://www.hissjournal.com/content/2/1/3

Introduction

The healthcare industry historically has generated large amounts of data, driven by record keeping, compliance & regulatory requirements, and patient care [1]. While most data is stored in hard copy form, the current trend is toward rapid digitization of these large amounts of data. Driven by mandatory requirements and the potential to improve the quality of healthcare delivery while reducing costs, these massive quantities of data (known as ‘big data’) hold the promise of supporting a wide range of medical and healthcare functions, including among others clinical decision support, disease surveillance, and population health management [2-5]. Reports say data from the U.S. healthcare system alone reached, in 2011, 150 exabytes. At this rate of growth, big data for U.S. healthcare will soon reach the zettabyte (10^21 bytes) scale and, not long after, the yottabyte (10^24 bytes) [6]. Kaiser Permanente, the California-based health network, which has more than 9 million members, is believed to have between 26.5 and 44 petabytes of potentially rich data from EHRs, including images and annotations [6].

By definition, big data in healthcare refers to electronic health data sets so large and complex that they are difficult (or impossible) to manage with traditional software and/or hardware; nor can they be easily managed with traditional or common data management tools and methods [7]. Big data in healthcare is overwhelming not only because of its volume but also because of the diversity of data types and the speed at which it must be managed [7]. The totality of data related to patient healthcare and well-being makes up “big data” in the healthcare industry. It includes clinical data from CPOE and clinical decision support systems (physician’s written notes and prescriptions, medical imaging, laboratory, pharmacy, insurance, and other administrative data); patient data in electronic patient records (EPRs); machine generated/sensor data, such as from monitoring vital signs; social media posts, including Twitter feeds (so-called tweets) [8], blogs [9], status updates on Facebook and other platforms, and web pages; and less patient-specific information, including emergency care data, news feeds, and articles in medical journals.

For the big data scientist, there is, amongst this vast amount and array of data, opportunity. By discovering associations and understanding patterns and trends within the data, big data analytics has the potential to improve care, save lives and lower costs. Thus, big data analytics applications in healthcare take advantage of the explosion in data to extract insights for making better informed decisions [10-12], and as a research category are referred to as, no surprise here, big data analytics in healthcare [13-15]. When big data is synthesized and analyzed, and those aforementioned associations, patterns and trends revealed, healthcare providers and other stakeholders in the healthcare delivery system can develop more thorough and insightful diagnoses and treatments, resulting, one would expect, in higher quality care at lower costs and in better outcomes overall [12].

The potential for big data analytics in healthcare to lead to better outcomes exists across many scenarios, for example: analyzing patient characteristics and the cost and outcomes of care to identify the most clinically and cost effective treatments and offer analysis and tools, thereby influencing provider behavior; applying advanced analytics to patient profiles (e.g., segmentation and predictive modeling) to proactively identify individuals who would benefit from preventative care or lifestyle changes; broad scale disease profiling to identify predictive events and support prevention initiatives; collecting and publishing data on medical procedures, thus assisting patients in determining the care protocols or regimens that offer the best value; identifying, predicting and minimizing fraud by implementing advanced analytic systems for fraud detection and checking the accuracy and consistency of claims; implementing claim authorization much nearer to real time; and creating new revenue streams by aggregating and synthesizing patient clinical records and claims data sets to provide data and services to third parties, for example, licensing data to assist pharmaceutical companies in identifying patients for inclusion in clinical trials. Many payers are developing and deploying mobile apps that help patients manage their care, locate providers and improve their health. Via analytics, payers are able to monitor adherence to drug and treatment regimens and detect trends that lead to individual and population wellness benefits [12,16-18].

This article provides an overview of big data analytics in healthcare as it is emerging as a discipline. First, we define and discuss the various advantages and characteristics of big data analytics in healthcare. Then we describe the architectural framework of big data analytics in healthcare. Third, the big data analytics application development methodology is described. Fourth, we provide examples of big data analytics in healthcare reported in the literature. Fifth, the challenges are identified. Lastly, we offer conclusions and future directions.

Big data analytics in healthcare

Health data volume is expected to grow dramatically in the years ahead [6]. In addition, healthcare reimbursement models are changing; meaningful use and pay for performance are emerging as critical new factors in today’s healthcare environment. Although profit is not and should not be a primary motivator, it is vitally important for healthcare organizations to acquire the available tools, infrastructure, and techniques to leverage big data effectively or else risk losing potentially millions of dollars in revenue and profits [19].

What exactly is big data? A report delivered to the U.S. Congress in August 2012 defines big data as “large volumes of high velocity, complex, and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management and analysis of the information” [6]. Big data encompasses such characteristics as variety, velocity and, with respect specifically to healthcare, veracity [20-23]. Existing analytical techniques can be applied to the vast amount of existing (but currently unanalyzed) patient-related health and medical data to reach a deeper understanding of outcomes, which then can be applied at the point of care. Ideally, individual and population data would inform each physician and her patient during the decision-making process and help determine the most appropriate treatment option for that particular patient.

Advantages to healthcare

By digitizing, combining and effectively using big data, healthcare organizations ranging from single-physician offices and multi-provider groups to large hospital networks and accountable care organizations stand to realize significant benefits [2]. Potential benefits include detecting diseases at earlier stages when they can be treated more easily and effectively; managing specific individual and population health; and detecting health care fraud more quickly and efficiently. Numerous questions can be addressed with big data analytics. Certain developments or outcomes may be predicted and/or estimated based on vast amounts of historical data, such as length of stay (LOS); patients who will choose elective surgery; patients who likely will not benefit from surgery; complications; patients at risk for medical complications; patients at risk for sepsis, MRSA, C. difficile, or other hospital-acquired illness; illness/disease progression; patients at risk for advancement in disease states; causal factors of illness/disease progression; and possible comorbid conditions (EMC Consulting). McKinsey estimates that big data analytics can enable more than $300 billion in savings per year in U.S. healthcare, two thirds of that through reductions of approximately 8% in national healthcare expenditures. Clinical operations and R & D are two of the largest areas for potential savings, with $165 billion and $108 billion in waste respectively [24]. McKinsey believes big data could help reduce waste and inefficiency in the following three areas:

Clinical operations: Comparative effectiveness research to determine more clinically relevant and cost-effective ways to diagnose and treat patients.

Research & development: 1) predictive modeling to lower attrition and produce a leaner, faster, more targeted R & D pipeline in drugs and devices; 2) statistical tools and algorithms to improve clinical trial design and patient recruitment to better match treatments to individual patients, thus reducing trial failures and speeding new treatments to market; and 3) analyzing clinical trials and patient records to identify follow-on indications and discover adverse effects before products reach the market.

Public health: 1) analyzing disease patterns and tracking disease outbreaks and transmission to improve public health surveillance and speed response; 2) faster development of more accurately targeted vaccines, e.g., choosing the annual influenza strains; and 3) turning large amounts of data into actionable information that can be used to identify needs, provide services, and predict and prevent crises, especially for the benefit of populations [24].

In addition, [14] suggests big data analytics in healthcare can contribute to:

Evidence-based medicine: Combine and analyze a variety of structured and unstructured data (EMRs, financial and operational data, clinical data, and genomic data) to match treatments with outcomes, predict patients at risk for disease or readmission and provide more efficient care;

Genomic analytics: Execute gene sequencing more efficiently and cost effectively and make genomic analysis a part of the regular medical care decision process and the growing patient medical record [25];

Pre-adjudication fraud analysis: Rapidly analyze large numbers of claim requests to reduce fraud, waste and abuse;

Device/remote monitoring: Capture and analyze in real-time large volumes of fast-moving data from in-hospital and in-home devices, for safety monitoring and adverse event prediction;

Patient profile analytics: Apply advanced analytics to patient profiles (e.g., segmentation and predictive modeling) to identify individuals who would benefit from proactive care or lifestyle changes, for example, those patients at risk of developing a specific disease (e.g., diabetes) who would benefit from preventive care [14].

According to [16], areas in which enhanced data and analytics yield the greatest results include: pinpointing patients who are the greatest consumers of health resources or at the greatest risk for adverse outcomes; providing individuals with the information they need to make informed decisions and more effectively manage their own health as well as more easily adopt and track healthier behaviors; identifying treatments, programs and processes that do not deliver demonstrable benefits or cost too much; reducing readmissions by identifying environmental or lifestyle factors that increase risk or trigger adverse events [26] and adjusting treatment plans accordingly; improving outcomes by examining vitals from at-home health monitors; managing population health by detecting vulnerabilities within patient populations during disease outbreaks or disasters; and bringing clinical, financial and operational data together to analyze resource utilization productively and in real time [16].

The 4 “Vs” of big data analytics in healthcare

Like big data in healthcare, the analytics associated with big data is described by three primary characteristics: volume, velocity and variety (http://www-01.ibm.com/software/data/bigdata/). Over time, health-related data will be created and accumulated continuously, resulting in an incredible volume of data. The already daunting volume of existing healthcare data includes personal medical records, radiology images, clinical trial data, FDA submissions, human genetics and population data, genomic sequences, etc. Newer forms of big data, such as 3D imaging, genomics and biometric sensor readings, are also fueling this exponential growth.

Fortunately, advances in data management, particularly virtualization and cloud computing, are facilitating the development of platforms for more effective capture, storage and manipulation of large volumes of data [4]. Data is accumulated in real-time and at a rapid pace, or velocity. The constant flow of new data accumulating at unprecedented rates presents new challenges. Just as the volume and variety of data that is collected and stored has changed, so too has the velocity at which it is generated and that is necessary for retrieving, analyzing, comparing and making decisions based on the output.

Most healthcare data has been traditionally static: paper files, x-ray films, and scripts. Velocity of mounting data increases with data that represents regular monitoring, such as multiple daily diabetic glucose measurements (or more continuous control by insulin pumps), blood pressure readings, and EKGs. Meanwhile, in many medical situations, constant real-time data (trauma monitoring for blood pressure, operating room monitors for anesthesia, bedside heart monitors, etc.) can mean the difference between life and death.

Future applications of real-time data, such as detecting infections as early as possible, identifying them swiftly and applying the right treatments (not just broad-spectrum antibiotics), could reduce patient morbidity and mortality and even prevent hospital outbreaks. Already, real-time streaming data monitors neonates in the ICU, catching life-threatening infections sooner [6]. The ability to perform real-time analytics against such high-volume data in motion and across all specialties would revolutionize healthcare [4]. Therein lies variety.

As the nature of health data has evolved, so too have analytics techniques scaled up to the complex and sophisticated analytics necessary to accommodate volume, velocity and variety. Gone are the days of data collected exclusively in electronic health records and other structured formats. Increasingly, the data is in multimedia format and unstructured. The enormous variety of data (structured, unstructured and semi-structured) is a dimension that makes healthcare data both interesting and challenging.

Structured data is data that can be easily stored, queried, recalled, analyzed and manipulated by machine. Historically, in healthcare, structured and semi-structured data includes instrument readings and data generated by the ongoing conversion of paper records to electronic health and medical records. Historically, the point of care generated unstructured data: office medical records, handwritten nurse and doctor notes, hospital admission and discharge records, paper prescriptions, radiograph films, MRI, CT and other images.

Already, new data streams (structured and unstructured) are cascading into the healthcare realm from fitness devices, genetics and genomics, social media research and other sources. But relatively little of this data can presently be captured, stored and organized so that it can be manipulated by computers and analyzed for useful information. Healthcare applications in particular need more efficient ways to combine and convert varieties of data, including automating conversion from unstructured to structured data.

The structured data in EMRs and EHRs include familiar input record fields such as patient name, date of birth, address, physician’s name, hospital name and address, treatment reimbursement codes, and other information easily coded into and handled by automated databases. The need to field-code data at the point of care for electronic handling is a major barrier to acceptance of EMRs by physicians and nurses, who lose the natural language ease of entry and understanding that handwritten notes provide. On the other hand, most providers agree that an easy way to reduce prescription errors is to use digital entries rather than handwritten scripts.

The potential of big data in healthcare lies in combining traditional data with new forms of data, both individually and on a population level. We are already seeing data sets from a multitude of sources support faster and more reliable research and discovery. If, for example, pharmaceutical developers could integrate population clinical data sets with genomics data, this development could facilitate those developers gaining approvals on more and better drug therapies more quickly than in the past and, more importantly, expedite distribution to the right patients [4]. The prospects for all areas of healthcare are infinite.

Some practitioners and researchers have introduced a fourth characteristic, veracity, or ‘data assurance’. That is, the big data, analytics and outcomes are error-free and credible. Of course, veracity is the goal, not (yet) the reality. Data quality issues are of acute concern in healthcare for two reasons: life or death decisions depend on having accurate information, and the quality of healthcare data, especially unstructured data, is highly variable and all too often incorrect. (Inaccurate “translations” of poor handwriting on prescriptions are perhaps the most infamous example.)

Veracity assumes the simultaneous scaling up in granularity and performance of the architectures and platforms, algorithms, methodologies and tools to match the demands of big data. The analytics architectures and tools for structured and unstructured big data are very different from traditional business intelligence (BI) tools. They are necessarily of industrial strength. For example, big data analytics in healthcare would be executed in distributed processing across several servers (“nodes”), utilizing the paradigm of parallel computing and a ‘divide and process’ approach. Likewise, models and techniques (such as data mining and statistical approaches, algorithms, and visualization techniques) need to take into account the characteristics of big data analytics. Traditional data management assumes that the warehoused data is certain, clean, and precise.

Veracity in healthcare data faces many of the same issues as in financial data, especially on the payer side: Is this the correct patient/hospital/payer/reimbursement code/dollar amount? Other veracity issues are unique to healthcare: Are diagnoses/treatments/prescriptions/procedures/outcomes captured correctly?

Improving coordination of care, avoiding errors and reducing costs depend on high-quality data, as do advances in drug safety and efficacy, diagnostic accuracy and more precise targeting of disease processes by treatments. But increased variety and high velocity hinder the ability to cleanse data before analyzing it and making decisions, magnifying the issue of data “trust” [4].

The ‘4Vs’ are an appropriate starting point for a discussion about big data analytics in healthcare. But there are other issues to consider, such as the number of architectures and platforms, and the dominance of the open source paradigm in the availability of tools. Consider, too, the challenge of developing methodologies and the need for user-friendly interfaces. While the overall cost of hardware and software is declining, these issues have to be addressed to harness and maximize the potential of big data analytics in healthcare.

Architectural framework

The conceptual framework for a big data analytics project in healthcare is similar to that of a traditional health informatics or analytics project. The key difference lies in how processing is executed. In a regular health analytics project, the analysis can be performed with a business intelligence tool installed on a stand-alone system, such as a desktop or laptop. Because big data is by definition large, processing is broken down and executed across multiple nodes. The concept of distributed processing has existed for decades. What is relatively new is its use in analyzing very large data sets as healthcare providers start to tap into their large data repositories to gain insight for making better-informed health-related decisions. Furthermore, open source platforms such as Hadoop/MapReduce, available on the cloud, have encouraged the application of big data analytics in healthcare.

While the algorithms and models are similar, the user interfaces of traditional analytics tools and those used for big data are entirely different; traditional health analytics tools have become very user friendly and transparent. Big data analytics tools, on the other hand, are extremely complex, programming intensive, and require the application of a variety of skills. They have emerged in an ad hoc fashion mostly as open-source development tools and platforms, and therefore they lack the support and user-friendliness that vendor-driven proprietary tools possess. As Figure 1 indicates, the complexity begins with the data itself.

Figure 1 An applied conceptual architecture of big data analytics.

Big data in healthcare can come from internal (e.g., electronic health records, clinical decision support systems, CPOE, etc.) and external sources (government sources, laboratories, pharmacies, insurance companies & HMOs, etc.), often in multiple formats (flat files, .csv, relational tables, ASCII/text, etc.) and residing at multiple locations (geographic as well as in different healthcare providers’ sites) in numerous legacy and other applications (transaction processing applications, databases, etc.). Sources and data types include:

1. Web and social media data: Clickstream and interaction data from Facebook, Twitter, LinkedIn, blogs, and the like. It can also include health plan websites, smartphone apps, etc. [6].
2. Machine to machine data: readings from remote sensors, meters, and other vital sign devices [6].
3. Big transaction data: health care claims and other billing records increasingly available in semi-structured and unstructured formats [6].
4. Biometric data: finger prints, genetics, handwriting, retinal scans, x-ray and other medical images, blood pressure, pulse and pulse-oximetry readings, and other similar types of data [6].
5. Human-generated data: unstructured and semi-structured data such as EMRs, physicians’ notes, email, and paper documents [6].

For the purpose of big data analytics, this data has to be pooled. In the second component the data is in a ‘raw’ state and needs to be processed or transformed, at which point several options are available. A service-oriented architectural approach combined with web services (middleware) is one possibility [27]. The data stays raw and services are used to call, retrieve and process the data. Another approach is data warehousing, wherein data from various sources is aggregated and made ready for processing, although the data is not available in real-time. Via the steps of extract, transform, and load (ETL), data from diverse sources is cleansed and readied. Depending on whether the data is structured or unstructured, several data formats can be input to the big data analytics platform.

In this next component in the conceptual framework, several decisions are made regarding the data input approach, distributed design, tool selection and analytics models. Finally, on the far right, the four typical applications of big data analytics in healthcare are shown. These include queries, reports, OLAP, and data mining. Visualization is an overarching theme across the four applications. Drawing from such fields as statistics, computer science, applied mathematics and economics, a wide variety of techniques and technologies has been developed and adapted to aggregate, manipulate, analyze, and visualize big data in healthcare.

The most significant platform for big data analytics is the open-source distributed data processing platform Hadoop (Apache platform), initially developed for such routine functions as aggregating web search indexes. It belongs to the class of “NoSQL” technologies (others include CouchDB and MongoDB) that evolved to aggregate data in unique ways. Hadoop has the potential to process extremely large amounts of data mainly by allocating partitioned data sets to numerous servers (nodes), each of which solves different parts of the larger problem and then integrates them for the final result [28-31]. Hadoop can serve the twin roles of data organizer and analytics tool. It offers a great deal of potential in enabling enterprises to harness the data that has been, until now, difficult to manage and analyze. Specifically, Hadoop makes it possible to process extremely large volumes of data with various structures or no structure at all. But Hadoop can be challenging to install, configure and administer, and individuals with Hadoop skills are not easily found. For these reasons, it appears organizations are not quite ready to embrace Hadoop completely. The surrounding ecosystem of additional platforms and tools supports the Hadoop distributed platform [30,31]. These are summarized in Table 1.

Table 1 Platforms & tools for big data analytics in healthcare
Platform/Tool: Description
The Hadoop Distributed File System (HDFS): HDFS enables the underlying storage for the Hadoop cluster. It divides the data into smaller parts and distributes it across the various servers/nodes.
MapReduce: MapReduce provides the interface for the distribution of sub-tasks and the gathering of outputs. When tasks are executed, MapReduce tracks the processing of each server/node.
PIG and PIG Latin (Pig and PigLatin): Pig programming language is configured to assimilate all types of data (structured/unstructured, etc.). It is comprised of two key modules: the language itself, called PigLatin, and the runtime version in which the PigLatin code is executed.
Hive: Hive is a runtime Hadoop support architecture that leverages Structured Query Language (SQL) with the Hadoop platform. It permits SQL programmers to develop Hive Query Language (HQL) statements akin to typical SQL statements.
Jaql: Jaql is a functional, declarative query language designed to process large data sets. To facilitate parallel processing, Jaql converts “‘high-level’ queries into ‘low-level’ queries” consisting of MapReduce tasks.
Zookeeper: Zookeeper allows a centralized infrastructure with various services, providing synchronization across a cluster of servers. Big data analytics applications utilize these services to coordinate parallel processing across big clusters.
HBase: HBase is a column-oriented database management system that sits on top of HDFS. It uses a non-SQL approach.
Cassandra: Cassandra is also a distributed database system. It is designated as a top-level project modeled to handle big data distributed across many utility servers. It also provides reliable service with no particular point of failure (http://en.wikipedia.org/wiki/Apache_Cassandra) and it is a NoSQL system.
Oozie: Oozie, an open source project, streamlines the workflow and coordination among the tasks.
Lucene: The Lucene project is used widely for text analytics/searches and has been incorporated into several open source projects. Its scope includes full text indexing and library search for use within a Java application.
Avro: Avro facilitates data serialization services. Versioning and version control are additional useful features.
Mahout: Mahout is yet another Apache project whose goal is to generate free applications of distributed and scalable machine learning algorithms that support big data analytics on the Hadoop platform.

Numerous vendors (including AWS, Cloudera, Hortonworks, and MapR Technologies) distribute open-source Hadoop platforms [29]. Many proprietary options are also available, such as IBM’s BigInsights. Further, many of these platforms are cloud versions, making them widely available. Cassandra, HBase, and MongoDB, described above, are used widely for the database component. While the available frameworks and tools are mostly open source and wrapped around Hadoop and related platforms, there are numerous trade-offs that developers and users of big data analytics in healthcare must consider. While the development costs may be lower since these tools are open source and free of charge, the downsides are the lack of technical support and minimal security. In the healthcare industry, these are, of course, significant drawbacks, and therefore the trade-offs must be addressed. Additionally, these platforms/tools require a great deal of programming, skills the typical end-user in healthcare may not possess. Furthermore, considering the only recent emergence of big data analytics in healthcare, governance issues including ownership, privacy, security, and standards have yet to be addressed. In the next section we offer an applied big data analytics in healthcare methodology to develop and implement a big data project for healthcare providers.

Methodology

While several different methodologies are being developed in this rapidly emerging discipline, here we outline one that is practical and hands-on. Table 2 shows the main stages of the methodology. In Step 1, the interdisciplinary big data analytics in healthcare team develops a ‘concept statement’. This is a first cut at establishing the need for such a project. The concept statement is followed by a description of the project’s significance. The healthcare organization will note that there are trade-offs in terms of alternative options, cost, scalability, etc. Once the concept statement is approved, the team can proceed to Step 2, the proposal development stage. Here, more details are filled in. Based on the concept statement, several questions are addressed: What problem is being addressed? Why is it important and interesting to the healthcare provider? What is the case for a ‘big data’ analytics approach? (Because the complexity and cost of big data analytics are significantly higher compared to traditional analytics approaches, it is important to justify their use.) The project team also should provide background information on the problem domain as well as prior projects and research done in this domain.

Table 2 Outline of big data analytics in healthcare methodology
Step 1 Concept statement
• Establish need for big data analytics project in healthcare based on the “4Vs”.
Step 2 Proposal
• What is the problem being addressed?
• Why is it important and interesting?
• Why big data analytics approach?
• Background material
Step 3 Methodology
• Propositions
• Variable selection
• Data collection
• ETL and data transformation
• Platform/tool selection
• Conceptual model
• Analytic techniques
  - Association, clustering, classification, etc.
• Results & insight
Step 4 Deployment
• Evaluation & validation
• Testing
Source: Adapted from [Raghupathi & Raghupathi, [9]].

Next, in Step 3, the steps in the methodology are fleshed out and implemented. The concept statement is broken down into a series of propositions. (Note these are not rigorous as they would be in the case of statistical approaches. Rather, they are developed to help guide the big data analytics process.) Simultaneously, the independent and dependent variables or indicators are identified. The data sources, as outlined in Figure 1, are also identified; the data is collected, described, and transformed in preparation for analytics. A very important step at this point is platform/tool evaluation and selection. There are several options available, as indicated previously, including AWS Hadoop, Cloudera, and IBM BigInsights. The next step is to apply the various big data analytics techniques to the data. This process differs from routine analytics only in that the techniques are scaled up to large data sets. Through a series of iterations and what-if analyses, insight is gained from the big data analytics. From the insight, informed decisions can be made. In Step 4, the models and their findings are tested and validated and presented to stakeholders for action. Implementation is a staged approach with feedback loops built in at each stage to minimize risk of failure.

The next section describes several reported big data analytics applications in healthcare. We draw on publicly available material from numerous sources, including vendor sites. In this emerging discipline, there is little independent research to cite. These examples are from secondary sources. Nevertheless, they are illustrative of the potential of big data analytics in healthcare.

Examples

Premier, the U.S. healthcare alliance network, has more than 2,700 members, hospitals and health systems, 90,000 non-acute facilities and 400,000 physicians and is reported to have data on approximately one in four patients discharged from hospitals. Naturally, the network has assembled a large database of clinical, financial, patient, and supply chain data, with which the network has generated comprehensive and comparable clinical outcome measures, resource utilization reports and transaction level cost data. These outputs have informed decision-making and improved the healthcare processes at approximately 330 hospitals, saving an estimated 29,000 lives and reducing healthcare spending by nearly $7 billion [16]. North York General Hospital, a 450-bed community teaching hospital in Toronto, Canada, reports using real-time analytics to improve patient outcomes and gain greater insight into the operations of healthcare delivery. North York is reported to have implemented a scalable real-time analytics application to provide multiple perspectives, including clinical, administrative, and financial [16]. Another example, reported by IBM, is that of the large, unnamed healthcare provider that is analyzing data in the electronic medical record (EMR) system with the goal of reducing costs and improving patient care. (Data in the EMR include the unstructured data from physician notes, pathology reports and other sources.)

(NICE) of the U.K.’s National Health Service. NICE is reportedly a leader in the analytics of large clinical datasets for exploring the effectiveness of clinical and cost factors in the use of new drugs and/or clinical treatments. The Italian Medicines Agency is also reported to collect and analyze clinical data on the use of expensive new drugs as one goal in a country-level cost-effectiveness program [6]. Another leading example of big data analytics in healthcare is the Department of Veterans Affairs’ (VA) use of applications on its very large data set in an effort to comply with “performance-based accountability framework and disease management practice” [6]. In one very famous ex-
Big data ana- ample, California-based Kaiser Permanente associated clinical data with cost data to generate a key data set, the lytics is used to develop care protocols and case pathways and to assist caregivers in performing customized queries analytics of which led to the discovery of adverse drug ef- [16]. Another example of big data analytics in healthcare fects and subsequent withdrawal of Vioxx from the mar- ket [6]. Researchers at the Johns Hopkins School of is Columbia University Medical Center’sanalysisof “com- plex correlations” of streams of physiological data related Medicine discovered they could use data from Google Flu to patients with brain injuries. The goal is to provide med- Trends to predict sudden increases in flu-related emer- gency room visits at least a week before warnings from ical professionals with critical and timely information to aggressively treat complications. The advanced analytics is the CDC. Likewise, the analysis of Twitter updates was as reported to diagnose serious complications as much as 48 accurate as (and two weeks ahead of) official reports at tracking the spread of cholera in Haiti after the January hours sooner than previously in patients who have suf- fered a bleeding stroke from a ruptured brain aneurysm 2010 earthquake [6]. Also reported is an application devel- [16]. The Rizzoli Orthopedic Institute in Bologna, Italy, is oped by IBM that predicts the likely outcomes of diabetes patients using patients’ panel data linked to physicians, reportedly using advanced analytics to gain a more “granular understanding” of the clinical variations within management protocols, and the overall relationship to families whereby individual patients display extreme dif- population health management averages [6]. In another dia- betes application, physicians at Harvard Medical School ferences in the severity of their symptoms. 
This insight is reported to have reduced annual hospitalizations by 30% and Harvard Pilgrim Health Care recently demonstrated and the number of imaging tests by 60%. In the long- the potential of analytics applications to EHR data to iden- term, the Institute expects to gain insight into the role of tify and group patients with diabetes for public health sur- genetic factors to develop treatments [16]. The Hospital veillance. Four years worth of data based on numerous for Sick Children (Sick Kids) in Toronto is using analytics indicators from multiple sources was utilized. The analyt- to improve the outcomes for infants prone to life- ics application also differentiated between Type 1 and threatening “nosocomial infections”.It isreportedthat Type II diabetes [6,26]. Finally, at Blue Cross Blue Shield Sick Kids applies advanced analytics to vital-sign data of Massachusetts (BCBSMA) there was a “need to embed gathered from bedside monitoring devices to identify po- analytics into business processes to help decision-makers tential signs infection as early as 24 hours prior to previous across the business gain insight into financial and medical methods [6,16]. Additional examples are reported below. data and become more proactive”. Several benefits were A recent New Yorker magazine article by Atul Gawande, reported. First, the analytics enabled medical directors to MD described how orthopedic surgeons at Brigham and identify high-risk disease groups and act to minimize risk Women’s Hospital in Boston relied on personal experi- and improve patient outcomes. For example, new pre- ence along with insight extracted from research on data ventive treatment protocols could be introduced among based on a host of factors critical to the success of joint- patient groups with high cholesterol, thereby fending off replacement surgery to systematically standardize knee heart problems. Also, complex health informatics re- joint-replacement surgery. 
The result: improved outcomes ports were generated 300% faster than previously, help- at lower costs. The University of Michigan Health System ing BCBSMA service clients more effectively [6]. standardized the administration of blood transfusions The next section briefly identifies some of the key using analytics in a similar fashion, combining experience challenges in big data analytics in healthcare. with big data analytics research. This resulted in a 31% re- duction in transfusions and $200,000 reduction in ex- Challenges penses per month (reported in [6]). Another example is At minimum, a big data analytics platform in healthcare The National Institute for Health and Clinical Excellence must support the key functions necessary for processing Raghupathi and Raghupathi Health Information Science and Systems 2014, 2:3 Page 9 of 10 http://www.hissjournal.com/content/2/1/3 the data. The criteria for platform evaluation may include 2. Burghard C: Big Data and Analytics Key to Accountable Care Success. IDC Health Insights; 2012. availability, continuity, ease of use, scalability, ability to 3. Dembosky A: “Data Prescription for Better Healthcare.” Financial Times, manipulate at different levels of granularity, privacy and December 12, 2012, p. 19; 2012. Available from: http://www.ft.com/intl/cms/ security enablement, and quality assurance [6,29,32]. In s/2/55cbca5a-4333-11e2-aa8f-00144feabdc0.html#axzz2W9cuwajK. 4. Feldman B, Martin EM, Skotnes T: “Big Data in Healthcare Hype and Hope.” addition, while most platforms currently available are October 2012. Dr. Bonnie 360; 2012. http://www.west-info.eu/files/big-data-in- open source, the typical advantages and limitations of healthcare.pdf. open source platforms apply. To succeed, big data analyt- 5. Fernandes L, O’Connor M, Weaver V: Big data, bigger outcomes. J AHIMA 2012:38–42. ics in healthcare needs to be packaged so it is menu- 6. 
IHTT: Transforming Health Care through Big Data Strategies for leveraging driven, user-friendly and transparent. Real-time big data big data in the health care industry; 2013. http://ihealthtran.com/ analytics is a key requirement in healthcare. The lag be- wordpress/2013/03/iht%C2%B2-releases-big-data-research-report- download-today/. tween data collection and processing has to be addressed. 7. Frost & Sullivan: Drowning in Big Data? Reducing Information Technology The dynamic availability of numerous analytics algo- Complexities and Costs for Healthcare Organizations. http://www.emc.com/ rithms, models and methods in a pull-down type of menu collateral/analyst-reports/frost-sullivan-reducing-information-technology- complexities-ar.pdf. is also necessary for large-scale adoption. The important 8. Bian J, Topaloglu U, Yu F, Yu F: Towards Large-scale Twitter Mining for Drug- managerial issues of ownership, governance and standards related Adverse Events. Maui, Hawaii: SHB; 2012. have to be considered. And woven through these issues 9. Raghupathi W, Raghupathi V: An Overview of Health Analytics. Working paper; 2013. are those of continuous data acquisition and data cleans- 10. Ikanow: Data Analytics for Healthcare: Creating Understanding from Big Data. ing. Health care data is rarely standardized, often fragmen- http://info.ikanow.com/Portals/163225/docs/data-analytics-for-healthcare.pdf. ted, or generated in legacy IT systems with incompatible 11. jStart: “How Big Data Analytics Reduced Medicaid Re-admissions.” A jStart Case Study; 2012. http://www-01.ibm.com/software/ebusiness/jstart/portfolio/ formats [6]. This great challenge needs to be addressed uncMedicaidCaseStudy.pdf. as well. 12. Knowledgent: Big Data and Healthcare Payers; 2013. http://knowledgent. com/mediapage/insights/whitepaper/482. 13. Explorys: Unlocking the Power of Big Data to Improve Healthcare for Everyone. Conclusions https://www.explorys.com/docs/data-sheets/explorys-overview.pdf. 
Big data analytics has the potential to transform the way 14. IBM: IBM big data platform for healthcare.” Solutions Brief; 2012. http://public. dhe.ibm.com/common/ssi/ecm/en/ims14398usen/IMS14398USEN.PDF. healthcare providers use sophisticated technologies to 15. Intel: Leveraging Big Data and Analytics in Healthcare and Life Sciences: gain insight from their clinical and other data repositor- Enabling Personalized Medicine for High-Quality Care, Better Outcomes; 2012. ies and make informed decisions. In the future we’ll see http://www.intel.com/content/dam/www/public/us/en/documents/white- papers/healthcare-leveraging-big-data-paper.pdf. the rapid, widespread implementation and use of big 16. IBM: Data Driven Healthcare Organizations Use Big Data Analytics for Big data analytics across the healthcare organization and the Gains; 2013. http://www03.ibm.com/industries/ca/en/healthcare/ healthcare industry. To that end, the several challenges documents/Data_driven_healthcare_organizations_use_big_data_analytics_ highlighted above, must be addressed. As big data analyt- for_big_gains.pdf. 17. Savage N: Digging for drug facts. Commun ACM 2012, 55(10):11–13. ics becomes more mainstream, issues such as guarantee- 18. Zenger B: “Can Big Data Solve Healthcare’s Big Problems?” HealthByte, ing privacy, safeguarding security, establishing standards February 2012; 2012. http://www.equityhealthcare.com/docstor/EH%20Blog% and governance, and continually improving the tools and 20on%20Analytics.pdf. 19. LaValle S, Lesser E, Shockley R, Hopkins MS, Kruschwitz N: Big data, technologies will garner attention. Big data analytics and analytics and the path from insights to value. MIT Sloan Manag Rev 2011, applications in healthcare are at a nascent stage of devel- 52:20–32. opment, but rapid advances in platforms and tools can ac- 20. Capgemini: The Deciding Factor: Big Data & Decision Making; 2013. 
http:// www.capgemini.com/thought-leadership/the-deciding-factor-big-data- celerate their maturing process. decision-making. 21. Connolly S, Wooledge S: Harnessing the Value of Big Data Analytics. Teradata; Competing interests We, the authors declare we have no competing interests. 22. Courtney M: Puzzling out big data. Engineering & Technology 2013:56–60. 23. Intel: Big Data Analytics; 2012. http://www.intel.com/content/dam/www/ Authors’ contributions public/us/en/documents/reports/data-insights-peer-research-report.pdf. Both WR and VR contributed equally. Both authors read and approved the 24. Manyika J, Chui M, Brown B, Buhin J, Dobbs R, Roxburgh C, Byers AH: Big final manuscript. Data: The Next Frontier for Innovation, Competition, and Productivity. USA: McKinsey Global Institute; 2011. Author details 25. IBM: Large Gene interaction Analytics at University at Buffalo, SUNY; 2012. Graduate School of Business, Fordham University, 113 W. 60th Street, 10023 http://public.dhe.ibm.com/common/ssi/ecm/en/imc14675usen/ New York, NY, USA. Brooklyn College, City University of New York, Brooklyn, IMC14675USEN.PDF. NY, USA. 26. IBM: Harvard Medical School; 2011. http://public.dhe.ibm.com/common/ssi/ ecm/en/imc14685usen/IMC14685USEN.PDF. Received: 27 August 2013 Accepted: 5 January 2014 27. Raghupathi W, Kesh S: Interoperable electronic health records Published: 7 February 2014 design: towards a service-oriented architecture. e-Service Journal 2007, 5:39–57. 28. Borkar VR, Carey MJ, Chen L: Big data platforms: what's next? ACM References Crossroads 2012, 19(1):44–49. 1. Raghupathi W: Data Mining in Health Care. In Healthcare Informatics: Improving Efficiency and Productivity. Edited by Kudyba S. Taylor & Francis; 29. Ohlhorst F: Big Data Analytics: Turning Big Data into Big Money. USA: John 2010:211–223. Wiley & Sons; 2012. Raghupathi and Raghupathi Health Information Science and Systems 2014, 2:3 Page 10 of 10 http://www.hissjournal.com/content/2/1/3 30. 
Zikopoulos PC, DeRoos D, Parasuraman K, Deutsch T, Corrigan D, Giles J: Harness the Power of Big Data. McGraw-Hill: The IBM Big Data Platform; 31. Zikopoulos PC, Eaton C, DeRoos D, Deutsch T, Lapis G: Understanding Big Data – Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill: Aspen Institute; 2012. 32. Bollier D: The Promise and Peril of Big Data. Washington, DC: The Aspen Institute; 2010. doi:10.1186/2047-2501-2-3 Cite this article as: Raghupathi and Raghupathi: Big data analytics in healthcare: promise and potential. Health Information Science and Systems 2014 2:3. Submit your next manuscript to BioMed Central and take full advantage of: • Convenient online submission • Thorough peer review • No space constraints or color figure charges • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution Submit your manuscript at www.biomedcentral.com/submit http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Health Information Science and Systems Pubmed Central

Big data analytics in healthcare: promise and potential

Health Information Science and Systems, Volume 2 – Feb 7, 2014



Publisher
Pubmed Central
Copyright
© Raghupathi and Raghupathi; licensee BioMed Central Ltd. 2014
eISSN
2047-2501
DOI
10.1186/2047-2501-2-3

Abstract

Objective: To describe the promise and potential of big data analytics in healthcare. Methods: The paper describes the nascent field of big data analytics in healthcare, discusses the benefits, outlines an architectural framework and methodology, describes examples reported in the literature, briefly discusses the challenges, and offers conclusions. Results: The paper provides a broad overview of big data analytics for healthcare researchers and practitioners. Conclusions: Big data analytics in healthcare is evolving into a promising field for providing insight from very large data sets and improving outcomes while reducing costs. Its potential is great; however there remain challenges to overcome. Keywords: Big data, Analytics, Hadoop, Healthcare, Framework, Methodology
errors is to use digital entries rather than handwritten The ‘4Vs’ are an appropriate starting point for a scripts. discussion about big data analytics in healthcare. But The potential of big data in healthcare lies in combin- there are other issues to consider, such as the num- ing traditional data with new forms of data, both indi- ber of architectures and platforms, and the domin- vidually and on a population level. We are already seeing ance of the open source paradigm in the availability data sets from a multitude of sources support faster and of tools. Consider, too, the challenge of developing more reliable research and discovery. If, for example, methodologies and the need for user-friendly inter- pharmaceutical developers could integrate population faces. While the overall cost of hardware and software clinical data sets with genomics data, this development is declining, these issues have to be addressed to har- could facilitate those developers gaining approvals on ness and maximize the potential of big data analytics more and better drug therapies more quickly than in the in healthcare. Raghupathi and Raghupathi Health Information Science and Systems 2014, 2:3 Page 5 of 10 http://www.hissjournal.com/content/2/1/3 Architectural framework tables, ASCII/text, etc.) and residing at multiple locations The conceptual framework for a big data analytics pro- (geographic as well as in different healthcare providers’ ject in healthcare is similar to that of a traditional health sites) in numerous legacy and other applications (transac- informatics or analytics project. The key difference lies tion processing applications, databases, etc.). Sources and in how processing is executed. In a regular health analyt- data types include: ics project, the analysis can be performed with a busi- ness intelligence tool installed on a stand-alone system, 1. Web and social media data: Clickstream and such as a desktop or laptop. 
Because big data is by defin- interaction data from Facebook, Twitter, LinkedIn, ition large, processing is broken down and executed blogs, and the like. It can also include health plan across multiple nodes. The concept of distributed pro- websites, smartphone apps, etc. [6]. cessing has existed for decades. What is relatively new is 2. Machine to machine data: readings from remote its use in analyzing very large data sets as healthcare sensors, meters, and other vital sign devices [6]. providers start to tap into their large data repositories to 3. Big transaction data: health care claims and other gain insight for making better-informed health-related billing records increasingly available in semi-structured decisions. Furthermore, open source platforms such as and unstructured formats [6]. Hadoop/MapReduce, available on the cloud, have encour- 4. Biometric data: finger prints, genetics, handwriting, aged the application of big data analytics in healthcare. retinal scans, x-ray and other medical images, blood While the algorithms and models are similar, the user pressure, pulse and pulse-oximetry readings, and interfaces of traditional analytics tools and those used other similar types of data [6]. for big data are entirely different; traditional health ana- 5. Human-generated data: unstructured and lytics tools have become very user friendly and transpar- semi-structured data such as EMRs, physicians ent. Big data analytics tools, on the other hand, are notes, email, and paper documents [6]. extremely complex, programming intensive, and require the application of a variety of skills. They have emerged For the purpose of big data analytics, this data has to in an ad hoc fashion mostly as open-source development be pooled. 
In the second component the data is in a tools and platforms, and therefore they lack the support ‘raw’ state and needs to be processed or transformed, at and user-friendliness that vendor-driven proprietary which point several options are available. A service- tools possess. As Figure 1 indicates, the complexity be- oriented architectural approach combined with web ser- gins with the data itself. vices (middleware) is one possibility [27]. The data stays Big data in healthcare can come from internal (e.g., elec- raw and services are used to call, retrieve and process tronic health records, clinical decision support systems, the data. Another approach is data warehousing wherein CPOE, etc.) and external sources (government sources, la- data from various sources is aggregated and made ready boratories, pharmacies, insurance companies & HMOs, for processing, although the data is not available in real- etc.), often in multiple formats (flat files, .csv, relational time. Via the steps of extract, transform, and load (ETL), Figure 1 An applied conceptual architecture of big data analytics. Raghupathi and Raghupathi Health Information Science and Systems 2014, 2:3 Page 6 of 10 http://www.hissjournal.com/content/2/1/3 Table 1 Platforms & tools for big data analytics in data from diverse sources is cleansed and readied. De- healthcare pending on whether the data is structured or unstruc- Platform/Tool Description tured, several data formats can be input to the big data analytics platform. The Hadoop Distributed HDFS enables the underlying storage for File System (HDFS) the Hadoop cluster. It divides the data into In this next component in the conceptual framework, smaller parts and distributes it across the several decisions are made regarding the data input ap- various servers/nodes. proach, distributed design, tool selection and analytics MapReduce MapReduce provides the interface for the models. 
Finally, on the far right, the four typical applica- distribution of sub-tasks and the gathering tions of big data analytics in healthcare are shown. of outputs. When tasks are executed, MapReduce tracks the processing of each These include queries, reports, OLAP, and data mining. server/node. Visualization is an overarching theme across the four ap- PIG and PIG Latin Pig programming language is configured plications. Drawing from such fields as statistics, com- (Pig and PigLatin) to assimilate all types of data (structured/ puter science, applied mathematics and economics, a unstructured, etc.). It is comprised of two key modules: the language itself, called wide variety of techniques and technologies has been de- PigLatin,and theruntime versioninwhich veloped and adapted to aggregate, manipulate, analyze, thePigLatin codeisexecuted. and visualize big data in healthcare. Hive Hive is a runtime Hadoop support The most significant platform for big data analytics is architecture that leverages Structure Query the open-source distributed data processing platform Language (SQL) with the Hadoop platform. It permits SQL programmers to develop Hadoop (Apache platform), initially developed for such Hive Query Language (HQL) statements routine functions as aggregating web search indexes. It akin to typical SQL statements. belongs to the class “NoSQL” technologies—others in- Jaql Jaql is a functional, declarative query clude CouchDB and MongoDB—that evolved to aggre- language designed to process large data gate data in unique ways. Hadoop has the potential to sets. To facilitate parallel processing, Jaql converts “‘high-level’ queries into ‘low-level’ process extremely large amounts of data mainly by allo- queries” consisting of MapReduce tasks. 
cating partitioned data sets to numerous servers (nodes), Zookeeper Zookeeper allows a centralized each of which solves different parts of the larger prob- infrastructure with various services, lem and then integrates them for the final result [28-31]. providing synchronization across a cluster of servers. Big data analytics applications Hadoop can serve the twin roles of data organizer and utilize these services to coordinate parallel analytics tool. It offers a great deal of potential in enab- processing across big clusters. ling enterprises to harness the data that has been, until HBase HBase is a column-oriented database man- now, difficult to manage and analyze. Specifically, Hadoop agement system that sits on top of HDFS. It makes it possible to process extremely large volumes of uses a non-SQL approach. data with various structures or no structure at all. But Cassandra Cassandra is also a distributed database Hadoop can be challenging to install, configure and ad- system. It is designated as a top-level pro- ject modeled to handle big data distributed minister, and individuals with Hadoop skills are not easily across many utility servers. It also provides found. Furthermore, for these reasons, it appears organiza- reliable service with no particular point of tions are not quite ready to embrace Hadoop completely. failure (http://en.wikipedia.org/wiki/Apache_ Cassandra) and it is a NoSQL system. The surrounding ecosystem of additional platforms and Oozie Oozie, an open source project, streamlines tools supports the Hadoop distributed platform [30,31]. the workflow and coordination among the These are summarized in Table 1. tasks. Numerous vendors—including AWS, Cloudera, Lucene The Lucene project is used widely for text Hortonworks, and MapR Technologies—distribute open- analytics/searches and has been source Hadoop platforms [29]. Many proprietary options incorporated into several open source projects. 
Its scope includes full text indexing are also available, such as IBM’sBigInsights. Further, and library search for use within a Java many of these platforms are cloud versions, making them application. widely available. Cassandra, HBase, and MongoDB, de- Avro Avro facilitates data serialization services. scribed above, are used widely for the database compo- Versioning and version control are nent. While the available frameworks and tools are mostly additional useful features. open source and wrapped around Hadoop and related Mahout Mahout is yet another Apache project platforms, there are numerous trade-offs that devel- whose goal is to generate free applications of distributed and scalable machine opers and users of big data analytics in healthcare must learning algorithms that support big data consider. While the development costs may be lower analytics on the Hadoop platform. since these tools are open source and free of charge, the downsides are the lack of technical support and minimal Raghupathi and Raghupathi Health Information Science and Systems 2014, 2:3 Page 7 of 10 http://www.hissjournal.com/content/2/1/3 security. In the healthcare industry, these are, of course, addressed: What problem is being addressed? Why is it significant drawbacks, and therefore the trade-offs must be important and interesting to the healthcare provider? addressed. Additionally, these platforms/tools require a What is the case for a ‘big data’ analytics approach? great deal of programming, skills the typical end-user in (Because the complexity and cost of big data analytics healthcare may not possess. Furthermore, considering the are significantly higher compared to traditional analytics only recent emergence of big data analytics in healthcare, approaches, it is important to justify their use). The pro- governance issues including ownership, privacy, security, ject team also should provide background information on and standards have yet to be addressed. 
In the next section the problem domain as well as prior projects and research we offer an applied big data analytics in healthcare meth- done in this domain. odology to develop and implement a big data project for Next, in Step 3, the steps in the methodology are fleshed healthcare providers. out and implemented. The concept statement is broken down into a series of propositions. (Note these are not Methodology rigorous as they would be in the case of statistical ap- While several different methodologies are being developed proaches. Rather, they are developed to help guide the big in this rapidly emerging discipline, here we outline one data analytics process). Simultaneously, the independent that is practical and hands-on. Table 2 shows the main and dependent variables or indicators are identified. The stages of the methodology. In Step 1, the interdisciplinary data sources, as outlined in Figure 1, are also identified; big data analytics in healthcare team develops a ‘concept the data is collected, described, and transformed in prep- statement’. This is a first cut at establishing the need for aration for for analytics. A very important step at this such a project. The concept statement is followed by a de- point is platform/tool evaluation and selection. There are scription of the project’s significance. The healthcare several options available, as indicated previously, including organization will note that there are trade-offs in terms of AWS Hadoop, Cloudera, and IBM BigInsights. The next alternative options, cost, scalability, etc. Once the concept step is to apply the various big data analytics techniques statement is approved, the team can proceed to Step 2,the to the data. This process differs from routine analytics proposal development stage. Here, more details are filled only in that the techniques are scaled up to large data sets. in. 
Based on the concept statement, several questions are Through a series of iterations and what-if analyses, insight is gained from the big data analytics. From the insight, in- Table 2 Outline of big data analytics in healthcare formed decisions can be made. In Step 4, the models and methodology their findings are tested and validated and presented to Step 1 Concept statement stakeholders for action. Implementation is a staged ap- proach with feedback loops built in at each stage to • Establish need for big data analytics project in healthcare based on the “4Vs”. minimize risk of failure. Step 2 Proposal The next section describes several reported big data analytics applications in healthcare. We draw on publicly • What is the problem being addressed? available material from numerous sources, including • Why is it important and interesting? vendor sites. In this emerging discipline, there is little in- • Why big data analytics approach? dependent research to cite. These examples are from • Background material secondary sources. Nevertheless, they are illustrative of Step 3 Methodology the potential of big data analytics in healthcare. • Propositions Examples • Variable selection Premier, the U.S. healthcare alliance network, has more • Data collection than 2,700 members, hospitals and health systems, • ETL and data transformation 90,000 non-acute facilities and 400,000 physicians and is • Platform/tool selection reported to have data on approximately one in four pa- • Conceptual model tients discharged from hospitals. Naturally, the network • Analytic techniques has assembled a large database of clinical, financial, pa- tient, and supply chain data, with which the network has -Association, clustering, classification, etc. generated comprehensive and comparable clinical out- • Results & insight come measures, resource utilization reports and trans- Step 4 Deployment action level cost data. 
These outputs have informed • Evaluation & validation decision-making and improved the healthcare processes • Testing at approximately 330 hospitals, saving an estimated Source: Adapted from [Raghupathi & Raghupathi, [9]]. 29,000 lives and reducing healthcare spending by nearly Raghupathi and Raghupathi Health Information Science and Systems 2014, 2:3 Page 8 of 10 http://www.hissjournal.com/content/2/1/3 $7 billion [16]. North York General Hospital, a 450-bed (NICE) of the U.K.’s National Health Service. NICE is re- community teaching hospital in Toronto, Canada, reports portedly a leader in the analytics of large clinical datasets using real-time analytics to improve patient outcomes and for exploring the effectiveness of clinical and cost factors gain greater insight into the operations of healthcare deliv- in the use of new drugs and/or clinical treatments. The Italian Medicines Agency is also reported to collect and ery. North York is reported to have implemented a scal- able real-time analytics application to provide multiple analyze clinical data on the use of expensive new drugs as perspectives, including clinical, administrative, and finan- one goal in a country-level cost-effectiveness program [6]. Another leading example of big data analytics in health- cial [16]. Another example, reported by IBM, is that of the large, unnamed healthcare provider that is analyzing data care is the Department of Veterans Affairs’ (VA) use of ap- in the electronic medical record (EMR) system with the plications on its very large data set in an effort to comply with “performance-based accountability framework and goal of reducing costs and improving patient care. (Data in the EMR include the unstructured data from physician disease management practice” [6]. In one very famous ex- notes, pathology reports and other sources). 
Big data ana- ample, California-based Kaiser Permanente associated clinical data with cost data to generate a key data set, the lytics is used to develop care protocols and case pathways and to assist caregivers in performing customized queries analytics of which led to the discovery of adverse drug ef- [16]. Another example of big data analytics in healthcare fects and subsequent withdrawal of Vioxx from the mar- ket [6]. Researchers at the Johns Hopkins School of is Columbia University Medical Center’sanalysisof “com- plex correlations” of streams of physiological data related Medicine discovered they could use data from Google Flu to patients with brain injuries. The goal is to provide med- Trends to predict sudden increases in flu-related emer- gency room visits at least a week before warnings from ical professionals with critical and timely information to aggressively treat complications. The advanced analytics is the CDC. Likewise, the analysis of Twitter updates was as reported to diagnose serious complications as much as 48 accurate as (and two weeks ahead of) official reports at tracking the spread of cholera in Haiti after the January hours sooner than previously in patients who have suf- fered a bleeding stroke from a ruptured brain aneurysm 2010 earthquake [6]. Also reported is an application devel- [16]. The Rizzoli Orthopedic Institute in Bologna, Italy, is oped by IBM that predicts the likely outcomes of diabetes patients using patients’ panel data linked to physicians, reportedly using advanced analytics to gain a more “granular understanding” of the clinical variations within management protocols, and the overall relationship to families whereby individual patients display extreme dif- population health management averages [6]. In another dia- betes application, physicians at Harvard Medical School ferences in the severity of their symptoms. 
This insight is reported to have reduced annual hospitalizations by 30% and Harvard Pilgrim Health Care recently demonstrated and the number of imaging tests by 60%. In the long- the potential of analytics applications to EHR data to iden- term, the Institute expects to gain insight into the role of tify and group patients with diabetes for public health sur- genetic factors to develop treatments [16]. The Hospital veillance. Four years worth of data based on numerous for Sick Children (Sick Kids) in Toronto is using analytics indicators from multiple sources was utilized. The analyt- to improve the outcomes for infants prone to life- ics application also differentiated between Type 1 and threatening “nosocomial infections”.It isreportedthat Type II diabetes [6,26]. Finally, at Blue Cross Blue Shield Sick Kids applies advanced analytics to vital-sign data of Massachusetts (BCBSMA) there was a “need to embed gathered from bedside monitoring devices to identify po- analytics into business processes to help decision-makers tential signs infection as early as 24 hours prior to previous across the business gain insight into financial and medical methods [6,16]. Additional examples are reported below. data and become more proactive”. Several benefits were A recent New Yorker magazine article by Atul Gawande, reported. First, the analytics enabled medical directors to MD described how orthopedic surgeons at Brigham and identify high-risk disease groups and act to minimize risk Women’s Hospital in Boston relied on personal experi- and improve patient outcomes. For example, new pre- ence along with insight extracted from research on data ventive treatment protocols could be introduced among based on a host of factors critical to the success of joint- patient groups with high cholesterol, thereby fending off replacement surgery to systematically standardize knee heart problems. Also, complex health informatics re- joint-replacement surgery. 
The result: improved outcomes ports were generated 300% faster than previously, help- at lower costs. The University of Michigan Health System ing BCBSMA service clients more effectively [6]. standardized the administration of blood transfusions The next section briefly identifies some of the key using analytics in a similar fashion, combining experience challenges in big data analytics in healthcare. with big data analytics research. This resulted in a 31% re- duction in transfusions and $200,000 reduction in ex- Challenges penses per month (reported in [6]). Another example is At minimum, a big data analytics platform in healthcare The National Institute for Health and Clinical Excellence must support the key functions necessary for processing Raghupathi and Raghupathi Health Information Science and Systems 2014, 2:3 Page 9 of 10 http://www.hissjournal.com/content/2/1/3 the data. The criteria for platform evaluation may include 2. Burghard C: Big Data and Analytics Key to Accountable Care Success. IDC Health Insights; 2012. availability, continuity, ease of use, scalability, ability to 3. Dembosky A: “Data Prescription for Better Healthcare.” Financial Times, manipulate at different levels of granularity, privacy and December 12, 2012, p. 19; 2012. Available from: http://www.ft.com/intl/cms/ security enablement, and quality assurance [6,29,32]. In s/2/55cbca5a-4333-11e2-aa8f-00144feabdc0.html#axzz2W9cuwajK. 4. Feldman B, Martin EM, Skotnes T: “Big Data in Healthcare Hype and Hope.” addition, while most platforms currently available are October 2012. Dr. Bonnie 360; 2012. http://www.west-info.eu/files/big-data-in- open source, the typical advantages and limitations of healthcare.pdf. open source platforms apply. To succeed, big data analyt- 5. Fernandes L, O’Connor M, Weaver V: Big data, bigger outcomes. J AHIMA 2012:38–42. ics in healthcare needs to be packaged so it is menu- 6. 
IHTT: Transforming Health Care through Big Data Strategies for leveraging driven, user-friendly and transparent. Real-time big data big data in the health care industry; 2013. http://ihealthtran.com/ analytics is a key requirement in healthcare. The lag be- wordpress/2013/03/iht%C2%B2-releases-big-data-research-report- download-today/. tween data collection and processing has to be addressed. 7. Frost & Sullivan: Drowning in Big Data? Reducing Information Technology The dynamic availability of numerous analytics algo- Complexities and Costs for Healthcare Organizations. http://www.emc.com/ rithms, models and methods in a pull-down type of menu collateral/analyst-reports/frost-sullivan-reducing-information-technology- complexities-ar.pdf. is also necessary for large-scale adoption. The important 8. Bian J, Topaloglu U, Yu F, Yu F: Towards Large-scale Twitter Mining for Drug- managerial issues of ownership, governance and standards related Adverse Events. Maui, Hawaii: SHB; 2012. have to be considered. And woven through these issues 9. Raghupathi W, Raghupathi V: An Overview of Health Analytics. Working paper; 2013. are those of continuous data acquisition and data cleans- 10. Ikanow: Data Analytics for Healthcare: Creating Understanding from Big Data. ing. Health care data is rarely standardized, often fragmen- http://info.ikanow.com/Portals/163225/docs/data-analytics-for-healthcare.pdf. ted, or generated in legacy IT systems with incompatible 11. jStart: “How Big Data Analytics Reduced Medicaid Re-admissions.” A jStart Case Study; 2012. http://www-01.ibm.com/software/ebusiness/jstart/portfolio/ formats [6]. This great challenge needs to be addressed uncMedicaidCaseStudy.pdf. as well. 12. Knowledgent: Big Data and Healthcare Payers; 2013. http://knowledgent. com/mediapage/insights/whitepaper/482. 13. Explorys: Unlocking the Power of Big Data to Improve Healthcare for Everyone. Conclusions https://www.explorys.com/docs/data-sheets/explorys-overview.pdf. 
Big data analytics has the potential to transform the way 14. IBM: IBM big data platform for healthcare.” Solutions Brief; 2012. http://public. dhe.ibm.com/common/ssi/ecm/en/ims14398usen/IMS14398USEN.PDF. healthcare providers use sophisticated technologies to 15. Intel: Leveraging Big Data and Analytics in Healthcare and Life Sciences: gain insight from their clinical and other data repositor- Enabling Personalized Medicine for High-Quality Care, Better Outcomes; 2012. ies and make informed decisions. In the future we’ll see http://www.intel.com/content/dam/www/public/us/en/documents/white- papers/healthcare-leveraging-big-data-paper.pdf. the rapid, widespread implementation and use of big 16. IBM: Data Driven Healthcare Organizations Use Big Data Analytics for Big data analytics across the healthcare organization and the Gains; 2013. http://www03.ibm.com/industries/ca/en/healthcare/ healthcare industry. To that end, the several challenges documents/Data_driven_healthcare_organizations_use_big_data_analytics_ highlighted above, must be addressed. As big data analyt- for_big_gains.pdf. 17. Savage N: Digging for drug facts. Commun ACM 2012, 55(10):11–13. ics becomes more mainstream, issues such as guarantee- 18. Zenger B: “Can Big Data Solve Healthcare’s Big Problems?” HealthByte, ing privacy, safeguarding security, establishing standards February 2012; 2012. http://www.equityhealthcare.com/docstor/EH%20Blog% and governance, and continually improving the tools and 20on%20Analytics.pdf. 19. LaValle S, Lesser E, Shockley R, Hopkins MS, Kruschwitz N: Big data, technologies will garner attention. Big data analytics and analytics and the path from insights to value. MIT Sloan Manag Rev 2011, applications in healthcare are at a nascent stage of devel- 52:20–32. opment, but rapid advances in platforms and tools can ac- 20. Capgemini: The Deciding Factor: Big Data & Decision Making; 2013. 
http:// www.capgemini.com/thought-leadership/the-deciding-factor-big-data- celerate their maturing process. decision-making. 21. Connolly S, Wooledge S: Harnessing the Value of Big Data Analytics. Teradata; Competing interests We, the authors declare we have no competing interests. 22. Courtney M: Puzzling out big data. Engineering & Technology 2013:56–60. 23. Intel: Big Data Analytics; 2012. http://www.intel.com/content/dam/www/ Authors’ contributions public/us/en/documents/reports/data-insights-peer-research-report.pdf. Both WR and VR contributed equally. Both authors read and approved the 24. Manyika J, Chui M, Brown B, Buhin J, Dobbs R, Roxburgh C, Byers AH: Big final manuscript. Data: The Next Frontier for Innovation, Competition, and Productivity. USA: McKinsey Global Institute; 2011. Author details 25. IBM: Large Gene interaction Analytics at University at Buffalo, SUNY; 2012. Graduate School of Business, Fordham University, 113 W. 60th Street, 10023 http://public.dhe.ibm.com/common/ssi/ecm/en/imc14675usen/ New York, NY, USA. Brooklyn College, City University of New York, Brooklyn, IMC14675USEN.PDF. NY, USA. 26. IBM: Harvard Medical School; 2011. http://public.dhe.ibm.com/common/ssi/ ecm/en/imc14685usen/IMC14685USEN.PDF. Received: 27 August 2013 Accepted: 5 January 2014 27. Raghupathi W, Kesh S: Interoperable electronic health records Published: 7 February 2014 design: towards a service-oriented architecture. e-Service Journal 2007, 5:39–57. 28. Borkar VR, Carey MJ, Chen L: Big data platforms: what's next? ACM References Crossroads 2012, 19(1):44–49. 1. Raghupathi W: Data Mining in Health Care. In Healthcare Informatics: Improving Efficiency and Productivity. Edited by Kudyba S. Taylor & Francis; 29. Ohlhorst F: Big Data Analytics: Turning Big Data into Big Money. USA: John 2010:211–223. Wiley & Sons; 2012. Raghupathi and Raghupathi Health Information Science and Systems 2014, 2:3 Page 10 of 10 http://www.hissjournal.com/content/2/1/3 30. 
Zikopoulos PC, DeRoos D, Parasuraman K, Deutsch T, Corrigan D, Giles J: Harness the Power of Big Data. McGraw-Hill: The IBM Big Data Platform; 31. Zikopoulos PC, Eaton C, DeRoos D, Deutsch T, Lapis G: Understanding Big Data – Analytics for Enterprise Class Hadoop and Streaming Data. McGraw-Hill: Aspen Institute; 2012. 32. Bollier D: The Promise and Peril of Big Data. Washington, DC: The Aspen Institute; 2010. doi:10.1186/2047-2501-2-3 Cite this article as: Raghupathi and Raghupathi: Big data analytics in healthcare: promise and potential. Health Information Science and Systems 2014 2:3. Submit your next manuscript to BioMed Central and take full advantage of: • Convenient online submission • Thorough peer review • No space constraints or color figure charges • Immediate publication on acceptance • Inclusion in PubMed, CAS, Scopus and Google Scholar • Research which is freely available for redistribution Submit your manuscript at www.biomedcentral.com/submit
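Illustrative sketch. The ‘divide and process’ approach described under Architectural framework, in which partitioned data is allocated to several nodes, each node solves part of the problem, and the partial results are integrated, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not Hadoop's or MapReduce's actual API: the claim records, the diagnosis codes, and all function names are hypothetical, and threads on one machine stand in for cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Hypothetical claim records: (patient_id, diagnosis_code). Illustrative data only.
CLAIMS = [
    ("p1", "E11"), ("p2", "I10"), ("p3", "E11"),
    ("p4", "J45"), ("p5", "I10"), ("p6", "E11"),
]


def partition(records, n_nodes):
    """Split records into n_nodes slices, one per simulated node."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def map_count(chunk):
    """Map step: a node counts diagnosis codes in its own slice only."""
    return Counter(code for _, code in chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_diagnoses(records, n_nodes=3):
    """Partition the records, count each slice in parallel, then merge."""
    chunks = partition(records, n_nodes)
    # Threads stand in for servers here; a real cluster would ship each
    # chunk to a different machine.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(map_count, chunks))
    return reduce(reduce_counts, partials, Counter())


if __name__ == "__main__":
    print(count_diagnoses(CLAIMS))
```

Because counting is associative, the merged result does not depend on how many slices the data is split into, which is the property that lets this style of computation scale out across nodes.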

Journal

Health Information Science and Systems, PubMed Central

Published: Feb 7, 2014
