MOHAMMAD AMIN MORID, Department of Information Systems and Analytics, Leavey School of Business, Santa Clara University, Santa Clara, CA, USA, mmorid@scu.edu
OLIVIA R. LIU SHENG, Department of Operations and Information Systems, David Eccles School of Business, University of Utah, Salt Lake City, UT, USA, olivia.sheng@eccles.utah.edu
JOSEPH DUNBAR, Department of Operations and Information Systems, David Eccles School of Business, University of Utah, Salt Lake City, UT, USA, joseph.dunbar@eccles.utah.edu

Traditional Machine Learning (ML) methods face unique challenges when applied to healthcare predictive analytics. The high-dimensional nature of healthcare data necessitates labor-intensive and time-consuming processes when selecting an appropriate set of features for each new task. Furthermore, ML methods depend heavily on feature engineering to capture the sequential nature of patient data, oftentimes failing to adequately leverage the temporal patterns of medical events and their dependencies. In contrast, recent Deep Learning (DL) methods have shown promising performance for various healthcare prediction tasks by specifically addressing the high-dimensional and temporal challenges of medical data. DL techniques excel at learning useful representations of medical concepts and patient clinical data, as well as their nonlinear interactions, from high-dimensional raw or minimally processed healthcare data. In this paper we systematically reviewed research works that focused on advancing deep neural networks to leverage patient structured time series data for healthcare prediction tasks. To identify relevant studies, we searched MEDLINE, IEEE, Scopus, and the ACM digital library for relevant publications through November 4, 2021. Overall, we found that researchers have contributed to the deep time series prediction literature in ten identifiable research streams: DL models, missing value handling, addressing temporal irregularity, patient representation, static data inclusion, attention mechanisms, interpretation, incorporation of medical ontologies, learning strategies, and scalability. This study summarizes research insights from these literature streams, identifies several critical research gaps, and suggests future research opportunities for DL applications using patient time series data.

CCS Concepts: Applied computing → Life and medical sciences → Health informatics; Computing methodologies → Machine learning.

Additional keywords and phrases: Systematic review; Patient time series; Deep learning methods; Healthcare predictive analytics.

1 INTRODUCTION

As the digital healthcare ecosystem expands, healthcare data is increasingly being recorded within Electronic Health Records (EHR) and Administrative Claims (AC) systems [1,2]. The widespread adoption of these information systems has become popular with government agencies, hospitals, and insurance companies [3,4], capturing data from millions of individuals over many years [5,6]. As a result, physicians and other medical practitioners are increasingly overwhelmed by the massive amounts of recorded patient data, especially given these professionals' relatively limited access to time, tools, and experience wielding this data on a daily basis [7,8]. This problem has caused machine learning (ML) methods to gain attention within the medical domain, since ML methods effectively use an abundance of available data to extract actionable knowledge, thereby both predicting medical outcomes and enhancing medical decision making [3,9].
Specifically, ML has been utilized in the assessment of early triage, the prediction of physiologic decompensation, the identification of high-cost patients, and the characterization of complex, multi-system diseases [10,11], to name a few. Some of these problems, such as early triage assessment, are not new and date back to at least World War I, but the success of ML methods and the concomitant, growing deployment of EHR and AC information systems have sparked broad research interest [4,12].

Despite the swift success of traditional ML in the medical domain, developing effective predictive models remains difficult. Due to the high-dimensional nature of healthcare data, typically only a limited set of appropriate features from among thousands of candidates is selected for each new prediction task, necessitating a labor-intensive and time-consuming process. This often requires the involvement of medical experts to extract, preprocess, and clean data from different sources [13,14]. For example, a recent systematic literature review found that risk prediction models built from EHR data use a median of 27 features from among many thousands of potential variables [15]. Moreover, in order to handle the irregularity and incompleteness prevalent in patient data, traditional ML models are trained using coarse-grain aggregation measures, such as the mean and standard deviation, for input features. These models depend heavily on manually crafted features, and they cannot adequately leverage the temporal, sequential nature of medical events and their dependencies [16,17].

Another crucial observation is that patient data evolves over time. The sequential nature of medical events, their associated long-term dependencies, and confounding interactions (e.g., disease progression and intervention) offer useful but highly complex information for predicting future medical events [18,19]. Aside from limiting the scalability of traditional predictive models, these complicating factors unavoidably result in imprecise predictions, which can often overwhelm practitioners with false alarms [20,21]. Effective modeling of high-dimensional, temporal medical data can help to improve predictive accuracy and thus increase the adoption of state-of-the-art models in clinical settings [22,23]. Compared with their traditional ML counterparts, deep learning (DL) methods have shown superior performance for various healthcare prediction tasks by addressing the aforementioned high dimensionality and temporality of medical data [12,16]. These enhanced neural network techniques can learn useful representations of key factors, such as esoteric medical concepts and their interactions, from high-dimensional raw or minimally processed healthcare data [5,20].
DL models achieve this through repeated sequences of training layers, each employing a large number of simple linear and nonlinear transformations that map inputs to meaningful representations of distinguishable temporal patterns [5,24]. Released from the reliance on experts to specify which manually crafted features to use, these end-to-end neural net learners have the capability to model data with rich temporal patterns and can encode high-level representations of features as nonlinear combinations of network parameters [25,26].

Not surprisingly, the recent popularity of DL methods has correspondingly increased the number of their associated publications in the healthcare domain [27]. Several studies have reviewed such works from different perspectives. Pandey and Janghel (2019)[28] and Xiao et al. (2018)[29] describe a wide variety of DL models and highlight the challenges of applying them in a healthcare context. Yazhini and Loganathan (2019)[30], Srivastava et al. (2017)[31], and Shamshirband et al. (2021)[32] summarize various applications in which DL models have been successful. Unlike the aforementioned studies, which broadly review DL in various health applications, ranging from genomic analysis to medical imaging, Shickel et al. (2018)[27] exclusively focus on research involving EHR data. They categorize deep EHR learning applications into five categories: information extraction, representation learning, outcome prediction, computational phenotyping, and clinical data de-identification, while describing a theme for each category. Finally, Si et al. (2021)[33] focus on EHR representation learning and investigate their surveyed studies in terms of publication characteristics, which include input data and preprocessing, patient representation, learning approach, and evaluative outcome attributes.

In this paper, we review studies focusing on DL prediction models that leverage patient structured time series data for healthcare prediction tasks from a technical perspective. We do not focus on unstructured patient data, such as images or clinical notes, since DL methods that include natural language processing and unsupervised learning tend to ask research questions that are quite different due to the unstructured nature of those data types. Rather, we summarize the findings of DL researchers for leveraging structured healthcare time series data, of numeric and categorical types, for a target prediction task in terms of the network architecture and learning strategy. Furthermore, we methodically organize how previous researchers have handled the challenging characteristics of healthcare time series data. These characteristics notably include incompleteness, multimodality, irregularity, visit representation, the incorporation of attention mechanisms or medical domain knowledge, outcome interpretation, and scalability. To the best of our knowledge, this is the first review study to investigate these technical characteristics of deep time series prediction in the healthcare literature.

2 METHOD

2.1 Overview

The primary goal of this systematic literature review is to extract and organize the findings from research on structured time series prediction in healthcare using DL approaches, and to subsequently identify related future research opportunities. Because of their fundamental importance and potential impact, we aimed to address the following review questions:

1. How are various healthcare data types represented as input for DL methods?
2. How do DL methods handle the challenging characteristics of healthcare time series data, including incompleteness, multimodality, and irregularity?

3. What DL models are most effective? In what scenarios does one model have advantages over another?

4. How can established medical resources help DL methods?

5. How can the internal processes of DL outcomes be interpreted to extract credible medical facts?

6. To what extent do DL methods developed in limited healthcare settings become scalable to larger healthcare data sources?

In order to answer these questions, we identify ten core characteristics (medical task, database, input features, preprocessing, patient representation, DL architecture, output temporality, performance, benchmarks, and interpretation) for extraction from each study. Section 2.4 elaborates on these ten core characteristics. Additionally, we find that the asserted research contributions of the deep time series prediction literature can be classified into the following ten categories: patient representation, missing value handling, DL model, addressing temporal irregularity, attention mechanism, incorporation of medical ontologies, static data inclusion, learning strategy, interpretation, and scalability. Section 3 introduces selected papers exhibiting research contributions in each of these ten categories and further describes their research approaches, hypotheses, and evaluation results. In Section 4, we discuss strengths and weaknesses of the main approaches and identify research gaps based on the same ten categories.

2.2 Literature Search

We searched for eligible articles in MEDLINE, IEEE, Scopus, and the ACM digital library published before February 7, 2021. In order to show a complete picture of the studies published in 2020, we performed the search and selection process again on November 4, 2021, and added all studies published in 2020. Our specific search queries for each of these databases can be found in Table S1 of the online supplement.

2.3 Inclusion and Exclusion Criteria

We followed PRISMA guidelines [34] to include English-language, original research studies published in peer-reviewed journals and conference proceedings. Posters and preprints were not included. We specifically selected papers that employed DL methods to leverage structured patient time series data for healthcare prediction tasks. Reviewed works can be broadly classified under the outcome prediction category of Shickel et al. (2018)[27]. We excluded studies based on unstructured data, as well as those lacking key information on the core study characteristics listed in Table 1.

2.4 Data Extraction

We focused the review of each study on the ten identifiable features relating to its problem description, input, methodology, and output. Table 1 provides a brief description and explains the main usage of each of these ten characteristics.
Table 1: Core study characteristics

Medical Task. Description: Describes the medical time series prediction goal. Usage: Helps to understand if a certain network quality fits a specific task or if it is generalizable to more than one task.

Database. Description: Determines the healthcare data source and scope used for the experiments. Usage: Helps to understand whether the experimental dataset is public or not; also, since patient data in different countries is recorded with different coding systems, this aids in identifying the adopted coding system.

Input: Demographic. Description: Determines if patient demographic data is used as input. Usage (shared by all Input rows): Helps to understand the variety of structured patient data used as input in the study.
Input: Vital Sign. Description: Determines if patient vital signs are used as input.
Input: Lab Test. Description: Determines if patient lab tests are used as input.
Input: Procedure Codes. Description: Determines if patient procedure codes are used as input.
Input: Diagnosis Codes. Description: Determines if patient diagnosis codes are used as input.
Input: Medication Codes. Description: Determines if patient medication codes are used as input.
Input: Others. Description: Describes other EHR or AC input features.

Preprocessing. Description: Describes the windowing and missing value imputation methods. Usage: Helps to understand how data preprocessing affects the outcome.

Patient Representation. Description: Shows the final format of the time series data fed into the DL model. Usage: Helps to identify whether sequence representation or matrix representation has been used to represent patient time series data.

DL Architecture. Description: Shows the DL model architecture used for the time series prediction. Usage: Helps to compare and contrast the learning architectures, and also to identify architectural contributions.

Output Temporality. Description: Determines whether the target is static or dynamic. Usage: Specifies whether the output is the same for a sequence of events or if it changes over time for each event.

Performance. Description: Shows the highest achieved performance based on the primary outcome. Usage: Provides researchers in each learning task with state-of-the-art prediction performance.

Benchmark. Description: Lists the models used as a baseline for comparison. Usage: Identifies traditional ML or DL models that are outperformed by the proposed model.

Interpretation. Description: Shows the methods used for DL model interpretation. Usage: Aids in understanding how a DL black-box model has been interpreted.

2.5 Data Analysis

Each selected study in the systematic review has either proposed a technical contribution to advancing the methods for a deep time series prediction pipeline or adopted an extant method for a new healthcare application to make a domain contribution. The focus of this systematic review is summarizing the findings of the former research stream with technical contributions and identifying the associated research gaps. Nevertheless, we also briefly summarize and discuss articles with domain contributions. Based on the technical contributions noted in the included studies, we classify identifiable contributions into one of ten categories: patient representation, missing value handling, DL models, addressing temporal irregularity, attention mechanisms, incorporation of medical ontologies, static data inclusion, learning strategy, interpretation, and scalability. For each given category, Section 3 summarizes deep patient time series learning approaches identified in the reviewed studies. Section 4 compares the strengths and weaknesses of these DL techniques and the associated future research opportunities.

3 RESULTS

Our literature search initially resulted in 1,524 studies, with 511 of them being duplicates (i.e., indexed in multiple databases). The remaining 1,014 works underwent a title and abstract screening. Following our exclusion criteria, 621 studies were excluded. Of these 621 omitted studies, 74 did not use EHR or AC data, 81 did not use multivariate temporal data, 171 did not use DL methods for their prediction tasks, and 295 were based on unstructured data, such as images, clinical notes, or sensor data.
The remaining 393 papers were then selected for a full-text review, and we subsequently removed 316 additional papers because they lacked one or more of the core study characteristics listed in Table 1. Specifically, 64 of the removed papers did not provide distinctive input features (e.g., medical code types), 99 did not describe a patient representation (e.g., embedding vector creation), 129 did not sufficiently describe their DL network architectures (e.g., RNN network type), and 24 did not specify their output temporality (i.e., static or dynamic) designs. Below, Figure 1 summarizes the article extraction procedure, and Figure 2 shows the distribution of the 77 included studies based on their publication year. A majority of the studies (77%) were published after 2018, signaling a recent surge in interest among researchers in DL models applied to healthcare prediction tasks.

Figure 1: Inclusion flow of the systematic review.

Figure 2: Number of publications per year.

Table 2 lists the included studies by prediction task. Note that mortality, heart failure, readmission, and patient next-visit diagnosis predictions are the most studied prediction tasks, and a publicly available online dataset, the Medical Information Mart for Intensive Care (MIMIC) [35], is the most popular data source for the studies. A complete list of the included studies and their characteristics as delineated in Table 1 is available in the online supplement (Tables S2 and S3).

Table 2: List of reviewed studies per prediction task

Mortality: Che et al. (2018)[36], Sun et al. (2019)[9], Yu et al. (2020)[37], Ge et al. (2018)[38], Caicedo-Torres et al. (2019)[39], Sha et al. (2017)[2], Harutyunyan et al. (2019)[12], Rajkomar et al. (2018)[13], Zhang et al. (2020)[40], Shickel et al. (2019)[41], Purushotham et al. (2018)[42], Gupta et al. (2020)[43], Baker et al. (2020)[44], Yu et al. (2020)[45]

Heart Failure: Cheng et al. (2016)[18], Choi et al. (2017)[46], Yin et al. (2019)[47], Wang et al. (2018)[48], Ju et al. (2020)[49], Bekhet et al. (2019)[50], Choi et al. (2016)[22], Maragatham et al. (2019)[51], Zhang et al. (2019)[52], Choi et al. (2017)[53], Ma et al. (2018)[54], Choi et al. (2018)[55], Solares et al. (2020)[56]

Readmission: Zhang et al. (2018)[57], Wang et al. (2018)[58], Lin et al. (2019)[59], Barbieri et al. (2020)[60], Min et al. (2019)[1], Ashfaq et al. (2019)[61], Rajkomar et al. (2018)[13], Reddy et al. (2018)[62], Nguyen et al. (2017)[63], Zhang et al. (2020)[40], Solares et al. (2020)[56]

Next Visit Diagnosis: Lipton et al. (2015)[64], Choi et al. (2016)[7], Pham et al. (2016)[65], Wang et al. (2019)[20], Yang et al. (2019)[66], Guo et al. (2019)[67], Wang et al. (2019)[68], Ma et al. (2017)[69], Ma et al. (2018)[70], Harutyunyan et al. (2019)[12], Pham et al. (2017)[71], Lee et al. (2019)[72], Rajkomar et al. (2018)[13], Lee et al. (2020)[73], Choi et al. (2017)[53], Purushotham et al. (2018)[42], Gupta et al. (2020)[43], Lipton et al. (2016)[74], Bai et al. (2018)[75], Liu et al. (2020)[76], Zhang et al. (2020)[77], Qiao et al. (2020)[78]

Cardiovascular Disease: Che et al. (2018)[36], Park et al. (2019)[79], An et al. (2019)[80], Duan et al. (2019)[81], Park et al. (2018)[82]

Length of Stay: Che et al. (2018)[36], Harutyunyan et al. (2019)[12], Rajkomar et al. (2018)[13], Zhang et al. (2020)[40], Purushotham et al. (2018)[42]
Sepsis Shock: Zhang et al. (2017)[83], Zhang et al. (2019)[84], Wickramaratne et al. (2020)[85], Svenson et al. (2020)[86], Fagerström et al. (2019)[87]

Hypertension: Mohammadi et al. (2019)[88], Ye et al. (2020)[89]

Decompensation: Harutyunyan et al. (2019)[12], Purushotham et al. (2018)[42], Thorsen-Meyer et al. (2020)[90]

Illness Severity: Chen et al. (2018)[17], Zheng et al. (2017)[91], Sue et al. (2017)[92]

Acute Kidney Injury: Tomasev et al. (2019)[93]

Joint Replacement Surgery Risk: Qiu et al. (2019)[94]

Post-Stroke Pneumonia: Ge et al. (2019)[95]

Renal Disease: Razavian et al. (2016)[96]

Adverse Drug Event: Rebane et al. (2019)[97]

Cost: Morid et al. (2020)[98]

Chronic Obstructive Pulmonary Disorder: Cheng et al. (2016)[18]

Kidney Transplantation Endpoint: Esteban et al. (2016)[3]

Surgery Recovery: Che et al. (2018)[36]

Diabetes: Ju et al. (2020)[49]

Asthma: Xiang et al. (2020)[99]

Neonatal Encephalopathy: Gao et al. (2019)[100]

After reviewing the included studies, we found that the asserted contributions of researchers within the deep time series prediction literature can be distinguished and classified under the following ten categories: (1) patient representation, (2) missing value handling, (3) DL model, (4) addressing temporal irregularities, (5) attention mechanism, (6) incorporation of medical ontologies, (7) static data inclusion, (8) learning strategy, (9) interpretation strategies, and (10) scalability. The rest of Section 3 devotes one subsection to each of these categories to describe the associated findings by category. Figure 3 below gives a general overview of the focal approaches adopted by the included studies.

Figure 3: Summary of deep patient time series prediction designs

3.1 Patient Representation

Patient representations employed for deep time series prediction in healthcare can broadly be classified into one of two categories: sequence representation and matrix representation [1]. In the former approach, each patient is represented as a sequence of medical event codes (e.g., diagnosis code, procedure code, or medication code), and the additional input may or may not include the time interval between the events (Section 3.4). Since a complete list of medical codes is generally quite long, various embedding techniques are commonly used to shorten it or combine similar medical codes with comparable values. In the latter approach, each patient is represented as a longitudinal matrix, where columns correspond to different medical events, and rows correspond to regular time intervals. As a result, a cell in a patient matrix provides the code of the patient's medical or claims event at a particular time point. Zhang et al. (2018)[57] followed a hybrid approach which splits the overall patient sequence of visits into multiple subsequences of equal length, and then embeds the medical codes in each subsequence as a multi-hot vector.

As seen in Table S3, sequence representation is a slightly more prevalent approach employed by researchers (57%). Generally, for prediction tasks with numeric inputs, such as lab tests or vital signs, sequence representation is more commonly used, and for those with categorical inputs, like diagnosis codes or procedure codes, matrix representation is the trend. Nevertheless, there are some exceptions. Rajkomar et al. (2018)[13] converted patient lab test results from numeric values to categories by assigning a unique token to each lab test name, value, and unit (e.g., Hemoglobin 12 g/dL) for predicting mortality, length-of-stay, and readmission in Intensive Care Units (ICUs).
Ashfaq et al. (2019)[61] included the lab test code with a value if the value was designated to be abnormal (determined according to medical domain knowledge), in addition to the typical inclusion of diagnosis and procedure codes. Several research groups [72,80,89] converted numerical lab test results into predesigned categories by encoding them as either missing, low, normal, or high when predicting hypertension and the associated onset of high-risk cardiovascular states. Similarly, Barbieri et al. (2020)[60] transformed vital signs into OASIS severity scores, and then discretized these scores into categories of low, normal, and high. Of note, a singular study observed the superiority of matrix representation over sequence representation for readmission prediction of Chronic Obstructive Pulmonary Disease (COPD) patients using a large AC database [1]. This study and other matrix representation works [44,57,96] found that aggregating data at coarse time granularities, such as weekly or monthly, rather than at finer granularities can improve performance. This study also compared various embedding techniques, and the authors found no significant differences in their results. Finally, Qiao et al. (2020)[78] summarized each numerical time series in terms of temporal measures such as its self-correlation structure, data distribution, entropy, and stationarity. They found that these measures can improve the interpretability of the extracted temporal features without degrading prediction performance.

For embedding medical events in the sequence representation, a commonly observed technique was to augment the neural network with an embedding layer that can learn effective medical code representations. This technique has benefitted the prediction of hospital readmission [58], patient next-visit diagnosis [66], and the onset of vascular diseases [82]. Another event embedding technique has been to use a pre-trained embedding layer via probabilistic methods, especially word2vec [101] and Skip-Gram [102], which have shown promising results for predicting an assortment of healthcare outcomes, such as patient next-visit diagnosis [7], heart failure [46,51], and hospital readmission [57]. Choi et al. (2016)[7] demonstrated that pre-trained embedding layers can outperform trainable layers by a 2% margin in recall for the next-visit diagnosis prediction problem. Instead of relying on individual medical codes for the next-visit diagnosis problem, several studies grouped medical codes using the first three digits of each diagnosis code, and other works implemented Clinical Classification Software (CCS) [103] to obtain groupings of medical codes [68,73]. On the other hand, Maragatham et al. (2019)[51] observed that pre-trained embedding layers can outperform medical group coding methods by a 1.5% margin in Area Under the Curve (AUC) for heart failure prediction. Finally, Min et al. (2019)[1] showed that, independent of the embedding approach, patient matrix representation generally outperformed sequence representation.
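To make the two representations concrete, the following minimal PyTorch sketch contrasts a sequence representation (ordered code indices passed through a trainable embedding layer) with a multi-hot matrix representation over regular time buckets. The vocabulary, codes, and layer sizes are illustrative assumptions, not values from any reviewed study.

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary of medical codes (indices are arbitrary).
vocab = {"PAD": 0, "250": 1, "401": 2, "428": 3, "584": 4}

# Sequence representation: the patient is an ordered list of code indices.
visit_codes = ["401", "250", "428"]                    # codes ordered by event time
seq = torch.tensor([[vocab[c] for c in visit_codes]])  # shape: (1, num_events)
embed = nn.Embedding(len(vocab), embedding_dim=8, padding_idx=0)
seq_input = embed(seq)                                 # (1, num_events, 8), ready for an RNN

# Matrix representation: rows are regular time buckets (e.g., months),
# columns are medical codes; cells are multi-hot occurrence indicators.
num_buckets = 6
matrix = torch.zeros(1, num_buckets, len(vocab))
matrix[0, 0, vocab["401"]] = 1.0                       # code 401 observed in month 0
matrix[0, 2, vocab["250"]] = 1.0                       # codes 250 and 428 in month 2
matrix[0, 2, vocab["428"]] = 1.0

# Either tensor can then be consumed by a recurrent layer.
_, h_seq = nn.GRU(8, 16, batch_first=True)(seq_input)
_, h_mat = nn.GRU(len(vocab), 16, batch_first=True)(matrix)
```

The trade-off sketched here mirrors the literature: the sequence form preserves event order within a day, while the matrix form trades that order for a fixed, regular temporal grid.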
3.2 Missing Value Handling

Missing value imputation using methods such as zero [3,40], median [58], forward-backward [64,66], and domain knowledge provided by experts [12,38] has been the most common approach for handling missing values in patient time series data. Lipton et al. (2016)[74] was the first study to use a masking vector that encodes the availability of values as a separate input, in this case to predict discharge diagnoses. Other studies adopted the same approach for predicting readmission [59], acute kidney injury [93], ICU mortality [37], and length-of-stay [12]. Lastly, Che et al. (2018)[36] utilized missing patterns as input for predicting mortality, length-of-stay, surgery recovery, and cardiac conditions. Their approach outperformed the masking vector technique by an approximately 2% margin in AUC.
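As a concrete illustration of the masking-vector approach, the sketch below appends a binary availability indicator to each imputed measurement so a downstream network can exploit informative missingness. The toy values and the choice of zero imputation are assumptions for illustration only.

```python
import torch

# Toy series: 4 time steps x 3 lab features; NaN marks a missing value.
x = torch.tensor([[7.1, float("nan"), 0.9],
                  [float("nan"), 98.0, float("nan")],
                  [7.4, 97.0, 1.1],
                  [float("nan"), float("nan"), 1.0]])

mask = (~torch.isnan(x)).float()           # 1 where observed, 0 where missing
x_imputed = torch.nan_to_num(x, nan=0.0)   # simple zero imputation

# Values and mask are concatenated along the feature axis, so the RNN sees
# six inputs per step and can learn how availability relates to the target.
rnn_input = torch.cat([x_imputed, mask], dim=-1)  # shape: (4, 6)
```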
3.3 Deep Learning Model

Table 3 summarizes the model architectures adopted to learn a deep patient time series prediction model for each included study. Recurrent Neural Networks (RNN) and their modern variants, including Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRU), were by far the most frequently used models (84%). A few studies compared the GRU variant against the LSTM architecture. Overall, GRU achieved around a 1% advantage in AUC over LSTM for predicting heart failure [47], kidney transplantation endpoint [3], mortality in the ICU [36], and readmission of chronic disease patients [1]. However, for predicting the diagnosis code group of a patient's next admission to the ICU [68], sepsis shock [83], and hypertension [89], researchers did not find significant differences between these two advanced RNN model types. Additionally, bidirectional variants of GRU and LSTM, so-called Bi-GRU and Bi-LSTM, consistently outperformed their unidirectional counterparts for predicting hospital readmission [57], diagnosis at hospital discharge [66], patient next-visit diagnosis [67,69,75], adverse cardiac events [81], readmission after ICU discharge [59,60], mortality in hospital [2,45], length-of-stay in hospital [12], sepsis [85], and heart failure [54].

While the majority of studies (63%) employed single-layered RNNs, many other works used multi-layered RNN models with GRU [7,48], LSTM [40,64,68,74], and Bi-GRU [2,67,82]. However, despite the numerous studies employing these methods and their variants, multi-layered GRU is the only architecture that has been experimentally compared to its single-layered counterpart, for the patient next-visit diagnosis [7] and heart failure prediction tasks [48]. Alternatively, researchers have extensively explored training separate network layers with LSTM [12,38], Bi-LSTM [77], and GRU [17] layers for each feature. These channel-like, per-feature architectures were reported as being more successful than the simpler RNN models. Finally, for tasks such as predicting in-hospital mortality or hospital discharge diagnosis code, some RNN models were supervised to make assessments at each time step [12,64,74], a procedure known as target replication. Their successes provided evidence that it can be more effective to repeatedly make a prediction at multiple time points than to merely perform supervised learning for the last time-stamped entry.

Table 3: Deep learning model architectures of the reviewed studies

CNN: Caicedo-Torres et al. (2019)[39], Nguyen et al. (2017)[63]

Multi-Frame CNN: Cheng et al. (2016)[18], Ju et al. (2020)[49]

CNN+CNN: Razavian et al. (2016)[96], Wang et al. (2018)[58], Morid et al. (2020)[98]

LSTM: Pham et al. (2016)[65], Pham et al. (2017)[71], Zhang et al. (2017)[83], Rajkomar et al. (2018)[13], Wang et al. (2019)[20], Gao et al. (2019)[100], Qiu et al. (2019)[94], Mohammadi et al. (2019)[88], Park et al. (2019)[79], Ashfaq et al. (2019)[61], Maragatham et al. (2019)[51], Yu et al. (2020)[37], Ye et al. (2020)[89], Reddy et al. (2018)[62], Lee et al. (2019)[72], Zhang et al. (2019)[84], Xiang et al. (2020)[99], Thorsen-Meyer et al. (2020)[90]

Bi-LSTM: Yang et al. (2019)[66], Ye et al. (2020)[89], Bai et al. (2018)[75], Duan et al. (2019)[81], Yu et al. (2020)[45]

LSTM+LSTM: Lipton et al. (2015)[64], Lipton et al. (2016)[74], Yin et al. (2019)[47], Wang et al. (2019)[68], Zhang et al. (2020)[40], Fagerström et al. (2019)[87]

GRU: Esteban et al. (2016)[3], Choi et al. (2017)[46], Zheng et al. (2017)[91], Choi et al. (2017)[53], Choi et al. (2016)[22], Che et al. (2018)[36], Ma et al. (2018)[70], Purushotham et al. (2018)[42], Tomasev et al. (2019)[93], Bekhet et al. (2019)[50], Min et al. (2019)[1], Shickel et al. (2019)[41], Solares et al. (2020)[56], Choi et al. (2018)[55], Rebane et al. (2019)[97], Ge et al. (2019)[95], Sue et al. (2017)[92], Liu et al. (2020)[76], Zhang et al. (2020)[77]

Bi-GRU: Ma et al. (2017)[69], Wickramaratne et al. (2020)[85], Zhang et al. (2018)[57], Barbieri et al. (2020)[60], Sun et al. (2019)[9], Qiao et al. (2020)[78]

GRU + GRU: Choi et al. (2016)[7], Wang et al. (2018)[48], Gupta et al. (2020)[43]

Bi-GRU + Bi-GRU (concurrent): Sha et al. (2017)[2], Park et al. (2018)[82], Guo et al. (2019)[67]

GCNN + LSTM: Lee et al. (2020)[73]

Bi-GRU + CNN: Ma et al. (2018)[54]

Bi-LSTM + CNN: Lin et al. (2019)[59], Baker et al. (2020)[44]

One RNN per Feature or Feature Type: Ge et al. (2018)[38], Harutyunyan et al. (2019)[12], An et al. (2019)[80], Chen et al. (2018)[17], Svenson et al. (2020)[86]

Several studies, particularly those from when deep time series prediction within the healthcare domain was in its nascency, utilized convolutional neural network (CNN) models for prediction tasks without benchmarking against other types of DL models [18,39,58]. These early CNN models have been consistently outperformed by more recently developed RNN models for predicting heart failure [49,52], readmission of patients diagnosed with chronic disease [1], in-hospital mortality [40], diabetes [49], readmission after ICU discharge [40,59], and joint replacement surgery risk [94]. Nevertheless, Cheng et al. (2016)[18] showed that temporal slow fusion can enhance CNN performance, and Ju et al. (2020)[49] suggested using 3D-CNN and spatial pyramid pooling to outperform RNN models for heart failure and diabetes prediction tasks. Alternatively, hybrid deployments of CNN/RNN models have been successful in outperforming pure CNN or RNN models for predicting readmission after ICU discharge [59], patient next-visit diagnosis [73], mortality [44], and heart failure [54].
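To illustrate the target replication procedure noted earlier in this section, here is a minimal sketch (model size and data are illustrative) in which a GRU classifier is supervised at every time step against a replicated static label, rather than only at the final step.

```python
import torch
import torch.nn as nn

class ReplicatedTargetGRU(nn.Module):
    """Toy GRU classifier supervised at every time step (target replication)."""
    def __init__(self, n_features, hidden=32):
        super().__init__()
        self.rnn = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time, features)
        h, _ = self.rnn(x)                # hidden state at every step
        return self.head(h).squeeze(-1)   # per-step logits: (batch, time)

model = ReplicatedTargetGRU(n_features=6)
x = torch.randn(8, 24, 6)                 # 8 patients, 24 time steps
y = torch.randint(0, 2, (8,)).float()     # one static label (e.g., mortality)

logits = model(x)
# Replicate the target across time; each step contributes a local error signal.
loss = nn.functional.binary_cross_entropy_with_logits(
    logits, y.unsqueeze(1).expand_as(logits))
```

The replicated loss provides the additional local error signals described above, rather than propagating a single terminal error back through the whole sequence.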
3.4 Addressing Temporal Irregularity

Two types of temporal irregularity, visit and feature, generally exist in patient data. Visit irregularity indicates that the time interval between visits can vary for the same patient over time. Feature irregularity occurs when different features belonging to the same patient for the same visit are recorded at various time points and frequencies.

Choi et al. (2016)[7] was the first study to make use of the time interval between patient visits as a separate input to a DL model, for the patient next-visit diagnosis prediction task. This approach also proved to be efficacious in predicting heart failure [46], vascular diseases [82], hospital mortality [13], and hospital readmission [13]. Yin et al. (2019)[47] used a sinusoidal transformation of the time interval for assessing heart failure. Additionally, Pham et al. (2016)[65] and Wang et al. (2019)[20] modified the internal mechanisms of the LSTM architecture to handle visit irregularity by giving higher weights to recent visits. Their proposed modifications outperformed traditional LSTM architectures by 3% in AUC for the highly frequent benchmarking task of predicting the diagnosis code group of a patient's next visit.

Certain studies hypothesized that handling feature irregularity is more effective than handling visit irregularity [60,91]. Zheng et al. (2017)[91] modified GRU memory cell learning processes to extract different decay patterns for each input feature when predicting the Alzheimer's severity score half a year ahead. Their results demonstrated that capturing both feature and visit irregularity decreases the mean squared error (MSE) by up to 5% compared to models that capture visit irregularity only. Barbieri et al. (2020)[60] and Liu et al. (2020)[76] used a similar approach when predicting readmission to the ICU and when generating relevant medications from billing codes.
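A minimal sketch of the most common strategy above: the elapsed time between consecutive events is computed and concatenated to each event's embedding as one extra input channel. The log scaling and tensor sizes are illustrative choices, not prescriptions from the reviewed studies.

```python
import torch

# Irregular event times (in days) for one patient.
event_times = torch.tensor([0.0, 3.0, 4.0, 30.0, 31.0])
deltas = torch.diff(event_times, prepend=event_times[:1])  # time since previous event

event_embeddings = torch.randn(5, 16)   # stand-in for learned code embeddings
# Append the (log-scaled) interval as one extra feature per event.
rnn_input = torch.cat([event_embeddings,
                       torch.log1p(deltas).unsqueeze(-1)], dim=-1)  # (5, 17)
```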
3.5 Attention Mechanism

Attention mechanisms, originally inspired by the visual attention system found in human physiology, have recently become quite popular among many domains, including deep time series prediction for healthcare [57]. The core underlying idea is that patient visits and their associated medical events should not carry an identical weight during the inference process. Rather, their weights are contingent on their relative importance for the prediction task at hand. Most commonly, attention mechanisms initially assign a unique weight for each visit or each medical event, and subsequently optimize these weight parameters during network backpropagation [2,13,22,37]. Also called location-based attention [69], this strategy has been incorporated into a variety of RNN networks and learning tasks, such as GRU for heart failure [22], Bi-GRU for mortality [51], as well as LSTM for hospital readmission, diagnosis, length-of-stay [13], and asthma exacerbation [99]. Other commonly used attention mechanisms include a concatenation-based attention device, which has been employed for hospital readmission [60] as well as next-visit diagnosis prediction [69], and general attention models, which are used primarily for hospital readmission [57] and mortality prediction [41]. Ma et al. (2017)[69] benchmarked these three attention mechanisms for predicting medical codes using a large AC database, and Sue et al. (2017)[92] performed a similar benchmarking procedure for illness severity score prediction on EHR data. Both studies reported location-based attention as optimal.

With few exceptions, studies employing an attention mechanism tended not to report any differential prediction performance improvements enabled by attention. Those few studies which did distinguish a particular performance improvement reported that location-based attention mechanisms improved patient next-visit diagnosis by 4% in AUC [65], increased hospital readmission F1-Score by 2.4%, and also saw a 13% boost in F1-Score for mortality prediction [2]. Zhang et al. (2018)[57] was the sole work reporting contributions of visit-level attention and medical code attention separately for hospital readmission, observing that each technique provided an approximate 4% increase in F2-Score. An innovative study by Guo et al. (2019)[67] argued that all medical codes should not go through the same weight allocation path during attention calculation. Instead, they proposed a crossover attention model with distinct bidirectional GRUs and attention weights for diagnosis and medication codes. On the whole, we found that most studies utilized attention mechanisms to improve the interpretability of their proposed DL models by highlighting important visits or medical codes, at either a patient or population level. Section 3.9 further elaborates on patient- and population-level properties.
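The following sketch shows one plausible implementation of location-based attention, in which each hidden state is scored independently, the scores are softmax-normalized over time, and the resulting weights form a patient-level context vector. Layer sizes are illustrative, and this is a generic instance of the idea rather than the exact formulation of any reviewed study.

```python
import torch
import torch.nn as nn

class LocationBasedAttention(nn.Module):
    """Scores each RNN hidden state independently (location-based attention)."""
    def __init__(self, hidden):
        super().__init__()
        self.score = nn.Linear(hidden, 1)

    def forward(self, h):                  # h: (batch, time, hidden)
        weights = torch.softmax(self.score(h).squeeze(-1), dim=1)  # (batch, time)
        context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)    # (batch, hidden)
        return context, weights

rnn = nn.GRU(input_size=17, hidden_size=32, batch_first=True)
attn = LocationBasedAttention(32)
h, _ = rnn(torch.randn(4, 20, 17))          # 4 patients, 20 visits
context, weights = attn(h)                  # context feeds a prediction head
```

Because the weights sum to one over time, they can double as visit-importance scores for the interpretation uses discussed in Section 3.9.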
3.6 Incorporation of Medical Ontologies

Another facet of these research streams was the incorporation of medical domain knowledge into DL models to enhance their prediction performance. Standard Clinical Classification Software (CCS) has the ability to establish a hierarchy of various medical concepts in the form of successive parent-child relationships. Based on this concept, Choi et al. (2017)[53] employed CCS to create a medical ontology tree for use in a network embedding layer. These encoded medical ontologies were better able to represent abstract medical concepts when predicting heart failure. Zhang et al. (2020)[77] later enhanced this initial ontological strategy by considering more than one parent for each node and also by providing an ordered set of ancestors for each medical concept. Separately, Ma et al. (2018)[70] showed that medical ontology trees can be leveraged when calculating attention weights in GRU models, achieving a 3% accuracy increase over [53] for the same prediction task. Following this, Yin et al. (2019)[47] demonstrated that causal medical knowledge graphs like KnowLife [104], which contain both cause and is-caused-by relationships between diseases, outperform both [53] and [70] with an approximate 2% AUC margin for heart failure prediction. Wang et al. (2019)[20], on the other hand, enhanced Skip-Gram embeddings by adding n-gram tokens from medical concept information, such as disease or drug names, to EHR data. These embedded tokens captured ancestral information for a medical concept similar to ontology trees, and they were applied to the patient next-visit diagnosis task.

3.7 Static Data Inclusion

RNN networks are particularly apt at learning from sequential data, though incorporating static data into these types of models has been challenging. The hybrid combination of static and temporal input is particularly important in a healthcare context, since static features like patient demographic information and prior history can be essential for achieving accurate predictions. Appending patient static data to the input of a final fully-connected layer has been the most common approach for integrating these features. It has been applied to hospital readmission [57,58], length-of-stay [40], and mortality [38,40] tasks. Alternatively, Esteban et al. (2016)[3] fed 342 static features into an entirely independent feedforward neural network before combining the output with temporal data in a typical GRU layer for learning kidney transplant endpoints. Other studies also adopted this approach for predicting mortality [42], phenotyping [42], length-of-stay [42], and the risk of cardiovascular diseases [80]. Moreover, Pham et al. (2016)[65] modified the internal processes of LSTM networks to specifically incorporate the effects of unplanned hospital admissions, which involve higher risks than planned admissions. They employed this approach for predicting patient next-visit diagnosis codes in mental health and diabetes cohorts. Finally, Maragatham et al. (2019)[51] converted static data into a temporal format by repeating it as input at every time point. Together, they used static demographic data, vascular risk factors, and a scored assessment of nursing levels for heart failure prediction. We found no study comparing the aforementioned static data inclusion methods against solid benchmarks.

3.8 Learning Strategy

We identified three principal learning strategies which differ from the basic supervised learning scenario: (1) cost-sensitive learning, (2) multi-task learning, and (3) transfer learning. When handling imbalanced datasets, cost-sensitive learning has frequently been implemented by modifying the cross-entropy loss function [58,61,93,100]. In particular, two studies convincingly demonstrated the performance improvement achieved by cost-sensitive learning: Gao et al. (2019)[100] found a 3.7% AUC increase for neonatal encephalopathy prediction, and Ashfaq et al. (2019)[61] observed a 4% increase for the hospital readmission task. The latter study further calculated cost-saving outcomes by estimating the potential annual cost savings if an intervention were selectively offered to patients at high risk for readmission. Meanwhile, multi-task learning was implemented to jointly predict mortality, length-of-stay, and phenotyping with LSTM [13,40], Bi-LSTM [12], and GRU [42] architectures. Harutyunyan et al. (2019)[12] was a seminal study that reported a significant contribution of multi-task learning over state-of-the-art traditional learning, with a solid 2% increase in AUC. Lastly, transfer learning, originally used as a benchmark evaluation by [36], was recently adopted by Gupta et al. (2020)[43] to study both task adaptation and domain adaptation utilizing a non-healthcare model, TimeNet. They found that domain adaptation outperforms task adaptation when the data size is small, but otherwise task adaptation is superior. Moreover, they found that, for task adaptation on medium-sized data, fine-tuning is a better approach than learning from scratch with feature extraction.

3.9 Interpretation

By far the most common DL interpretation method is to show visualized examples of selected patient records that highlight which visits and medical codes most influence the prediction task [2,13,22,41,47,49,54,57,60,66,67,69,75,82,95,97]. Specific contributions by feature are extracted from the calculated weight parameters of an attention mechanism (Section 3.5). Visualizations can also be implemented through a global average pooling layer [65,82] or a one-sided convolution layer within the neural network [57]. Another interpretation approach is to report the top medical codes with the highest attention weights, either for all patients together [2] or for different patient groups by disease [47,57,69,80]. Specifically, Nguyen et al. (2017)[63] extracted the most frequent patterns in medical codes by disease type, and Caicedo-Torres et al. (2019)[39] identified important temporal features for mortality prediction using both DeepLIFT [105] and Shapley [106] values. The technique of using Shapley values for interpretation was also employed for continuous mortality prediction within the ICU setting [90]. Finally, Choi et al. (2017)[46] performed error analysis on false positive and false negative predictions to differentiate the contexts in which their DL models are more or less accurate.
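Complementing these techniques, a simple model-agnostic probe, shown below as an occlusion sketch rather than a method drawn from the reviewed studies, scores each time step by how much the predicted risk shifts when that step is hidden. It assumes a trained model mapping a (batch, time, features) tensor to one logit per patient.

```python
import torch

def occlusion_importance(model, x):
    """Score each time step by the prediction shift when that step is zeroed out."""
    model.eval()
    with torch.no_grad():
        baseline = torch.sigmoid(model(x))        # x: (1, time, features) -> (1,) logit
        scores = []
        for t in range(x.shape[1]):
            x_hidden = x.clone()
            x_hidden[:, t, :] = 0.0               # occlude one time step
            scores.append((baseline - torch.sigmoid(model(x_hidden))).abs().item())
    return scores                                  # higher = more influential time step
```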
3.10 Scalability

While most reviewed studies evaluated their proposed models on a single dataset, usually a publicly available resource such as MIMIC and its updates [35], certain studies focused on assessing the scalability of their models to a wider variety of data. Bekhet et al. (2019)[50] evaluated one of the most popular deep time series prediction models with two GRU layers, called RETAIN, first proposed in [22], on a collection of ten hospital EHR datasets for heart failure prediction. Overall, they achieved a similar AUC compared to the original study, though a higher dimensionality did further improve prediction performance. Using the same RETAIN model, Solares et al. (2020)[56] conducted a scalability study on approximately four million patients in the UK National Health Service, and they reported an identical observation to [50]. Another large dataset was explored by Rajkomar et al. (2018)[13], who demonstrated the power of LSTM models on a variety of healthcare prediction tasks for 216,000 hospitalizations involving 114,000 unique patients. Finally, we found a singular study [1] investigating the scalability of deep time series prediction methods for AC data, as opposed to EHR sequences. Min et al. (2019)[1] observed that DL models are effective for readmission prediction with patient EHR data, but they tend not to be superior to traditional ML models using AC data.

Studies on the MIMIC database have consistently used the same 17 features of the dataset, which have a low missing rate [107]. To address dimensional scalability, Purushotham et al. (2018)[42] attempted using as many as 136 features for mortality, length-of-stay, and phenotype prediction with a standard GRU architecture. Compared to an ensemble model constructed from several traditional ML models, they found that, for lower-dimensional data, traditional ML performance is comparable to DL performance, while for high-dimensional data, DL's advantage is more pronounced. On a similar note, Min et al. (2019)[1] evaluated a GRU architecture against traditional supervised learning methods on around 103 million medical claims and 17 million pharmacy claims for 111,000 patients. Again, they found that strong traditional supervised ML techniques have a comparable performance to that of their DL competitors.

4 DISCUSSION

4.1 Patient Representation

Of the two commonly used patient representations, prediction tasks with predominantly numeric inputs, such as lab tests and vital signs, often rely on sequence representations, whereas studies utilizing mainly categorical inputs, like diagnosis codes or procedure codes, commonly incorporate a matrix representation. Other than a lone study [1] that documented the superiority of the matrix approach on AC data, we found no consistent comparison between these two approaches in our systematic review. Also, while a coarse-grain abstraction has not been evaluated for each of these approaches, tuning the granularity level to find the optimum is highly recommended to further ascertain their respective efficacy. The rationale is that the sparsity of temporal patient data is typically high, and considering every individual visit for an embedded patient representation may not be the optimal approach when factoring in the corresponding increase in computational complexity.

In order to combine numeric and categorical input features, researchers have generally employed three distinct methods.
One method involves converting patient numeric quantities to categorical ones by assigning a unique token to each measure; thus, each specific lab test code, value, and unit has its own identifying marker. Using a second method, researchers encode numeric measures with clinically meaningful names, such as missing, low, high, normal, and abnormal. A third alternative requires the conversion of numeric measures to severity scores, in order to discretize them into low, normal, and high categories. The second approach was quite common in our selected studies, likely due to its implementation simplicity and effectiveness for a wide variety of clinical healthcare applications. We therefore report it to be the most dominant strategy for combining numeric and categorical inputs for deep time series prediction tasks.
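The first two conversion methods can be as simple as the following sketch, which maps a numeric lab result either to a unique value token (in the spirit of [13]) or to a clinically named category; the reference range and token formats are illustrative assumptions.

```python
def lab_to_value_token(name, value, unit):
    """Method one: a unique token per lab name, value, and unit."""
    return f"{name}_{value}_{unit}"          # e.g., 'hemoglobin_12_g/dL'

def lab_to_category_token(name, value, low, high):
    """Method two: encode the value as missing, low, normal, or high."""
    if value is None:
        return f"{name}_missing"
    if value < low:
        return f"{name}_low"
    if value > high:
        return f"{name}_high"
    return f"{name}_normal"

print(lab_to_value_token("hemoglobin", 12, "g/dL"))           # hemoglobin_12_g/dL
print(lab_to_category_token("hemoglobin", 10.5, 12.0, 17.5))  # hemoglobin_low (illustrative range)
```

Either token stream can then be embedded alongside diagnosis or procedure codes, which is what makes these conversions attractive for mixed numeric and categorical inputs.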
When embedding medical events into a sequence representation, we again found three prevalent techniques. Using the first technique, researchers commonly added a separate embedding layer, placed before the recurrent layers, in order to optimize medical code representation. Alternatively, pre-trained embedding layers built with established methods such as word2vec were adopted in lieu of learning embeddings from scratch. Lastly, researchers often utilized medical code groups instead of atomized medical codes. Among the three practices, pre-trained embedding layers have consistently outperformed naive embedding layers and medical code groupings for EHR data, while no significant difference in model performance has been observed for AC data. In addition, researchers have shown that temporal matrix representation is the most effective approach for AC data. The rationale is that the temporal granularity of EHR data is usually at the level of an hour or even a minute, while the granularity of AC data is at the day level. As a result, the order of medical codes within a day is ordinarily lost for embedding algorithms such as word2vec. Combining our findings, a sequence representation with a pre-trained embedding layer is highly recommended for learning tasks on EHR data, while a matrix representation seems to be more effective for AC data.

Several important gaps exist regarding the representation of longitudinal patient data. Sequence and matrix methodologies should be compared in a sufficient variety of healthcare settings for EHR data. If extensive comparisons could confirm the relative performance of matrix representation, then this would further enhance its desirability, as it is easier to implement and has a faster runtime than sequences of EHR codes. Moreover, to improve patient similarity measures, researchers should analyze the effect of different representation approaches under various DL model architectures. Lastly, we found that very few reviewed studies included both numerical and categorical measures as feature input. A superior approach which synergistically combines their relative strengths has not yet been sufficiently studied, and thus requires the attention of future research. Further investigation of novel DL architectures with a variety of possible input measures is therefore recommended.

4.2 Missing Value Handling

The most common missing value handling approach found in the deep time series prediction literature was imputation by predetermined measures, such as zero or the median, which is also a common practice in non-healthcare domains [108]. However, missing values in healthcare data typically do not occur at random, as they can reflect specific decisions by caregivers [74]. These missing values thus represent informative missingness, providing rich information about target labels [36]. In order to capture this correspondence, researchers have implemented two primary approaches. The first approach involves creating a binary (masking) vector for each temporal variable, indicating the availability of data at each time point. This approach has been evaluated in various applications, and it seems to be an effective way of handling missing values. Second, missing patterns can be learned by directly training the imputation value as a function of either the latest observation or the empirical mean prior to variable observations. This latter approach is more effective when there is a high missing rate and a high correlation between missing values and the target variable. For instance, Che et al. (2018)[36] found that learning missing values was more effective when the average Pearson correlation between lab tests with a high rate of missingness and the dependent variable, mortality, was above 0.5. Despite this, since masking vectors have been evaluated on a wider variety of healthcare applications, and with different degrees of missingness, they remain the suggested missing value handling strategy for deep time series prediction.

Interestingly, no study assessed the differential impact of missingness for individual features on a given learning task. Identifying the features whose exclusion or missingness most harms the prediction process would inform practitioners about how to focus their data collection and imputation strategies. Furthermore, while informative missingness applies to many temporal features, missing-at-random can still be the case for other feature types. As a direction for future study, we recommend a comprehensive analysis of potential sources of missingness, for each feature and its type, with assistance from domain experts. This would better inform a missing value handling approach within the healthcare domain and, as a consequence, enhance prediction performance accordingly.
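A simplified sketch of the second approach, in the spirit of the trainable decay in Che et al. (2018)[36] though not a faithful reproduction of it: each missing value is imputed as a learned, gap-dependent blend of the last observation and the feature's empirical mean. All values below are illustrative.

```python
import torch
import torch.nn as nn

class DecayImputer(nn.Module):
    """Imputed values decay from the last observation toward the feature mean."""
    def __init__(self, n_features, feature_means):
        super().__init__()
        self.w = nn.Parameter(torch.zeros(n_features))  # learned per-feature decay rate
        self.register_buffer("means", feature_means)

    def forward(self, x_t, mask_t, x_last, delta_t):
        # gamma is near 1 right after an observation and shrinks as the gap grows.
        gamma = torch.exp(-torch.relu(self.w) * delta_t)
        x_hat = gamma * x_last + (1 - gamma) * self.means
        return mask_t * x_t + (1 - mask_t) * x_hat      # observed values pass through

imputer = DecayImputer(3, feature_means=torch.tensor([7.2, 97.5, 1.0]))
x_t = torch.tensor([[0.0, 98.0, 0.0]])      # current (partially missing) reading
mask_t = torch.tensor([[0.0, 1.0, 0.0]])    # only the second feature was observed
x_last = torch.tensor([[7.4, 97.0, 1.1]])   # last observed value per feature
delta_t = torch.tensor([[2.0, 0.0, 5.0]])   # hours since each feature's last reading
x_filled = imputer(x_t, mask_t, x_last, delta_t)
```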
4.3 Deep Learning Models

Rooted in their ability to efficiently represent sequential data and extract its temporal patterns [64], RNN-based DL models and their variants were found to be the most prevalent architecture for deep time series prediction on healthcare data. Patient data naturally has a sequential nature, where hospital visits or medical events occur chronologically. Lab test orders or vital sign records, for example, take place at specific timestamps during a hospital visit. However, vanilla RNN architectures are not sophisticated enough to sufficiently capture temporal dependencies when EHR sequences are relatively long, due to the vanishing gradient problem [109]. To address this issue, LSTM and GRU recurrent networks, with their memory cells and elaborate gating mechanisms, have been habitually employed by researchers, with improved outcomes on a variety of healthcare prediction tasks. Although some studies display a slight superiority of GRU architectures over LSTM networks (around a 1% increase in AUC), other studies did not find significant differences between them. Overall, LSTM and GRU have similar memory-retention mechanisms, though GRU implementations are less complex and have faster runtimes [89]. Due to this similarity, most works have used one without benchmarking it against the other.

In addition, for very long EHR sequences, such as ICU admissions with a high rate of recorded medical events, bidirectional GRU and LSTM networks consistently outperformed their unidirectional counterparts. This is likely due to the fact that bidirectional recurrent networks simultaneously learn from both past and future values in a temporal sequence, and so they retain additional trend information [69]. This is particularly important in the healthcare context, since patient health status patterns change rapidly or gradually over time [12]. For example, an ICU patient with a rapidly fluctuating health status over the past week may eventually die, even if the patient is currently in good condition. Another patient, initially admitted to the ICU within the past week in a very bad condition, may gradually improve and survive. Therefore, bidirectional recurrent networks are the most state-of-the-art DL models for time series prediction in healthcare. GRU, which has lower complexity and comparable performance to LSTM, is the preferred model variant, though additional comparative studies are recommended by this review to affirm this conclusion.

While a majority of the RNN studies employed single-layer architectures, some studies chose an increased complexity with multi-layered GRU [7,48], LSTM [40,64,68,74], and Bi-GRU [2,67,82] networks. Other than two earlier works [7,48], multi-layered architectures were not consistently tested against their single-layered counterparts. Consequently, it is difficult to decipher whether adding additional RNN layers, bidirectional or not, improves learning performance. On the other hand, channel-wise learning, a technique which trains a separate RNN layer per feature or feature type, successfully enhanced traditional RNN models whose network layers learn all feature parameters simultaneously. There are two underlying ideas behind this development. First, it helps identify unique patterns within each individual time series (e.g., body organ system status) [17] prior to integration with patterns found in the multivariate data. Second, channel-wise learning facilitates the identification of patterns related to informative missingness, by discovering which of the masked variables correlates strongly with other variables, target or otherwise [12]. Nevertheless, channel-wise learning needs further benchmarking against vanilla RNN models to learn the conditions under which it is most beneficial.
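A minimal sketch of channel-wise learning, under the assumption of one small GRU per input feature: each univariate series gets its own recurrent encoder, and the per-channel summaries are concatenated before a shared prediction head. Sizes are illustrative.

```python
import torch
import torch.nn as nn

class ChannelWiseGRU(nn.Module):
    """One small GRU per input feature; per-channel states feed a shared head."""
    def __init__(self, n_features, hidden=8):
        super().__init__()
        self.channels = nn.ModuleList(
            nn.GRU(1, hidden, batch_first=True) for _ in range(n_features))
        self.head = nn.Linear(n_features * hidden, 1)

    def forward(self, x):                            # x: (batch, time, features)
        summaries = []
        for i, rnn in enumerate(self.channels):
            _, h = rnn(x[:, :, i:i + 1])             # encode one channel at a time
            summaries.append(h.squeeze(0))           # (batch, hidden) per channel
        return self.head(torch.cat(summaries, dim=-1)).squeeze(-1)

logits = ChannelWiseGRU(n_features=5)(torch.randn(4, 24, 5))
```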
As noted in Section 3.3, convolutional network models were more commonly used in the early stages of deep time series prediction for healthcare. Eventually, they were shown to be consistently outperformed by recurrent models. However, a recent architectural trend uses convolutional layers as a complement to GRU and LSTM [44,54,59,73]. The underlying idea is that RNN layers capture the global structure of the data by modeling interactions between events, while the CNN layers, using their temporal convolution operators [54], capture local structures of the data occurring at various abstraction levels. Therefore, our systematic review suggests using CNNs to enhance RNN prediction performance, instead of employing either in a standalone setting (a sketch of this hybrid pattern appears at the end of this subsection). Another recent trend in the literature is splitting entire temporal sequences into subsequences for various time periods and applying convolutions of different filter sizes, in order to capture temporal patterns within each time period [49]. For optimal local pattern (motif) detection, slow-fusion CNNs, which consider both the individual patterns of each time period and their interactions, have been shown to be the most effective convolutional approach [18].
Several important research gaps were identified in the models used for deep time series prediction in healthcare. First, there is no systematic comparison among state-of-the-art models in different healthcare settings, such as rare versus common diseases, chronic versus non-chronic maladies, and inpatient versus outpatient visits. These different healthcare settings have identifiably heterogeneous temporal data characteristics. For instance, outpatient EHR data contains large numbers of visits with few medical events recorded during each visit, while inpatient data contains relatively few visits but long documented sequences of events within each visit. Therefore, the effectiveness of a given DL architecture will vary across these clinical settings. Second, it is not clear whether adding multiple layers of RNN or CNN within a given architecture can further improve model performance. The maximum number of layers observed within the reviewed studies was two. Given enough training samples, adding more layers may further improve performance by allowing increasingly sophisticated temporal patterns to be learned. Third, a majority of the reviewed studies (92%) targeted a prediction task on EHR data, while the generalizability of the models to AC data needs more investigation. For example, although many studies reported promising outcomes for EHR-based hospital readmission predictions using GRU models, Min et al. (2019) [1] found that similar DL architectures are ineffective for claims data. Finding novel models which can extract temporal patterns from EHR data, and which are simultaneously applicable to claims data, can be an interesting future direction for transfer learning projects. Fourth, while channel-wise learning seems to be a promising new trend, researchers need to further investigate the precise temporal patterns detected by this approach. DL methods focused on interpretability would be ideal for such an application. Fifth, many studies compared their DL methods against expert domain knowledge, but a hybrid approach that leverages expert domain knowledge within the embeddings should help improve representation performance.
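The hybrid pattern referenced above can be sketched compactly. Below is a minimal example assuming PyTorch, in the spirit of [44,54,59,73] but not reproducing any specific reviewed architecture: 1-D convolutions extract local motifs, and a GRU then models the global sequence over those motif features (all names and sizes are illustrative).

```python
import torch
import torch.nn as nn

class ConvGRU(nn.Module):
    """1-D convolutions extract local motifs over short windows of the
    sequence; a GRU then models global structure over the conv features."""
    def __init__(self, n_features, n_filters=32, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(n_features, n_filters, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.gru = nn.GRU(n_filters, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                  # x: (batch, time, features)
        z = self.conv(x.transpose(1, 2))   # Conv1d expects (batch, C, time)
        _, h = self.gru(z.transpose(1, 2))
        return self.head(h[-1])

logits = ConvGRU(n_features=5)(torch.randn(8, 24, 5))
```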
Lastly, the prediction of medications, either by code or group, has been a well-targeted task. However, a more ambitious approach, such as predicting medications along with their appropriate dosage and frequency, would be a more realistic and useful target for clinical decision making in practice.
4.4 Addressing Temporal Irregularity
The most common approach for handling visit irregularity is to treat the time interval between adjacent events as an independent variable and concatenate it to the input embedding vectors. While this technique is easy to implement, it does not consider contextual differences between recent and earlier visits. Addressing this limitation, researchers have modified the internal memory cells of RNN networks to give higher weights to recent visits [20,65]. However, a systematic comparison between the two approaches has not been explored. Therefore, the time-interval approach, which has been shown to be effective in various applications, remains the most efficient, tested strategy for handling visit irregularity. It is noteworthy that tokenizing time intervals is also considered the most effective method of capturing duration in the context of natural language processing [110,111], a field of study which inspires many of the deep time series prediction methods in healthcare.
Although most works addressing irregularity focus on visit irregularity, a few studies concentrated on feature irregularity [60,91]. A fundamental concept underpinning the difference between the two is that fine-grained temporal information is more complex, yet more important, to learn at the feature level than at the visit level. Specifically, different features expose different temporal patterns, such as when certain features decay faster than others. Paralleling the work on visit irregularity and time intervals, these studies [60,91] modified the internal processes of RNN networks to learn unique decay patterns for each individual input feature. Again, this research direction is relatively new and boasts few published works, so it is difficult to make a general suggestion for unilaterally handling feature irregularity in deep time series learning tasks. Overall, we can stipulate that adjusting the memory mechanisms of recurrent networks when addressing either visit or feature irregularity needs additional benchmarking experiments in order to make the arguments robust. Currently, it has been evaluated in a single hospital setting for each case. Therefore, optimal synergies among patient types (inpatient vs. outpatient), sequence lengths (long vs. short), and irregularity approaches (time interval vs. modifying RNN memory cells) are not entirely conclusive, but time-interval approaches have been the most commonly published.
4.5 Attention Mechanisms
Attention mechanisms have been employed by researchers with the premise that neither patient visits nor medical codes should contribute equally when performing a target prediction task. As such, learning attention weights for visits and codes has been the subject of many deep time series prediction studies. The three most commonly used attention mechanisms are (1) location-based, (2) general, and (3) concatenation-based frameworks. The methods differ primarily in how the learned weight parameters are connected to the model's hidden states [69], as sketched below.
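A compact, hedged sketch of the three score functions, assuming PyTorch and loosely following the formulations compared in [69]; all module and variable names are illustrative rather than taken from any reviewed implementation.

```python
import torch
import torch.nn as nn

class AttentionScores(nn.Module):
    """Three score functions over RNN hidden states h_1..h_T; the
    attention weights are the softmax of the chosen scores."""
    def __init__(self, hidden):
        super().__init__()
        self.w_loc = nn.Linear(hidden, 1)                   # location-based
        self.w_gen = nn.Linear(hidden, hidden, bias=False)  # general
        self.mlp = nn.Sequential(                           # concatenation-based
            nn.Linear(2 * hidden, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def forward(self, H, h_t, kind="location"):
        # H: (batch, T, hidden) hidden states; h_t: (batch, hidden) current.
        if kind == "location":     # score each state on its own
            s = self.w_loc(H).squeeze(-1)
        elif kind == "general":    # bilinear link between h_t and each state
            s = torch.einsum("bth,bh->bt", self.w_gen(H), h_t)
        else:                      # MLP over concatenated state pairs
            h_rep = h_t.unsqueeze(1).expand_as(H)
            s = self.mlp(torch.cat([H, h_rep], dim=-1)).squeeze(-1)
        return torch.softmax(s, dim=1)   # (batch, T) attention weights

attn = AttentionScores(64)
H, h_t = torch.randn(8, 24, 64), torch.randn(8, 64)
for kind in ("location", "general", "concat"):
    w = attn(H, h_t, kind)   # context vector: (w.unsqueeze(-1) * H).sum(1)
```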
Location-based attention schemes calculate weights from the current hidden state alone. Alternatively, general attention calculations are based on a linear combination connecting the current hidden state to the previous hidden states, with the weight parameters being the linear coefficients. Most complex is the concatenation-based attention framework, which trains a multilayer perceptron to learn the relationship between parameter weights and hidden states. Location-based attention has been the most commonly used mechanism for deep time series prediction in healthcare.
We found several research gaps regarding attention. Most studies relied on attention mechanisms to improve the interpretability of their proposed DL model by highlighting important visits or medical codes, without evaluating the differential effect of attention on prediction performance. This is an important issue: incorporating attention into a model may improve interpretability, but it does not have an established effect on performance in the DL for healthcare time series domain. Furthermore, with only a single exception [57], we did not find studies reporting the separate contributions of visit-level attention and medical code-level attention. Lastly, and again with only a single exception [69], no study compared the performance or interpretability of different attention mechanisms. All of these research gaps should be investigated in a comprehensive manner in future studies, particularly for EHR data, as most prior attention studies have focused on the clinical histories of individual patients.
4.6 Incorporation of Medical Ontologies
When incorporating medical domain knowledge into deep time series prediction models, researchers have mainly utilized medical ontology trees and knowledge graphs within the embedding layers of recurrent networks. Some of the success of these approaches is due to the enhancement they provide when addressing rare diseases. Because rare diseases appear infrequently in the data, proper representation and pattern extraction for them is challenging for simple RNN models. Medical domain knowledge graphs provide rare disease information to the model through ancestral node embeddings that contain hierarchical information about the disease. However, this advantage is less pronounced when sufficient data is available for all patients over a long record history [53,70]. Continuing research is needed to expand the innovative architectures that incorporate medical ontologies for a broad variety of prediction tasks and case studies.
4.7 Static Data Inclusion
There are four published approaches for integrating static patient data with temporal data. By far the most common approach is to feed a vector of static features as additional input to the final fully connected layer of a DL network. Another strategy trains a separate feedforward neural network on the static features, and then adds the encoded output of this separate network to the final dense layer of the principal neural network for target prediction. Researchers have also injected static data vectors as input at each time point of the recurrent network, effectively treating the patient demographic and historical data as quasi-dynamic. Lastly, similar to the strategies that handle visit and feature irregularities, researchers have modified the internal memory processes of recurrent networks to incorporate specific static features as input.
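A minimal sketch of the first (and most common) approach, assuming PyTorch, with illustrative feature counts; a comment notes how the third approach would differ.

```python
import torch
import torch.nn as nn

class LateFusionModel(nn.Module):
    """Approach (1): static features (age, sex, comorbidity flags, ...)
    are concatenated with the RNN summary just before the output layer."""
    def __init__(self, n_dynamic, n_static, hidden=64):
        super().__init__()
        self.gru = nn.GRU(n_dynamic, hidden, batch_first=True)
        self.head = nn.Linear(hidden + n_static, 1)

    def forward(self, x_seq, x_static):
        _, h = self.gru(x_seq)
        # Approach (3) would instead repeat x_static at every time step:
        # torch.cat([x_seq, x_static.unsqueeze(1).expand(-1, x_seq.size(1), -1)], dim=-1)
        return self.head(torch.cat([h[-1], x_static], dim=-1))

logits = LateFusionModel(5, 3)(torch.randn(8, 24, 5), torch.randn(8, 3))
```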
The most important research gap regarding static data inclusion is that we found no study evaluating the differential effects of static data on prediction performance. Moreover, comparing these four approaches in a meaningful benchmarking setting, with the express goal of finding the optimal technique, could be an interesting future research direction. Finally, since DL models may not learn the same representation for every subpopulation of patients (e.g., male vs. female, chronic vs. non-chronic, or young vs. old), significant research gaps exist in the post-analysis of static feature performance as input. Such analyses could give decision makers crucial insights into model fairness, and would also stimulate future research on predictive models that better balance fairness with accuracy.
4.8 Learning Strategies
Recent literature has investigated three new DL strategies: (1) cost-sensitive learning, (2) multi-task learning, and (3) transfer learning. While many reviewed studies used an imbalanced dataset for their experiments, a select few embedded cost information as a learning strategy by incorporating an additional cost-sensitive loss. Specifically, each of these studies changed the loss function of the DL model to more heavily penalize misclassification of the minority class (a minimal weighted-loss sketch appears at the end of this subsection). In the healthcare domain, imbalanced datasets are very common, since patients with a given disease are far less numerous than healthy patients. Moreover, most prediction tasks on the minority class lead to critical care decisions, such as identifying the patients who are likely to die in the next 48 hours or those who will become diabetic in the relatively near future. Incorporating cost-sensitive learning components into DL networks thus needs further attention and is a wide-open research gap for future inquiry. As an example, exploring cost-sensitive methods in tandem with the traditional ML techniques of oversampling or undersampling could lead to significant increases in model prediction rates for the minority class. In addition, calculating the precise cost savings when correctly identifying the minority class of patients, similar to [61], can further underline the importance of the cost-sensitive learning strategy.
Researchers have reported the benefit of multi-task learning by documenting its performance in a significant variety of healthcare outcome prediction tasks. However, the cited works do not identify the model components that explain why learning a single, multi-task deep model is preferable to training multiple DL models for the respective individual prediction tasks. More specifically, we ask: which layers, components, or learned temporal patterns in a DL network should be shared among different tasks, and in which healthcare applications might this strategy be most efficient? These research questions are straightforward and could be fruitfully studied in the near future with explainable DL models.
Among the three noted strategies, transfer learning was the least studied within our systematic review of the literature, with just a single citation [43], which displayed the effectiveness of the method for both task and domain adaptation.
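The weighted-loss sketch promised above, assuming PyTorch; setting the positive-class weight to the negative/positive class ratio is one common heuristic, not the method of any particular reviewed study.

```python
import torch
import torch.nn as nn

# Cost-sensitive loss: up-weight the minority (positive) class so that
# missing a high-risk patient costs more than a false alarm.
y = torch.randint(0, 2, (256,)).float()            # toy labels
pos_weight = (y == 0).sum() / (y == 1).sum().clamp(min=1)
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)

logits = torch.randn(256, requires_grad=True)      # stand-in model outputs
loss = criterion(logits, y)
loss.backward()
```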
It is commonly assumed that, with sufficient data, trained DL models can be effective for a wider variety of prediction tasks and domains. However, in many healthcare settings, such as those serving rural patients, sufficient data is difficult to collect [112]. Transfer learning methods have the potential to make a huge impact on deep time series prediction in healthcare by making pre-trained models applicable to essentially any healthcare setting. Still, further research is recommended to ascertain which pathological prediction tasks are most transferable, which network architectures are most flexible, and which model parameters require the least tuning when transferring to different domains.
4.9 Interpretations
One of the most common critiques of DL models is the difficulty of their interpretation, and researchers have attempted to alleviate this issue with five different approaches. The first approach uses feature importance measures such as Shapley values and DeepLIFT. The Shapley value of a feature is its average contribution across all possible coalitions with other features, while DeepLIFT compares the activation of each neuron in the deep model to its default reference activation and assigns contribution scores according to the difference [113]. Although neither of these measures can illuminate the internal procedures of DL models, they can identify which features have been most frequently used to make final predictions. A second approach visualizes what input data the model focused on for each individual patient [13] through the implementation of interpretable attention mechanisms. In particular, some studies investigated which medical visits and features contributed most to prediction performance with a network attention layer. As a clinical decision support tool, this raises clinician awareness of which medical visits deserve careful human examination. In addition to individual patient visualization, a third interpretation tactic aggregated model attention weights to calculate the most important medical features for specific diseases or patient groups. Additionally, error analysis of final prediction results allowed for consideration of the medical conditions or patient groups for which a DL model might be more accurate. This fourth interpretation approach is also popular in non-healthcare domains [114]. Finally, considering each set of medical events as a basket of items and each target disease as the label, researchers extracted frequent patterns of medical events most predictive of the target disease.
Overall, this review found explainable attention to be the most commonly used strategy for interpreting deep time series prediction models evaluated on healthcare applications. Indeed, individual patient exploration can help make DL models more trustworthy to clinicians and facilitate subsequent clinical actions. Nevertheless, because implementing feature importance measures is much less complex, this study recommends consistently reporting them in healthcare deep time series prediction studies, providing useful clinical implications with little added effort. Although individual-level interpretation is important, extracting general patterns and medical events associated with target healthcare outcomes is also beneficial for clinical decision makers, thereby contributing to clinical practice guidelines. We found just one study implementing a population-level interpretation [63], extracting frequent CNN motifs of medical codes associated with different diseases. Otherwise, researchers have broadly reported the top medical codes with the highest attention weights for all patients [2] or for different patient groups, in order to provide a population-level interpretation. This current limitation can be an essential direction for future research involving network interpretability.
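A hedged sketch of this population-level aggregation idea, in NumPy; the code IDs and attention weights below are random stand-ins for what a trained attention model would emit.

```python
import numpy as np

# Population-level reading of attention: average each medical code's
# attention weight over every event where the code occurs, then rank
# the codes by their mean attention.
rng = np.random.default_rng(0)
n_events, vocab = 10_000, 50
codes = rng.integers(0, vocab, n_events)   # code ID per event (stand-in)
attn = rng.dirichlet(np.ones(n_events))    # per-event weights (stand-in)

totals = np.bincount(codes, weights=attn, minlength=vocab)
counts = np.maximum(np.bincount(codes, minlength=vocab), 1)
mean_attn = totals / counts
top10 = np.argsort(mean_attn)[::-1][:10]   # most-attended codes overall
```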
4.10 Scalability
We identified two main findings regarding the scalability of deep time series prediction methods in healthcare. First, although DL models are usually evaluated on a single dataset with a limited number of features, some studies confirmed their scalability to large hospital EHR datasets with high dimensionality. The fundamental observation is that higher dimensionality and larger amounts of data can further enhance model performance by raising the models' representation learning power [42]. Such studies have typically used single-layered GRU or LSTM architectures, but analyzing more advanced neural network schemas, such as those proposed in recent studies (Section 3.1), is an avenue for future research. Also, one scalability study observed that models designed primarily for EHR data may not be as effective with AC data [1]. This is mainly because potent predictive features available in EHR data, such as lab test results, tend to be missing in AC datasets. Therefore, scalability studies on AC data merit further inquiry. Second, DL models are typically compared against only a single traditional supervised ML method (Table S3). However, two studies [1,42] compared DL methods against ensembles of traditional supervised learning models, on both EHR and AC data, and found that their performances are comparable. This points to an important research gap: properly comparing DL and traditional supervised learning models to identify the data settings, such as feature types, dimensionality, and missingness, in which DL models either merely match or genuinely outperform their traditional ML counterparts.
5 CONCLUSION
In this work, we systematically reviewed studies focused on deep time series prediction to leverage structured patient time series data for healthcare prediction tasks from a technical perspective. The following is a summary of our main findings and suggestions:
Patient representation: There are two common approaches: sequence representation and matrix representation. For prediction tasks in which inputs are numeric, such as lab tests or vital signs, sequence representations have typically been used. For those with categorical inputs, such as diagnosis codes or procedure codes, matrix representation is the premier choice. To combine numeric and categorical inputs, researchers have employed three distinct methods: (1) assigning a unique token to each combination of measure name, value, and unit; (2) encoding the numeric measures categorically as missing, low, normal, or high; and (3) converting the numeric measures to severity scores to further discretize them as low, normal, or high. Moreover, embedding medical events in a sequence representation involves three prevailing techniques: (1) adding a separate embedding layer to learn an optimal medical code representation from scratch, (2) adopting a pre-trained embedding layer such as with word2vec, or (3) using a medical code grouping strategy, sometimes involving clinical classification software. Comparing these diverse approaches and techniques in a solid benchmarking setting needs further investigation.
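As an illustration of technique (1), a minimal sketch assuming PyTorch; the vocabulary size, code IDs, and mean-pooling are illustrative choices, not those of a specific reviewed study.

```python
import torch
import torch.nn as nn

# Technique (1): learn code embeddings from scratch with an embedding
# layer; a visit is represented here by the mean of its code vectors
# (one simple pooling choice among several used in the literature).
vocab_size, dim = 20_000, 128            # e.g., an ICD/CPT vocabulary
embed = nn.Embedding(vocab_size, dim, padding_idx=0)

visit_codes = torch.tensor([[101, 2043, 77, 0, 0],   # patient 1 (padded)
                            [5, 5121, 0, 0, 0]])     # patient 2 (padded)
mask = (visit_codes != 0).float().unsqueeze(-1)
visit_vec = (embed(visit_codes) * mask).sum(1) / mask.sum(1).clamp(min=1)
```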
Missing value handling: Missing values in healthcare data are generally not missing at random, but often reflect decisions by caregivers. Capturing missing values with a separate input masking vector or learning the missing patterns with a neural network have been the most effective methods to date. Identifying impactful missing features will help healthcare providers implement optimal data collection strategies and better inform clinical decision-making.
Deep learning models: RNN architectures, particularly their single-layered GRU and LSTM versions, were identified as the most prevalent networks in the extant literature. These models excel at handling long sequences of input data representing longitudinal patient history. While RNN models extract global temporal patterns, CNNs are proficient at detecting local patterns and motifs. Combining RNN and CNN in a hybrid structure to capture both types of patterns has become a trend in recent studies. More investigation is required to understand optimal network architectures for various hospital settings and learning tasks.
Addressing temporal irregularity: For handling visit irregularity, the time interval between visits is given as an additional independent input, or alternatively, the internal memory processes of recurrent networks are slightly modified to assign differing weights to earlier versus more recent visits. For feature irregularity, the memory and gating activities of RNN networks are similarly modified to learn individualized decay patterns for each feature or feature type. Overall, temporal irregularity handling methods need more robust benchmarking experiments in an assortment of hospital settings, including variations in patient type (inpatient vs. outpatient) and visit length (long-sequence vs. short-sequence).
Attention mechanisms: Location-based attention is by far the most commonly used means of differentiating importance among portions of the input data and network nodes. Most studies used attention mechanisms to improve the interpretability of their proposed DL models by highlighting important visits or medical codes, but without evaluating the differential effect of attention mechanisms on prediction performance. Further inquiry is also warranted to separately evaluate the contributions of visit-level and medical code-level attention.
Incorporation of medical ontologies: Researchers have incorporated medical ontology trees and knowledge graphs in the embedding layers of recurrent networks to compensate for the lack of sufficient data for prediction tasks involving rare diseases. Using these medical domain knowledge resources, the information for such rare diseases is captured through the ancestral nodes and pathways in the tree or graph for input into network embeddings.
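A hedged sketch of this ancestral-embedding idea: a simplified, uniformly weighted variant of attention-based approaches such as GRAM [53], with a toy ontology path and illustrative names.

```python
import torch
import torch.nn as nn

# Simplified ontology-aware embedding: a leaf code's vector is a mix of
# its own embedding and its ancestors' embeddings, so rare codes borrow
# statistical strength from well-observed ancestors. GRAM [53] learns
# the mixing weights with attention; here they are fixed and uniform.
vocab, dim = 1_000, 64
embed = nn.Embedding(vocab, dim)
ancestors = {42: [7, 3, 0]}   # toy path: code 42 up to the chapter root

def code_vector(code):
    ids = torch.tensor([code] + ancestors.get(code, []))
    return embed(ids).mean(dim=0)    # uniform mix over self + ancestors

v = code_vector(42)                  # (dim,) representation of code 42
```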
Static data inclusion: We found four basic approaches for merging demographic and patient history data with the dynamic longitudinal input of EHR or AC data: (1) feeding static features to the final fully-connected layer of the neural network, (2) training a separate feedforward network for subsequent inclusion of its encoded output into the main network, (3) repeating the static feature input at each time point in a quasi-dynamic manner, and (4) modifying the internal processes of recurrent networks. We found no study evaluating the effects of static data on prediction performance, especially post-hoc analysis of performance results for static features.
Learning strategy: Three learning strategies have been investigated by the authors included in this review: (1) cost-sensitive, (2) multi-task, and (3) transfer learning. Incorporating cost-sensitive learning components into DL networks is a wide-open research gap for future study. Regarding multi-task learning, researchers have reported its benefit by citing increased performance levels in a variety of healthcare outcome prediction tasks. However, the multi-task literature does not make clear which network layers, components, or types of extracted temporal patterns within the architectural design should be shared among the different tasks, or in which healthcare scenarios the multi-task strategy is most efficient. Transfer learning was the least studied method found in our systematic review, but it has promising prospects for further inquiry as the scale of data and the number of external datasets in published works increase.
Interpretations: The most common approach to visualizing important visits or medical codes for individual patients was the use of an attention mechanism in the neural network. Although individual-level interpretation is indeed important, as a future research direction, the use of population-level interpretation techniques to extract general patterns and identify specific medical events associated with target healthcare outcomes will be a boon for clinical decision makers.
Scalability: Several studies confirm the generalizability of well-known deep time series prediction models to large hospital EHR datasets, even with high input dimensionality. However, analyzing the advanced network architectures proposed in recent works is a suggested avenue for future research. Furthermore, some studies found that ensembles of traditional supervised learning methods have performance comparable to DL models, on both EHR and AC data. Important research gaps remain in establishing a proper comparison of DL against single or ensembled traditional ML models. In particular, it would be useful to identify the patient, dimensionality, and missing value conditions under which DL models, with their higher complexity and runtimes, might be superfluous. This is a continual concern when considering the need to implement real-time information systems that can better inform clinical decision makers.
A potential limitation of this systematic review is the possible incomplete retrieval of relevant studies on deep time series prediction. Although we included a wide set of keywords, it remains challenging to conduct an exhaustive search strategy with automatic keyword queries. We alleviated this concern by applying snowballing search strategies from the originally included publications. In other words, we assumed that any newer publication should reference one of the formerly included studies, especially well-known benchmarking models such as Doctor AI [7], RETAIN [22], DeepCare [65], and Deepr [63]. Another challenge was selectively differentiating the included studies from numerous adjacent works that predict a single clinical outcome with a DL methodology. To achieve this, we implemented a full-text review step that included all papers which specifically mention patient representations or embedding strategies. Additionally, we ensured that the authors' stated goals involved learning these representations at the patient level, and not merely devising models to maximize performance on a specific disease prediction task.
The aforementioned limitations pose a potential threat of selection bias in publication trends for any systematic review, but particularly one in which publication rates increase with recency, as seen in the ever-increasing popularity of DL models across a myriad of applications, healthcare or otherwise.
REFERENCES
[1] X. Min, B. Yu, F. Wang, Predictive Modeling of the Hospital Readmission Risk from Patients' Claims Data Using Machine Learning: A Case Study on COPD, Sci. Rep. 9 (2019) 1–10. doi:10.1038/s41598-019-39071-y.
[2] Y. Sha, M.D. Wang, Interpretable predictions of clinical outcomes with an attention-based recurrent neural network, in: Proc. 8th ACM Int. Conf. Bioinformatics, Comput. Biol. Heal. Informatics, ACM, 2017: pp. 233–240. doi:10.1145/3107411.3107445.
[3] C. Esteban, O. Staeck, S. Baier, Y. Yang, V. Tresp, Predicting Clinical Events by Combining Static and Dynamic Information Using Recurrent Neural Networks, in: Proc. IEEE Int. Conf. Healthc. Informatics, ICHI 2016, IEEE Computer Society, 2016: pp. 93–101. doi:10.1109/ICHI.2016.16.
[4] Z. Che, Y. Cheng, S. Zhai, Z. Sun, Y. Liu, Boosting deep learning risk prediction with generative adversarial networks for electronic health records, in: Proc. - IEEE Int. Conf. Data Mining, ICDM, Institute of Electrical and Electronics Engineers Inc., 2017: pp. 787–792. doi:10.1109/ICDM.2017.93.
[5] Y. Li, S. Rao, J.R.A. Solares, A. Hassaine, R. Ramakrishnan, D. Canoy, Y. Zhu, K. Rahimi, G. Salimi-Khorshidi, BEHRT: Transformer for Electronic Health Records, Sci. Rep. 10 (2020) 1–12. doi:10.1038/s41598-020-62922-y.
[6] E. Choi, A. Schuetz, W.F. Stewart, J. Sun, Medical Concept Representation Learning from Electronic Health Records and its Application on Heart Failure Prediction, (2016). http://arxiv.org/abs/1602.03686 (accessed April 15, 2019).
[7] E. Choi, M.T. Bahadori, A. Schuetz, W.F. Stewart, J. Sun, Doctor AI: Predicting Clinical Events via Recurrent Neural Networks, in: Proc. 1st Mach. Learn. Healthc. Conf., PMLR, 2016: pp. 301–318. http://www.ncbi.nlm.nih.gov/pubmed/28286600 (accessed June 18, 2020).
[8] T. Tran, T.D. Nguyen, D. Phung, S. Venkatesh, Learning vector representation of medical objects via EMR-driven nonnegative restricted Boltzmann machines (eNRBM), J. Biomed. Inform. 54 (2015) 96–105. doi:10.1016/J.JBI.2015.01.012.
[9] Z. Sun, S. Peng, Y. Yang, X. Wang, F. Li, A General Fine-tuned Transfer Learning Model for Predicting Clinical Task Acrossing Diverse EHRs Datasets, in: Proc. IEEE Int. Conf. Bioinforma. Biomed. BIBM 2019, IEEE Computer Society, 2019: pp. 490–495. doi:10.1109/BIBM47256.2019.8983098.
[10] D.W. Bates, S. Saria, L. Ohno-Machado, A. Shah, G. Escobar, Big data in health care: Using analytics to identify and manage high-risk and high-cost patients, Health Aff. 33 (2014) 1123–1131. doi:10.1377/hlthaff.2014.0041.
[11] S. Saria, A. Goldenberg, Subtyping: What It is and Its Role in Precision Medicine, IEEE Intell. Syst. 30 (2015) 70–75. doi:10.1109/MIS.2015.60.
[12] H. Harutyunyan, H. Khachatrian, D.C. Kale, G. Ver Steeg, A. Galstyan, Multitask learning and benchmarking with clinical time series data, Sci. Data 6 (2019) 1–18. https://www.nature.com/articles/s41597-019-0103-9 (accessed July 9, 2020).
[13] A. Rajkomar, E. Oren, K. Chen, A.M. Dai, N. Hajaj, M. Hardt, P.J. Liu, X. Liu, J. Marcus, M. Sun, P. Sundberg, H. Yee, K. Zhang, Y. Zhang, G. Flores, G.E. Duggan, J. Irvine, Q. Le, K. Litsch, A. Mossin, J. Tansuwan, D. Wang, J. Wexler, J. Wilson, D. Ludwig, S.L. Volchenboum, K. Chou, M. Pearson, S. Madabushi, N.H. Shah, A.J. Butte, M.D. Howell, C. Cui, G.S. Corrado, J. Dean, Scalable and accurate deep learning with electronic health records, Npj Digit. Med. 1 (2018) 1–10. doi:10.1038/s41746-018-0029-1.
[14] A. Avati, K. Jung, S. Harman, L. Downing, A. Ng, N.H. Shah, Improving palliative care with deep learning, in: Proc. - 2017 IEEE Int. Conf. Bioinforma. Biomed. BIBM 2017, Institute of Electrical and Electronics Engineers Inc., 2017: pp. 311–316. doi:10.1109/BIBM.2017.8217669.
[15] B.A. Goldstein, A.M. Navar, M.J. Pencina, J.P.A. Ioannidis, Opportunities and challenges in developing risk prediction models with electronic health records data: a systematic review, J. Am. Med. Informatics Assoc. 24 (2017) 198–208. doi:10.1093/jamia/ocw042.
[16] C. Lin, Y. Zhang, J. Ivy, M. Capan, R. Arnold, J.M. Huddleston, M. Chi, Early diagnosis and prediction of sepsis shock by combining static and dynamic information using convolutional-LSTM, in: Proc. IEEE Int. Conf. Healthc. Informatics, ICHI 2018, IEEE Computer Society, 2018: pp. 219–228. doi:10.1109/ICHI.2018.00032.
[17] W. Chen, S. Wang, G. Long, L. Yao, Q.Z. Sheng, X. Li, Dynamic Illness Severity Prediction via Multi-task RNNs for Intensive Care Unit, in: Proc. Int. Conf. Data Mining, ICDM, IEEE Computer Society, 2018: pp. 917–922. doi:10.1109/ICDM.2018.00111.
[18] Y. Cheng, F. Wang, P. Zhang, J. Hu, Risk Prediction with Electronic Health Records: A Deep Learning Approach, in: Proc. 2016 SIAM Int. Conf. Data Min., Society for Industrial and Applied Mathematics, Philadelphia, PA, 2016: pp. 432–440. doi:10.1137/1.9781611974348.49.
[19] J. Zhang, J. Gong, L. Barnes, HCNN: Heterogeneous Convolutional Neural Networks for Comorbid Risk Prediction with Electronic Health Records, in: Proc. - 2017 IEEE 2nd Int. Conf. Connect. Heal. Appl. Syst. Eng. Technol. CHASE 2017, Institute of Electrical and Electronics Engineers Inc., 2017: pp. 214–221. doi:10.1109/CHASE.2017.80.
[20] L. Wang, H. Wang, Y. Song, Q. Wang, MCPL-Based FT-LSTM: Medical Representation Learning-Based Clinical Prediction Model for Time Series Events, IEEE Access 7 (2019) 70253–70264. doi:10.1109/ACCESS.2019.2919683.
[21] T. Zebin, T.J. Chaussalet, Design and implementation of a deep recurrent model for prediction of readmission in urgent care using electronic health records, in: 2019 IEEE Conf. Comput. Intell. Bioinforma. Comput. Biol. CIBCB 2019, Institute of Electrical and Electronics Engineers Inc., 2019. doi:10.1109/CIBCB.2019.8791466.
[22] E. Choi, M.T. Bahadori, J.A. Kulas, A. Schuetz, W.F. Stewart, J. Sun, RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism, in: Proc. 30th Int. Conf. Neural Inf. Process. Syst., ACM Press, 2016: pp. 3512–3520. http://papers.nips.cc/paper/6321-retain-an-interpretable-predictive-model-for-healthcare-using-reverse-time-attention-mechanism (accessed July 9, 2020).
[23] E. Xu, S. Zhao, J. Mei, E. Xia, Y. Yu, S. Huang, Multiple MACE risk prediction using multi-task recurrent neural network with attention, in: 2019 IEEE Int. Conf. Healthc. Informatics, ICHI 2019, Institute of Electrical and Electronics Engineers Inc., 2019. doi:10.1109/ICHI.2019.8904675.
[24] B.L.P. Cheung, D. Dahl, Deep learning from electronic medical records using attention-based cross-modal convolutional neural networks, in: Proc. IEEE EMBS Int. Conf. Biomed. Heal. Informatics, BHI 2018, IEEE Computer Society, 2018: pp. 222–225. doi:10.1109/BHI.2018.8333409.
[25] H. Wang, Z. Cui, Y. Chen, M. Avidan, A. Ben Abdallah, A. Kronzer, Predicting Hospital Readmission via Cost-Sensitive Deep Learning, IEEE/ACM Trans. Comput. Biol. Bioinforma. 15 (2018) 1968–1978. doi:10.1109/TCBB.2018.2827029.
[26] R. Amirkhan, M. Hoogendoorn, M.E. Numans, L. Moons, Using recurrent neural networks to predict colorectal cancer among patients, in: 2017 IEEE Symp. Ser. Comput. Intell. SSCI 2017 - Proc., Institute of Electrical and Electronics Engineers Inc., 2018: pp. 1–8. doi:10.1109/SSCI.2017.8280826.
[27] B. Shickel, P.J. Tighe, A. Bihorac, P. Rashidi, Deep EHR: A Survey of Recent Advances in Deep Learning Techniques for Electronic Health Record (EHR) Analysis, IEEE J. Biomed. Heal. Informatics 22 (2018) 1589–1604. doi:10.1109/JBHI.2017.2767063.
[28] S.K. Pandey, R.R. Janghel, Recent Deep Learning Techniques, Challenges and Its Applications for Medical Healthcare System: A Review, Neural Process. Lett. 50 (2019) 1907–1935. doi:10.1007/s11063-018-09976-2.
[29] C. Xiao, E. Choi, J. Sun, Opportunities and challenges in developing deep learning models using electronic health records data: a systematic review, J. Am. Med. Informatics Assoc. 25 (2018) 1419–1428. doi:10.1093/jamia/ocy068.
[30] K. Yazhini, D. Loganathan, A state of art approaches on deep learning models in healthcare: An application perspective, in: Proc. Int. Conf. Trends Electron. Informatics, ICOEI 2019, Institute of Electrical and Electronics Engineers Inc., 2019: pp. 195–200. doi:10.1109/ICOEI.2019.8862730.
[31] S. Srivastava, S. Soman, A. Rai, P.K. Srivastava, Deep learning for health informatics: Recent trends and future directions, in: 2017 Int. Conf. Adv. Comput. Commun. Informatics, ICACCI 2017, Institute of Electrical and Electronics Engineers Inc., 2017: pp. 1665–1670. doi:10.1109/ICACCI.2017.8126082.
[32] S. Shamshirband, M. Fathi, A. Dehzangi, A.T. Chronopoulos, H. Alinejad-Rokny, A review on deep learning approaches in healthcare systems: Taxonomies, challenges, and open issues, J. Biomed. Inform. 113 (2021) 103627. doi:10.1016/j.jbi.2020.103627.
[33] Y. Si, J. Du, Z. Li, X. Jiang, T. Miller, F. Wang, W.J. Zheng, K. Roberts, Deep representation learning of patient data from Electronic Health Records (EHR): A systematic review, J. Biomed. Inform. 115 (2021) 103671. doi:10.1016/j.jbi.2020.103671.
[34] D. Moher, A. Liberati, J. Tetzlaff, D.G. Altman, Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement, BMJ 339 (2009) 332–336. doi:10.1136/BMJ.B2535.
[35] A.E.W. Johnson, T.J. Pollard, L. Shen, L.W.H. Lehman, M. Feng, M. Ghassemi, B. Moody, P. Szolovits, L.A. Celi, R.G. Mark, MIMIC-III, a freely accessible critical care database, Sci. Data 3 (2016) 1–9. doi:10.1038/sdata.2016.35.
[36] Z. Che, S. Purushotham, K. Cho, D. Sontag, Y. Liu, Recurrent Neural Networks for Multivariate Time Series with Missing Values, Sci. Rep. 8 (2018) 1–12. doi:10.1038/s41598-018-24271-9.
[37] R. Yu, Y. Zheng, R. Zhang, Y. Jiang, C.C.Y. Poon, Using a Multi-Task Recurrent Neural Network with Attention Mechanisms to Predict Hospital Mortality of Patients, IEEE J. Biomed. Heal. Informatics 24 (2020) 486–492. doi:10.1109/JBHI.2019.2916667.
[38] W. Ge, J.W. Huh, Y.R. Park, J.H. Lee, Y.H. Kim, A. Turchin, An Interpretable ICU Mortality Prediction Model Based on Logistic Regression and Recurrent Neural Networks with LSTM units, in: Proc. Am. Med. Informatics Assoc. Annu. Symp., 2018: pp. 460–469. /pmc/articles/PMC6371274/ (accessed July 2, 2020).
[39] W. Caicedo-Torres, J. Gutierrez, ISeeU: Visually interpretable deep learning for mortality prediction inside the ICU, J. Biomed. Inform. 98 (2019) 103269. doi:10.1016/j.jbi.2019.103269.
[40] D. Zhang, C. Yin, J. Zeng, X. Yuan, P. Zhang, Combining structured and unstructured data for predictive models: a deep learning approach, BMC Med. Inform. Decis. Mak. 20 (2020) 1–11. doi:10.1186/s12911-020-01297-6.
[41] B. Shickel, T.J. Loftus, L. Adhikari, T. Ozrazgat-Baslanti, A. Bihorac, P. Rashidi, DeepSOFA: A Continuous Acuity Score for Critically Ill Patients using Clinically Interpretable Deep Learning, Sci. Rep. 9 (2019) 1–12. doi:10.1038/s41598-019-38491-0.
[42] S. Purushotham, C. Meng, Z. Che, Y. Liu, Benchmarking deep learning models on large healthcare datasets, J. Biomed. Inform. 83 (2018) 112–134. doi:10.1016/j.jbi.2018.04.007.
[43] P. Gupta, P. Malhotra, J. Narwariya, L. Vig, G. Shroff, Transfer Learning for Clinical Time Series Analysis Using Deep Neural Networks, J. Healthc. Informatics Res. 4 (2020) 112–137. doi:10.1007/s41666-019-00062-3.
[44] S. Baker, W. Xiang, I. Atkinson, Continuous and automatic mortality risk prediction using vital signs in the intensive care unit: a hybrid neural network approach, Sci. Rep. 10 (2020) 1–12. doi:10.1038/s41598-020-78184-7.
[45] K. Yu, M. Zhang, T. Cui, M. Hauskrecht, Monitoring ICU mortality risk with a long short-term memory recurrent neural network, in: Pacific Symp. Biocomput., World Scientific Publishing Co. Pte Ltd, 2020: pp. 103–114. doi:10.1142/9789811215636_0010.
[46] E. Choi, A. Schuetz, W.F. Stewart, J. Sun, Using recurrent neural network models for early detection of heart failure onset, J. Am. Med. Informatics Assoc. 24 (2017) 361–370.
[47] C. Yin, R. Zhao, B. Qian, X. Lv, P. Zhang, Domain knowledge guided deep learning with electronic health records, in: Proc. IEEE Int. Conf. Data Mining, ICDM, IEEE Computer Society, 2019: pp. 738–747. doi:10.1109/ICDM.2019.00084.
[48] W.W. Wang, H. Li, L. Cui, X. Hong, Z. Yan, Predicting Clinical Visits Using Recurrent Neural Networks and Demographic Information, in: Proc. 2018 IEEE 22nd Int. Conf. Comput. Support. Coop. Work Des. CSCWD 2018, Institute of Electrical and Electronics Engineers Inc., 2018: pp. 785–789. doi:10.1109/CSCWD.2018.8465194.
[49] R. Ju, P. Zhou, S. Wen, W. Wei, Y. Xue, X. Huang, X. Yang, 3D-CNN-SPP: A Patient Risk Prediction System From Electronic Health Records via 3D CNN and Spatial Pyramid Pooling, IEEE Trans. Emerg. Top. Comput. Intell. 5 (2020) 247–261. doi:10.1109/tetci.2019.2960474.
[50] L. Rasmy, W.J. Zheng, H. Xu, D. Zhi, Y. Wu, N. Wang, H. Wu, X. Geng, F. Wang, A study of generalizability of recurrent neural network-based predictive models for heart failure onset risk using a large and heterogeneous EHR data set, J. Biomed. Inform. 84 (2018) 11–16. doi:10.1016/j.jbi.2018.06.011.
[51] G. Maragatham, S. Devi, LSTM Model for Prediction of Heart Failure in Big Data, J. Med. Syst. 43 (2019) 1–13. doi:10.1007/s10916-019-1243-3.
[52] X. Zhang, B. Qian, Y. Li, C. Yin, X. Wang, Q. Zheng, KnowRisk: An interpretable knowledge-guided model for disease risk prediction, in: Proc. Int. Conf. Data Mining, ICDM, IEEE Computer Society, 2019: pp. 1492–1497. doi:10.1109/ICDM.2019.00196.
[53] E. Choi, M.T. Bahadori, L. Song, W.F. Stewart, J. Sun, GRAM: Graph-based Attention Model for Healthcare Representation Learning, in: Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discov. Data Min. (KDD '17), 2017: pp. 787–795.
[54] T. Ma, C. Xiao, F. Wang, Health-ATM: A deep architecture for multifaceted patient health record representation and risk prediction, in: Proc. SIAM Int. Conf. Data Mining, SDM 2018, Society for Industrial and Applied Mathematics Publications, 2018: pp. 261–269. doi:10.1137/1.9781611975321.30.
[55] E. Choi, C. Xiao, W.F. Stewart, J. Sun, MiME: multilevel medical embedding of electronic health records for predictive healthcare, in: Proc. 32nd Int. Conf. Neural Inf. Process. Syst. (NIPS '18), 2018: pp. 4552–4562. https://dl.acm.org/doi/10.5555/3327345.3327366 (accessed April 12, 2021).
[56] J.R. Ayala Solares, F.E. Diletta Raimondi, Y. Zhu, F. Rahimian, D. Canoy, J. Tran, A.C. Pinho Gomes, A.H. Payberah, M. Zottoli, M. Nazarzadeh, N. Conrad, K. Rahimi, G. Salimi-Khorshidi, Deep learning for electronic health records: A comparative review of multiple deep neural architectures, J. Biomed. Inform. 101 (2020) 103337. doi:10.1016/j.jbi.2019.103337.
[57] J. Zhang, K. Kowsari, J.H. Harrison, J.M. Lobo, L.E. Barnes, Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record, IEEE Access 6 (2018) 65333–65346. doi:10.1109/ACCESS.2018.2875677.
[58] H. Wang, Z. Cui, Y. Chen, M. Avidan, A. Ben Abdallah, A. Kronzer, Cost-sensitive Deep Learning for Early Readmission Prediction at A Major Hospital, in: Proc. Int. Work. Data Min. Bioinforma. (BioKDD '17), ACM Press, 2017.
[59] Y.W. Lin, Y. Zhou, F. Faghri, M.J. Shaw, R.H. Campbell, Analysis and prediction of unplanned intensive care unit readmission using recurrent neural networks with long short-term memory, PLoS One 14 (2019) e0218942. doi:10.1371/journal.pone.0218942.
[60] S. Barbieri, J. Kemp, O. Perez-Concha, S. Kotwal, M. Gallagher, A. Ritchie, L. Jorm, Benchmarking Deep Learning Architectures for Predicting Readmission to the ICU and Describing Patients-at-Risk, Sci. Rep. 10 (2020) 1–10. doi:10.1038/s41598-020-58053-z.
[61] A. Ashfaq, A. Sant'Anna, M. Lingman, S. Nowaczyk, Readmission prediction using deep learning on electronic health records, J. Biomed. Inform. 97 (2019) 103256. doi:10.1016/j.jbi.2019.103256.
[62] B.K. Reddy, D. Delen, Predicting hospital readmission for lupus patients: An RNN-LSTM-based deep-learning methodology, Comput. Biol. Med. 101 (2018) 199–209. doi:10.1016/j.compbiomed.2018.08.029.
[63] P. Nguyen, T. Tran, N. Wickramasinghe, S. Venkatesh, Deepr: A Convolutional Net for Medical Records, IEEE J. Biomed. Heal. Informatics 21 (2017) 22–30. doi:10.1109/JBHI.2016.2633963.
[64] Z.C. Lipton, D.C. Kale, C. Elkan, R. Wetzel, Learning to Diagnose with LSTM Recurrent Neural Networks, (2015). http://arxiv.org/abs/1511.03677 (accessed April 15, 2019).
[65] T. Pham, T. Tran, D. Phung, S. Venkatesh, DeepCare: A Deep Dynamic Memory Model for Predictive Medicine, in: Proc. Adv. Knowl. Discov. Data Min., Springer, 2016: pp. 30–41. doi:10.1007/978-3-319-31750-2_3.
[66] Y. Yang, X. Zheng, C. Ji, Disease Prediction Model Based on BiLSTM and Attention Mechanism, in: Proc. - 2019 IEEE Int. Conf. Bioinforma. Biomed. BIBM 2019, Institute of Electrical and Electronics Engineers Inc., 2019: pp. 1141–1148. doi:10.1109/BIBM47256.2019.8983378.
[67] W. Guo, W. Ge, L. Cui, H. Li, L. Kong, An Interpretable Disease Onset Predictive Model Using Crossover Attention Mechanism from Electronic Health Records, IEEE Access 7 (2019) 134236–134244. doi:10.1109/ACCESS.2019.2928579.
[68] T. Wang, Y. Tian, R.G. Qiu, Long Short-Term Memory Recurrent Neural Networks for Multiple Diseases Risk Prediction by Leveraging Longitudinal Medical Records, IEEE J. Biomed. Heal. Informatics (2019). doi:10.1109/JBHI.2019.2962366.
[69] F. Ma, R. Chitta, J. Zhou, Q. You, T. Sun, J. Gao, Dipole: Diagnosis Prediction in Healthcare via Attention-based Bidirectional Recurrent Neural Networks, in: Proc. 23rd ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., ACM Press, 2017: pp. 1903–1911. https://dl.acm.org/doi/abs/10.1145/3097983.3098088 (accessed July 2, 2020).
[70] F. Ma, Q. You, H. Xiao, R. Chitta, J. Zhou, J. Gao, KAME: Knowledge-based Attention Model for Diagnosis Prediction in Healthcare, in: Proc. 27th ACM Int. Conf. Inf. Knowl. Manag. (CIKM '18), 2018: pp. 743–752. https://dl.acm.org/doi/abs/10.1145/3269206.3271701 (accessed July 3, 2020).
[71] T. Pham, T. Tran, D. Phung, S. Venkatesh, Predicting healthcare trajectories from medical records: A deep learning approach, J. Biomed. Inform. 69 (2017) 218–229. doi:10.1016/j.jbi.2017.04.001.
[72] J.M. Lee, M. Hauskrecht, Recent context-aware LSTM for clinical event time-series prediction, in: Proc. Conf. Artif. Intell. Med., Springer, 2019: pp. 13–23. doi:10.1007/978-3-030-21642-9_3.
[73] D. Lee, X. Jiang, H. Yu, Harmonized representation learning on dynamic EHR graphs, J. Biomed. Inform. 106 (2020) 103426. doi:10.1016/j.jbi.2020.103426.
[74] Z.C. Lipton, D.C. Kale, R. Wetzel, Directly Modeling Missing Data in Sequences with RNNs: Improved Classification of Clinical Time Series, in: Proc. 1st Mach. Learn. Healthc. Conf., PMLR, 2016: pp. 253–270. http://proceedings.mlr.press/v56/Lipton16.html (accessed March 5, 2021).
[75] T. Bai, A.K. Chanda, B.L. Egleston, S. Vucetic, Joint learning of representations of medical concepts and words from EHR data, in: Proc. - 2017 IEEE Int. Conf. Bioinforma. Biomed. BIBM 2017, Institute of Electrical and Electronics Engineers Inc., 2017: pp. 764–769. doi:10.1109/BIBM.2017.8217752.
[76] D. Liu, Y.L. Wu, X. Li, L. Qi, Medi-Care AI: Predicting medications from billing codes via robust recurrent neural networks, Neural Networks 124 (2020) 109–116. doi:10.1016/J.NEUNET.2020.01.001.
[77] M. Zhang, C.R. King, M. Avidan, Y. Chen, Hierarchical Attention Propagation for Healthcare Representation Learning, in: Proc. 26th ACM SIGKDD Int. Conf. Knowl. Discov. Data Min., 2020: pp. 249–256. https://dl.acm.org/doi/abs/10.1145/3394486.3403067 (accessed November 2, 2021).
[78] Z. Qiao, Z. Zhang, X. Wu, S. Ge, W. Fan, MHM: Multi-modal Clinical Data based Hierarchical Multi-label Diagnosis Prediction, in: Proc. 43rd Int. ACM SIGIR Conf. Res. Dev. Inf. Retr., (n.d.). doi:10.1145/3397271.
[79] J. Park, J.W. Kim, B. Ryu, E. Heo, S.Y. Jung, S. Yoo, Patient-level prediction of cardio-cerebrovascular events in hypertension using nationwide claims data, J. Med. Internet Res. 21 (2019). doi:10.2196/11757.
[80] Y. An, N. Huang, X. Chen, F. Wu, J. Wang, High-risk Prediction of Cardiovascular Diseases via Attention-based Deep Neural Networks, IEEE/ACM Trans. Comput. Biol. Bioinforma. 18 (2019) 1093–1105. doi:10.1109/tcbb.2019.2935059.
[81] H. Duan, Z. Sun, W. Dong, Z. Huang, Utilizing dynamic treatment information for MACE prediction of acute coronary syndrome, BMC Med. Inform. Decis. Mak. 19 (2019) 1–11. doi:10.1186/s12911-018-0730-7.
[82] S. Park, Y.J. Kim, J.W. Kim, J.J. Park, B. Ryu, J.W. Ha, Interpretable prediction of vascular diseases from electronic health records via deep attention networks, in: Proc. IEEE 18th Int. Conf. Bioinforma. Bioeng. BIBE 2018, IEEE Computer Society, 2018: pp. 110–117. doi:10.1109/BIBE.2018.00028.
[83] Y. Zhang, C. Lin, M. Chi, J. Ivy, M. Capan, J.M. Huddleston, LSTM for septic shock: Adding unreliable labels to reliable predictions, in: Proc. - 2017 IEEE Int. Conf. Big Data, Big Data 2017, Institute of Electrical and Electronics Engineers Inc., 2017: pp. 1233–1242. doi:10.1109/BigData.2017.8258049.
[84] Y. Zhang, X. Yang, J. Ivy, M. Chi, Time-aware adversarial networks for adapting disease progression modeling, in: 2019 IEEE Int. Conf. Healthc. Informatics, ICHI 2019, Institute of Electrical and Electronics Engineers Inc., 2019. doi:10.1109/ICHI.2019.8904698.
[85] S.D. Wickramaratne, M.D.S. Mahmud, Bi-Directional Gated Recurrent Unit Based Ensemble Model for the Early Detection of Sepsis, in: Proc. Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. EMBS, Institute of Electrical and Electronics Engineers Inc., 2020: pp. 70–73. doi:10.1109/EMBC44109.2020.9175223.
[86] P. Svenson, G. Haralabopoulos, M. Torres Torres, Sepsis Deterioration Prediction Using Channelled Long Short-Term Memory Networks, Lect. Notes Comput. Sci. 12299 LNAI (2020) 359–370. doi:10.1007/978-3-030-59137-3_32.
[87] J. Fagerström, M. Bång, D. Wilhelms, M.S. Chew, LiSep LSTM: A Machine Learning Algorithm for Early Detection of Septic Shock, Sci. Rep. 9 (2019) 1–8. doi:10.1038/s41598-019-51219-4.
[88] R. Mohammadi, S. Jain, S. Agboola, R. Palacholla, S. Kamarthi, B.C. Wallace, Learning to Identify Patients at Risk of Uncontrolled Hypertension Using Electronic Health Records Data, AMIA Jt. Summits Transl. Sci. Proc. 2019 (2019) 533–542. http://www.ncbi.nlm.nih.gov/pubmed/31259008 (accessed July 2, 2020).
[89] X. Ye, Q.T. Zeng, J.C. Facelli, D.I. Brixner, M. Conway, B.E. Bray, Predicting Optimal Hypertension Treatment Pathways Using Recurrent Neural Networks, Int. J. Med. Inform. 139 (2020) 104122. doi:10.1016/j.ijmedinf.2020.104122.
[90] H.C. Thorsen-Meyer, A.B. Nielsen, A.P. Nielsen, B.S. Kaas-Hansen, P. Toft, J. Schierbeck, T. Strøm, P.J. Chmura, M. Heimann, L. Dybdahl, L. Spangsege, P. Hulsen, K. Belling, S. Brunak, A. Perner, Dynamic and explainable machine learning prediction of mortality in patients in the intensive care unit: a retrospective study of high-frequency data in electronic patient records, Lancet Digit. Heal. 2 (2020) e179–e191. doi:10.1016/S2589-7500(20)30018-2.
[91] K. Zheng, W. Wang, J. Gao, K.Y. Ngiam, B.C. Ooi, W.L. Yip, Capturing Feature-Level Irregularity in Disease Progression Modeling, in: Proc. 2017 ACM Conf. Inf. Knowl. Manag. (CIKM '17), 2017: pp. 1579–1588. https://dl.acm.org/doi/abs/10.1145/3132847.3132944 (accessed March 4, 2021).
[92] Q. Suo, F. Ma, G. Canino, J. Gao, A. Zhang, P. Veltri, G. Agostino, A Multi-Task Framework for Monitoring Health Conditions via Attention-based Recurrent Neural Networks, AMIA Annu. Symp. Proc. 2017 (2017) 1665. /pmc/articles/PMC5977646/ (accessed November 2, 2021).
[93] N. Tomašev, X. Glorot, J.W. Rae, M. Zielinski, H. Askham, A. Saraiva, A. Mottram, C. Meyer, S. Ravuri, I. Protsyuk, A. Connell, C.O. Hughes, A. Karthikesalingam, J. Cornebise, H. Montgomery, G. Rees, C. Laing, C.R. Baker, K. Peterson, R. Reeves, D. Hassabis, D. King, M. Suleyman, T. Back, C. Nielson, J.R. Ledsam, S. Mohamed, A clinically applicable approach to continuous prediction of future acute kidney injury, Nature 572 (2019) 116–119. doi:10.1038/s41586-019-1390-1.
[94] R. Qiu, Y. Jia, F. Wang, P. Divakarmurthy, S. Vinod, B. Sabir, M. Hadzikadic, Predictive Modeling of the Total Joint Replacement Surgery Risk: a Deep Learning Based Approach with Claims Data, in: Proc. AMIA Summits Transl. Sci., 2019: p. 562.
[95] Y. Ge, Q. Wang, L. Wang, H. Wu, C. Peng, J. Wang, Y. Xu, G. Xiong, Y. Zhang, Y. Yi, Predicting post-stroke pneumonia using deep neural network approaches, Int. J. Med. Inform. 132 (2019). doi:10.1016/j.ijmedinf.2019.103986.
[96] N. Razavian, J. Marcus, D. Sontag, Multi-task Prediction of Disease Onsets from Longitudinal Lab Tests, in: Proc. 1st Mach. Learn. Healthc. Conf., PMLR, 2016: pp. 73–100.
[97] J. Rebane, I. Karlsson, P. Papapetrou, An investigation of interpretable deep learning for adverse drug event prediction, in: Proc. - IEEE Symp. Comput. Med. Syst., Institute of Electrical and Electronics Engineers Inc., 2019: pp. 337–342. doi:10.1109/CBMS.2019.00075.
[98] M.A. Morid, O.R.L. Sheng, K. Kawamoto, S. Abdelrahman, Learning Hidden Patterns from Patient Multivariate Time Series Data Using Convolutional Neural Networks: A Case Study of Healthcare Cost Prediction, J. Biomed. Inform. 111 (2020) 103565. doi:10.1016/j.jbi.2020.103565.
[99] Y. Xiang, H. Ji, Y. Zhou, F. Li, J. Du, L. Rasmy, S. Wu, W.J. Zheng, H. Xu, D. Zhi, Y. Zhang, C. Tao, Asthma Exacerbation Prediction and Risk Factor Analysis Based on a Time-Sensitive, Attentive Neural Network: Retrospective Cohort Study, J. Med. Internet Res. 22 (2020) e16981. doi:10.2196/16981.
[100] C. Gao, C. Yan, S. Osmundson, B.A. Malin, Y. Chen, A deep learning approach to predict neonatal encephalopathy from electronic health records, in: 2019 IEEE Int. Conf. Healthc. Informatics, ICHI 2019, Institute of Electrical and Electronics Engineers Inc., 2019. doi:10.1109/ICHI.2019.8904667.
[101] L. Ma, Y. Zhang, Using Word2Vec to process big text data, in: Proc. - 2015 IEEE Int. Conf. Big Data, IEEE Big Data 2015, Institute of Electrical and Electronics Engineers Inc., 2015: pp. 2895–2897. doi:10.1109/BigData.2015.7364114.
[102] W. Cheng, C. Greaves, M. Warren, From n-gram to skipgram to concgram, Int. J. Corpus Linguist. 11 (2006) 411–433. doi:10.1075/ijcl.11.4.04che.
[103] M.Q. Stearns, C. Price, K.A. Spackman, A.Y. Wang, SNOMED clinical terms: overview of the development process and project status, Proc. AMIA Symp. (2001) 662–666. /pmc/articles/PMC2243297/ (accessed May 21, 2021).
[104] P. Ernst, A. Siu, G. Weikum, KnowLife: A versatile approach for constructing a large knowledge graph for biomedical sciences, BMC Bioinformatics 16 (2015) 1–13. doi:10.1186/s12859-015-0549-5.
[105] A. Shrikumar, P. Greenside, A. Kundaje, Learning Important Features Through Propagating Activation Differences, in: Proc. 34th Int. Conf. Mach. Learn., PMLR, 2017: pp. 3145–3153. http://goo.gl/RM8jvH (accessed May 21, 2021).
[106] E. Winter, Chapter 53: The Shapley value, Handb. Game Theory with Econ. Appl. 3 (2002) 2025–2054. doi:10.1016/S1574-0005(02)03016-3.
[107] I. Silva, G. Moody, D.J. Scott, L.A. Celi, R.G. Mark, Predicting in-hospital mortality of ICU patients: The PhysioNet/Computing in Cardiology Challenge 2012, in: Comput. Cardiol., 2012: pp. 245–248.
[108] C. Fang, C. Wang, Time Series Data Imputation: A Survey on Deep Learning Approaches, ArXiv (2020). http://arxiv.org/abs/2011.11347 (accessed May 10, 2021).
[109] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444. doi:10.1038/nature14539.
[110] S.M. Boker, S.S. Tiberio, R.G. Moulder, Robustness of Time Delay Embedding to Sampling Interval Misspecification, in: Contin. Time Model. Behav. Relat. Sci., 2018: pp. 239–258. doi:10.1007/978-3-319-77219-6_10.
[111] W. Liu, P. Zhou, Z. Wang, Z. Zhao, H. Deng, Q. Ju, FastBERT: a Self-distilling BERT with Adaptive Inference Time, in: Proc. 58th Annu. Meet. Assoc. Comput. Linguist., 2020: pp. 6035–6044.
[112] C.P. Rees, S. Hawkesworth, S.E. Moore, B.L. Dondeh, S.A. Unger, Factors affecting access to healthcare: An observational study of children under 5 years of age presenting to a rural Gambian primary healthcare centre, PLoS One 11 (2016) e0157790. doi:10.1371/journal.pone.0157790.
[113] X. Li, Y. Zhou, N.C. Dvornek, Y. Gu, P. Ventola, J.S. Duncan, Efficient Shapley Explanation for Features Importance Estimation Under Uncertainty, in: Med. Image Comput. Comput. Assist. Interv., 2020: pp. 792–801. doi:10.1007/978-3-030-59710-8_77.
[114] C. Beck, A. Jentzen, B. Kuckuck, Full error analysis for the training of deep neural networks, (2019). https://arxiv.org/abs/1910.00121v2 (accessed November 17, 2021).