Examining Disease Multimorbidity in U.S. Hospital Visits Before and During COVID-19 Pandemic: A Graph Analytics Approach

KARTHIK SRINIVASAN, School of Business, University of Kansas, Lawrence KS, U.S., karthiks@ku.edu
JINHANG JIANG, Walmart Inc., Bentonville AR, U.S., Jinhang.Jiang@walmart.com

Enduring effects of the COVID-19 pandemic on healthcare systems can be preempted by identifying patterns in diseases recorded in hospital visits over time. Disease multimorbidity, or the simultaneous occurrence of multiple diseases, is a growing global public health challenge as populations age and long-term conditions become more prevalent. We propose a graph analytics framework for analyzing disease multimorbidity in hospital visits. Within the framework, we propose a graph model to explain multimorbidity as a function of the prevalence, category, and chronic nature of the underlying disease. We apply our model to examine and compare multimorbidity patterns in public hospitals in Arizona, U.S., during five six-month time periods before and during the pandemic. We observe that while multimorbidity increased by 34.26% and 41.04% during the peak of the pandemic for mental disorders and respiratory disorders respectively, the gradients for endocrine diseases and circulatory disorders were not significant. Multimorbidity for acute conditions is observed to decrease during the pandemic, while multimorbidity for chronic conditions remains unchanged. Our graph analytics framework provides guidelines for empirical analysis of disease multimorbidity using electronic health records. The patterns identified using our proposed graph model inform future research and support healthcare policy makers in pre-emptive decision making.
CCS CONCEPTS • Information Systems • Information systems applications • Decision support systems • Data analytics

Additional Keywords and Phrases: graph analytics, electronic health records, disease co-occurrence networks, exponential random graph modeling, COVID-19 analysis

1 INTRODUCTION

The severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), or COVID-19, pandemic has become one of the most transmissible infectious pandemics in human history. COVID-19 has directly or indirectly influenced many aspects of human society, including hospital visits, treatment quality, and healthcare delivery [1]. Common Business Intelligence (BI) applications that use hospital discharge records employ descriptive analysis and data mining to analyze incidence patterns of individual diseases [2]–[4]. However, diseases are inter-related (e.g., high blood pressure increases the risk of cardiac arrest) and are often recorded simultaneously in hospital visits using diagnosis codes. While conventional analytics methods such as linear regression and data mining are suitable for analyzing individual disease patterns, they are often inadequate for studying the co-occurrence and co-existence of multiple health conditions, also known as multimorbidity. Accurately capturing multimorbidity is an important problem, as it has been shown to be closely related to important health outcomes such as mortality, treatment costs, and future health risks [5], [6]. Furthermore, as the average population age across the world increases and long-term conditions become more prevalent, multimorbidity may become an even more serious problem to tackle.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. © 2022 Association for Computing Machinery. 2158-656X/2022/1-ART1 $15.00 http://dx.doi.org/10.1145/3564274 ACM Trans. Manage. Inf. Syst.

Graph networks have been successfully used in studying disease inter-relationships and their impact on individual health outcomes [7], [8]. However, existing studies have focused primarily either on enhancing the predictive capabilities of complex models [8], [9] or on exploratory analysis of network formation [7]. In this study, we present a graph analytics framework for analyzing the multimorbidity patterns recorded in electronic health records (EHR) of hospital visits of a population. The foundation of the framework is an empirical undirected weighted network of diseases representing their pairwise co-occurrences, termed the disease co-occurrence network (DCN). The DCN is trained on hospital discharge records, and its edge weights are analogous to the probability of co-occurrence between disease pairs. Exploratory analysis, explanatory modeling, and predictive modeling methods can be applied to the DCN for discovering, modeling, and predicting multimorbidity patterns. We delve further into the explanatory modeling component of our framework and propose a graph model to explain the relationship between multimorbidity and observable disease characteristics. Using our graph model, we examine discharge records of hospitals in Arizona, U.S., during five 6-month periods, from the first half of 2019 (January 2019 to June 2019) to the first half of 2021 (January 2021 to June 2021). Our analysis shows that multimorbidity is higher during COVID-19 than before the pandemic.
We observe that while multimorbidity increased by more than 30% for mental disorders and respiratory disorders during the peak of the pandemic, it did not change significantly for endocrine diseases and circulatory disorders. In addition, multimorbidity for acute conditions decreased during the pandemic, while multimorbidity for chronic conditions remained largely unaffected. Our proposed model enables health analytics researchers to analyze and explain disease multimorbidity patterns. Our study encourages healthcare policy makers to delve further into specific hospital discharge reporting patterns in order to take pre-emptive actions for averting impending public health crises.

We make the following contributions to design science and health analytics research. First, we propose a unifying graph analytics framework that complements existing tabular analytics approaches for analyzing hospital discharge data. Though exploratory and predictive modeling methods for graphs exist, our study is the first to propose a comprehensive framework for multimorbidity pattern analysis. The framework exploits a weighted undirected disease co-occurrence network with interpretable edge weights and introduces a heuristic for transforming the network into a binary network. Next, we propose a graph model for explaining disease multimorbidity as a function of known variables. Finally, our analysis uncovers novel multimorbidity patterns in hospital visits before and during the COVID-19 pandemic. Our design artifacts and study inferences have multiple applications and implications for healthcare researchers and policy makers.

The rest of the paper is structured as follows. Section 2 reviews existing literature on multimorbidity analysis and disease co-occurrence networks, and motivates the need for exponential random graph modeling (ERGM) to explain the structure of such networks. A graph analytics framework for analyzing disease multimorbidity is introduced in Section 3.
Section 4 contains the results from exploratory and explanatory analyses on disease co-occurrence networks trained over different time periods before and during the pandemic. Finally, Section 5 summarizes the relevance, implications, contributions, limitations, and assumptions of our study.

2 RELATED WORK

Multimorbidity is the simultaneous occurrence of two or more diseases recorded in a hospital visit. Individuals with multimorbidity have poor quality of life, psychological distress, worsening functional capacity, longer hospital stays, and more postoperative complications, leading to higher costs of care [10]. While conventional business intelligence (BI) and data mining methods for analyzing individual disease patterns are well established [11], [12], fewer studies have expressly accounted for multimorbidity in their problem formulation [7]–[9], [13].

Disease comorbidity and multimorbidity patterns have been captured using different mechanisms and data sources in prior literature. For example, the human disease network [14] is based on common disease-gene associations, while disease semantic networks and disease treatment networks are based on underlying common semantics and treatments between disease pairs [15]. Disease co-occurrence networks (DCN), also known as comorbidity networks, are weighted undirected networks constructed from repeated evidence of pairs of diseases reported across patient visits [7], [16]. DCNs are empirically generated from massive electronic health records and therefore complement clinical knowledge on multimorbidity obtained from studying genetic correlations [14] and pathological studies [17]. Different edge weights have been proposed for DCNs [7], [18]. A good edge weight metric, however, must overcome challenges such as distribution skewness or preference for common diseases [9].
For example, using just the absolute co-occurrence value as the edge weight [14] can be problematic, as co-occurrence will be very high for pairs of prevalent diseases, thus overlooking rare conditions. Measures such as the Steinhaeuser-Chawla coefficient [16], relative risk [19], Pearson's correlation [7], and the Salton cosine index [13] have been proposed as improved edge weights. However, most of these measures have prominently right-skewed distributions, with values not constrained to a definite interval. They have also been shown to have prevalence bias, as they either overrepresent or underrepresent rare or common diseases [9]. A comparison of different edge weights for DCNs showed that edge weights that resemble probability values and are constrained to the range [0,1] are most suitable for capturing multimorbidity, since they are the most interpretable and reduce prevalence bias [9].

Existing methods for analyzing multimorbidity using DCNs either offer an exploratory perspective by comparing network visualizations and structural properties [7], or are primarily designed as black-box models for making predictions [8], [20]. Explaining the structure of a network, that is, why certain node pairs have edges and others do not, can help to better understand a network phenomenon [21]. In our problem context, an explanatory graph model can provide insights about how and why pairs of diseases co-occur in patient visits, thus expanding current clinical knowledge on multimorbidity and epidemiology. Exponential random graph modeling (ERGM) is an established statistical modeling technique for explaining graph network structures [22], [23]. ERGM assumes that the probability of the network structure is a function of network statistics and exogenous factors captured as node and edge attributes. Though ERGM has been widely adopted in other domains such as marketing [21] and medicine [24], ours is the first study to use an ERGM approach for explaining disease multimorbidity.
Health analytics research using hospital discharge records can be broadly categorized as explanatory or descriptive analysis research, or healthcare predictive analytics (HPA) [5], [8], [25]. In this study, we introduce a graph analytics framework that outlines the process for analyzing multimorbidity using DCNs. Our framework encapsulates all three analytics approaches: exploratory analysis, explanatory modeling, and predictive modeling. It can be used as a guide for future analytics research on multimorbidity. Multimorbidity analysis can be combined with conventional tabular healthcare analytics methods such as survival analysis, logistic regression, and patient clustering to improve public health and enhance patient outcomes. We briefly explain the graph analytics framework in the next section, and then formulate our ERGM-based graph model for explaining disease multimorbidity as a function of observed variables.

3 GRAPH ANALYTICS FRAMEWORK

Our proposed framework builds upon existing information systems that analyze hospital discharge records [26]. It complements traditional analytics practices over tabular data, such as ranking disease prevalence or using data mining to predict outcomes such as readmission risk, by including graph analytics for multimorbidity analysis. Figure 1 shows a visual representation of our proposed framework for analyzing disease multimorbidity. The DCN is the foundation of the framework, which has three interlinked components: exploratory analysis, explanatory modeling, and predictive modeling. The DCN is created empirically over hospital discharge datasets with diagnosis codes listed serially, either in sequence or otherwise. It is a homogeneous, weighted, and undirected network. Its edge weights vary between 0 and 1, analogous to probability values. It can be converted to an unweighted undirected network by setting a suitable threshold on the edge weight distribution.
Within exploratory analysis, structural information about the DCN and its nodes, such as order, size, density, eigenvector centrality, and transitivity, can be examined. Furthermore, newer methods such as DeltaCon [27] or NetSimile [28] can be used for comparing two or more networks. As part of explanatory modeling, either mathematical models such as the preferential attachment model [29] or statistical graph models such as exponential random graph models can be trained. Statistical graph models can incorporate input variables that explain the structure of the network and hence are more conducive to hypothesis testing and statistical inference. In predictive modeling, embedding vectors of nodes and edges of the DCN can be used for representing latent features or for fitting a high-dimensional non-linear model to predict given outcome(s). Structural properties of nodes and networks (e.g., number of triangles, degree, transitivity) can be used as input variables in the explanatory model, and can also be included in the input feature sets of prediction models to enhance prediction performance. Furthermore, explanatory models in graph analytics are generative models, and their simulations can themselves be used for augmenting predictions [30]. In the following sub-sections, we describe each component of our framework in more detail.

Figure 1: A graph analytics framework for analyzing disease multimorbidity

3.1 Disease co-occurrence network

A disease co-occurrence network (DCN) is generated based on the evidence of pairs of disease diagnoses concurrently reported in one or more patient visits. It is a suitable artifact for collectively analyzing the multimorbidity of all diseases recorded in a dataset simultaneously. A network has entities of interest called nodes or vertices, and evidence of their pairwise inter-relationships called links or edges.
The disease co-occurrence network can be considered a weighted undirected network, as evidence of co-occurrence is symmetric (i.e., if disease $i$ co-occurred with disease $j$ $n$ times, then disease $j$ co-occurred with disease $i$ the same number of times). The edge weights emphasize the higher comorbidity of frequently co-occurring disease pairs. We formulate the disease co-occurrence network (DCN) formally as follows. Let $G = (V, E, W)$ represent a DCN, with $V$, $E$, and $W$ representing nodes, edges, and edge weights, respectively. In this study, the labels of the nodes are diagnosis codes based on the International Classification of Diseases (ICD-10-CM) nomenclature, the most common nomenclature for illness reporting in electronic health records. The edges represent the evidence of co-occurrence of a pair of diseases, and the weights are a function of the number of times the pair of diseases co-occurred. There are multiple methods to determine the edge weights [7], [16]. We define the edge weight $w_{ij}$ as the ratio of the probability of a disease pair co-occurring (i.e., $P(i \cap j)$) to the quadratic mean of the marginal probabilities $\{P(i), P(j)\}$, as follows:

$$ w_{ij} = \frac{P(i \cap j)}{\sqrt{\dfrac{P(i)^2 + P(j)^2}{2}}} \tag{1} $$

This expression is intuitive in the sense that the evidence of the pair of diseases is penalized by the collective evidence of the occurrence of each individual disease in the pair. Next, we represent the probabilities of Equation (1) in terms of the relative frequencies $C_{ij}/N$, $C_i/N$, and $C_j/N$, where $N$ is the dataset size, as follows:

$$ w_{ij} = \frac{\sqrt{2}\, C_{ij}}{\sqrt{C_i^2 + C_j^2}} \tag{2} $$

The resulting expression in Equation (2) is analogous to the co-occurrence coefficient proposed by [9]. In Equation (2), $C_i$ and $C_j$ represent the prevalence of diseases $i$ and $j$, and $C_{ij}$ is the co-occurrence of the disease pair $(i, j)$. $w_{ij}$ is more interpretable than other proposed weight measures for DCNs, as it is confined to the range between 0 and 1.
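As a minimal sketch of how the edge weight in Equation (2) could be computed from visit-level diagnosis lists, the Python snippet below counts prevalence and pairwise co-occurrence and then applies the formula. The function name `cooccurrence_weights`, the toy visits, and the choice to count each code at most once per visit are illustrative assumptions, not details taken from the study.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def cooccurrence_weights(visits):
    """Return {(i, j): w_ij} with w_ij = sqrt(2) * C_ij / sqrt(C_i^2 + C_j^2)."""
    prevalence = Counter()    # C_i: number of visits that list disease i
    pair_counts = Counter()   # C_ij: number of visits that list both i and j
    for codes in visits:
        unique = sorted(set(codes))   # assumption: count a code once per visit
        prevalence.update(unique)
        pair_counts.update(combinations(unique, 2))
    return {
        (i, j): sqrt(2) * c_ij / sqrt(prevalence[i] ** 2 + prevalence[j] ** 2)
        for (i, j), c_ij in pair_counts.items()
    }

# Hypothetical toy visits; the ICD-10-CM codes serve only as node labels.
visits = [
    ["I10", "E11.9"],
    ["I10", "E11.9", "J44.9"],
    ["I10"],
    ["J44.9", "F32.9"],
]
weights = cooccurrence_weights(visits)
for pair, w in sorted(weights.items()):
    print(pair, round(w, 4))
```

Because the weight depends only on ratios of counts, repeating every visit the same number of times leaves all weights unchanged, which is the property that makes DCNs built from datasets of different sizes comparable.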
For example, the raw co-occurrence score can take any value on the scale of natural numbers, but a low co-occurrence count for a rare disease pair is not comparable with the same co-occurrence count for two common conditions. Extending this argument, the edge weight in Equation (2) is not biased towards either rare or common diseases, as the numerator and denominator vary by the same order of magnitude for extreme values. For example, with two highly common diseases, even though their joint probability can be expected to be very high, the denominator scales down the edge weight so that it is comparable to the edge weights of less common diseases that often co-occur. A similar argument may be made for rare disease pairs. Furthermore, the final expression for $w_{ij}$ does not depend on the dataset size $N$, making it possible to compare DCNs developed on different hospital discharge datasets across geographies and time periods. The statistical significance of the edge weight can be determined using a two-tailed t-test similar to that for Pearson's coefficient [31]. We choose the degrees of freedom for this test to give a moderately conservative elimination of insignificant edge weights. Finally, the weighted DCN can be transformed to an unweighted DCN by setting a suitable threshold. This improves the flexibility of benchmarking a wider range of models and allows testing the simpler hypothesis of the presence or absence of an edge between any node pair. Either domain-based thresholds or heuristics can be used for setting the discretization threshold. Our proposed heuristic for identifying a suitable threshold is explained in the analysis.

3.2 Exploratory analysis - Network statistics and comparison

Exploratory analysis consists of examining network-level, node-level, and edge-level properties of graphs.
Network-level properties include the type of network, a description of node and edge attributes, the density of the network, average path length, number of k-star components, and aggregate node properties. Node properties include node centrality measures such as degree centrality, betweenness centrality, eigenvector centrality, PageRank, and clustering coefficient. Community detection can also be used in an unsupervised manner to identify groups of nodes that are connected to each other and have similar subnetwork characteristics. In addition, network visualization can be used to visually identify patterns that may be indicative of low or high density, influential nodes, isolated subgraphs, fully connected components, clusters, etc.

For comparing networks across different time periods, modern network comparison methods such as DeltaCon [27], ResistancePerturbation [32], NetLSD [33], degree divergence, and NetSimile [28] can be used. In this study, we use NetLSD, degree divergence, and NetSimile, as these measures are size-independent and do not need node correspondence between networks. These three measures are simple to interpret and allow comparison of disease networks that may have slightly different sets of recorded diseases (e.g., the COVID-19 disease code was introduced in 2019, and hence disease networks of earlier periods may not have that particular node label). NetLSD extracts a compact heat trace signature of a graph based on the Laplacian spectrum of the network structure. Degree divergence compares the degree distributions of two graphs. NetSimile uses the following node-level features for constructing the distance vector between two networks: the number of neighbors of node $i$, the clustering coefficient of node $i$, the average number of node $i$'s two-hop-away neighbors, the number of edges in node $i$'s ego network, and the number of neighbors of node $i$'s ego network.
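The feature-based comparison described above can be sketched on toy graphs as follows. This is a simplified illustration rather than the NetSimile algorithm itself: it uses only two of the listed node features (number of neighbors and clustering coefficient), aggregates them by their mean only, and compares the resulting vectors with the Canberra distance that the text introduces in Equation (3). The adjacency-dictionary representation and all function names are assumptions made for the example.

```python
from itertools import combinations

def node_features(adj):
    """Per-node (degree, clustering coefficient) for an undirected graph.

    adj maps each node to the set of its neighbours.
    """
    feats = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        # Count the edges that exist among the neighbours of `node`.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        clustering = 2 * links / (k * (k - 1)) if k > 1 else 0.0
        feats[node] = (k, clustering)
    return feats

def signature(adj):
    """Aggregate node features into a network-level vector (means only)."""
    feats = list(node_features(adj).values())
    return [sum(f[d] for f in feats) / len(feats) for d in range(2)]

def canberra(p, q):
    """Canberra distance; terms where both entries are zero contribute 0."""
    return sum(abs(a - b) / (abs(a) + abs(b)) for a, b in zip(p, q) if a or b)

# Toy graph A: a triangle A-B-C with a pendant node D attached to C.
adj_a = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
# Toy graph B: a path A-B-C-D.
adj_b = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}

sig_a, sig_b = signature(adj_a), signature(adj_b)
dist = canberra(sig_a, sig_b)
print(sig_a, sig_b, dist)
```

Because the signature is built from aggregated node features rather than node labels, the two graphs do not need to share the same node set, which is why this style of comparison suits DCNs whose recorded diseases differ slightly between periods.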
The Canberra distance is used for computing the distance between a pair of networks with $d$-dimensional feature vectors $P$ and $Q$, as it is the most discriminative [28]:

$$ d_{\mathrm{Canberra}}(P, Q) = \sum_{k=1}^{d} \frac{|P_k - Q_k|}{P_k + Q_k} \tag{3} $$

While NetLSD compares networks using a kernel transformation of the entire network, degree divergence provides an intuitive scoring mechanism that quantifies potential differences in the density and degree statistics of graphs. NetSimile is more comprehensive and flexible, as it uses a set of node-level features that can be modified for specific applications. The comparison between the DCNs generated using visit records from different time periods before and during the COVID-19 pandemic indicates whether the overall multimorbidity across diseases changed during the pandemic.

3.3 Explanatory modeling

While exploratory analysis methods highlight patterns, explanatory modeling methods for graphs are required to assess how and why multimorbidity differed across time periods of the COVID-19 pandemic. While linear regression models are commonly used for tabular data to explain causal and associative phenomena, explanatory modeling for graphs is more nuanced. Explanatory models for representing graph structures can be either mathematical models that assume an underlying mechanism, such as the Barabási-Albert model [29], or statistical models. Statistical models are more popular, as they account for stochasticity and assume that the network can be estimated empirically [34]. Exponential random graph models (ERGM) are a family of statistical models used to describe parsimoniously the local selection forces that shape the global structure of the network [23]. ERGM assumes that the probability of the network structure (i.e., a particular pattern of presence and absence of edges between given nodes) is a function of network statistics and exogenous information.
The purpose of an ERGM is to identify the processes that influence link creation in a network [35]. An ERGM provides information about the statistical significance of the association between inputs and the odds of the presence of a link. After model formulation, the interpretability of the model is similar to that of a linear model for tabular data. The general formulation of a Bernoulli ERGM for an unweighted network is given as follows:

$$ P(Y = y) = \frac{\exp\left(\theta^{\top} g(y)\right)}{k(\theta)} \tag{4} $$

In Equation (4), $\theta$ is a vector of model coefficients and $g(y)$ is a vector of network statistics, such as the number of edges, number of triangles, number of 3-stars, etc., in the given network configuration $y$, which can be represented as an adjacency matrix. $k(\theta)$ is a normalizing constant that ensures that the equation is a legitimate probability distribution. The equation can be expanded to allow node-related attributes $X$ by expressing it as follows:

$$ P(Y = y \mid X) = \frac{\exp\left(\theta^{\top} g(y, X)\right)}{k(\theta)} \tag{5} $$

In Equation (5), the additional node-dependent network statistics can be represented as follows:

$$ g(y, X) = \sum_{i} \sum_{j > i} y_{ij}\, h(x_i, x_j) \tag{6} $$

In Equation (6), $y_{ij} = 1$ if an edge exists between nodes $i$ and $j$, else $y_{ij} = 0$. $h$ may be defined as an additive function, i.e., $h(x_i, x_j) = x_i + x_j$, or as an equivalence function, i.e., $h(x_i, x_j) = I(x_i = x_j)$, where $I$ is an indicator function. The additive and equivalence functions determine whether the aggregate value of node attributes or the similarity of node attributes, respectively, is related to the presence of an edge [34].

Given the data, we were interested in analyzing how the existence (or absence) of a co-occurrence link between pairs of diseases is related to the empirical information available about the diseases, and how this association changed over the COVID-19 pandemic period. For instance, we wanted to investigate whether multimorbidity differs across the four categories of diseases in the ICD-10 system most relevant to COVID-19, i.e., respiratory system disorders, mental/behavioral disorders, circulatory system disorders, and endocrine/nutritional/metabolic diseases.
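To make Equations (4) to (6) concrete, the following sketch enumerates every possible graph on three nodes and computes the ERGM probability of each by brute force. It assumes a specification with two statistics only (edge count and an additive node-attribute term); the coefficient values, the binary attribute, and the function names are hypothetical. Enumeration is feasible only for very small graphs and is not how ERGMs are fitted in practice.

```python
from itertools import combinations, product
from math import exp

def stats(y, dyads, x):
    """g(y, X) = [edge count, sum over dyads of y_ij * (x_i + x_j)]."""
    edges = sum(y)
    additive = sum(y_ij * (x[i] + x[j]) for y_ij, (i, j) in zip(y, dyads))
    return [edges, additive]

def ergm_distribution(theta, x):
    """Enumerate every graph on len(x) nodes and return {y: P(Y = y | X)}."""
    dyads = list(combinations(range(len(x)), 2))
    unnorm = {
        y: exp(sum(t * g for t, g in zip(theta, stats(y, dyads, x))))
        for y in product((0, 1), repeat=len(dyads))
    }
    k = sum(unnorm.values())   # normalizing constant k(theta)
    return {y: w / k for y, w in unnorm.items()}

theta = [-1.0, 0.5]   # hypothetical coefficients for [edges, additive term]
x = [1, 0, 1]         # hypothetical binary node attribute (e.g. chronic = 1)
dist = ergm_distribution(theta, x)

# Marginal probability that the first dyad (nodes 0 and 1) is connected.
p01 = sum(p for y, p in dist.items() if y[0] == 1)
print(len(dist), round(sum(dist.values()), 6), round(p01, 4))
```

Since both statistics are sums over individual dyads, the marginal edge probability `p01` coincides with the inverse logit of `theta[0] + theta[1] * (x[0] + x[1])`, which previews the logistic-regression form discussed next in the text.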
If the level of multimorbidity for any of these categories increased during the COVID-19 timeframe, it means that the pandemic has direct or indirect implications for the level of hospitalization of patients with these diseases and related (comorbid) conditions. Further, we also wanted to investigate whether and how acute or chronic conditions are multimorbid compared to conditions classified otherwise. Our proposed statistical model for explaining the disease co-occurrence network for a given time period is represented as follows:

$$ P(Y = y \mid X) = \frac{1}{k(\theta)} \exp\left\{ \sum_{i<j} y_{ij} \left( \theta_0 + \theta_1 (C_i + C_j) + \sum_{c} \theta_c\, h_c(i, j) + \theta_{a}\, h_{a}(i, j) + \theta_{ch}\, h_{ch}(i, j) \right) \right\} \tag{7} $$

In Equation (7), $y_{ij} = 1$ if an edge exists between nodes $i$ and $j$, $\sum_{i<j} y_{ij}$ indicates the number of edges in the network and $\theta_0$ is the intercept of the model, $\theta_1$ is the coefficient for the sum of the prevalence of the disease pair, and $\theta_c$ is the coefficient for disease category $c$. $h_c(i, j)$ is the number of times a disease in a particular category $c$ occurs in an edge and therefore assumes one of the values $\{0, 1, 2\}$. Similarly, $h_{a}$ and $h_{ch}$ are additive functions for the acute and chronic indicators, assuming values $\{0, 1, 2\}$, with coefficients $\theta_{a}$ and $\theta_{ch}$. An alternative specification of ERGM that makes it analogous to logistic regression models is as follows:

$$ \mathrm{logit}\left[ P\left(Y_{ij} = 1 \mid Y_{ij}^{c}\right) \right] = \theta^{\top} \delta_{ij} \tag{8} $$

In Equation (8), $\delta_{ij}$ is the change in the value of the statistic $g(y)$ that would occur if $y_{ij}$ changed from 0 to 1 while leaving all the rest of $y$ (i.e., $Y_{ij}^{c}$) fixed. We make the dyadic independence assumption to overcome the degeneracy problem for large networks [36]. In the disease multimorbidity context, this means that we assume the presence or absence of a co-occurrence link between a disease pair to be independent of its neighborhood links, which is a reasonable assumption. Therefore, our model can be alternatively represented as follows:

$$ \mathrm{logit}\left[ P\left(Y_{ij} = 1\right) \right] = \theta_0 + \theta_1 (C_i + C_j) + \sum_{c} \theta_c\, h_c(i, j) + \theta_{a}\, h_{a}(i, j) + \theta_{ch}\, h_{ch}(i, j) \tag{9} $$

Because it is dyadic independent, our proposed explanatory model overcomes the challenge of non-convergence of the Bernoulli ERGM when fitting a dense and large DCN.
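Under dyadic independence, each disease pair contributes one row of change statistics, and the edge probability follows from an ordinary inverse logit. The sketch below illustrates this for a single pair. The disease records, the use of prevalence as a fraction of visits, and every coefficient value are hypothetical placeholders chosen for the example; they are not estimates from the study.

```python
from math import exp

CATEGORIES = ("respiratory", "mental", "circulatory", "endocrine")

def change_statistics(d_i, d_j):
    """Change statistics for one dyad under a dyad-independent model.

    Each disease is a dict with keys 'prevalence', 'category', 'acute', 'chronic'.
    """
    row = {"edges": 1.0, "prevalence": d_i["prevalence"] + d_j["prevalence"]}
    for c in CATEGORIES:
        # h_c(i, j) in {0, 1, 2}: how many of the two diseases are in category c
        row[c] = float((d_i["category"] == c) + (d_j["category"] == c))
    row["acute"] = float(d_i["acute"] + d_j["acute"])
    row["chronic"] = float(d_i["chronic"] + d_j["chronic"])
    return row

def edge_probability(theta, row):
    """Inverse logit of the linear predictor theta . delta_ij."""
    eta = sum(theta[name] * value for name, value in row.items())
    return 1.0 / (1.0 + exp(-eta))

# Hypothetical diseases (prevalence expressed as a fraction of visits).
copd = {"prevalence": 0.04, "category": "respiratory", "acute": 0, "chronic": 1}
anxiety = {"prevalence": 0.06, "category": "mental", "acute": 0, "chronic": 1}

# Hypothetical coefficients, for illustration only.
theta = {"edges": -4.0, "prevalence": 10.0, "respiratory": 0.3, "mental": 0.2,
         "circulatory": 0.1, "endocrine": 0.1, "acute": -0.2, "chronic": 0.4}

row = change_statistics(copd, anxiety)
p = edge_probability(theta, row)
print(row, round(p, 4))
```

Stacking such rows for all disease pairs, with the observed edge indicator as the response, yields a design matrix to which a standard logistic regression routine could be applied, which is why convergence is much less of a concern than for dyad-dependent specifications.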
Furthermore, it is easily generalizable to future extensions, such as adding additional variables (e.g., a node matching variable for capturing the presence of homophily) or including regularization terms for high-dimensional analysis (e.g., including n-dimensional embedding vectors for predictive modeling). Using the graph model shown in Equation (9), it is possible to identify the effect of any factor on the multimorbidity of diseases in a population that is represented by the DCN. The ERGM modeling approach and our model enable users to test the significance of associative and causal relationships related to disease multimorbidity recorded in hospital visit data.

3.4 Predictive modeling

As with predictive modeling on tabular data, common classification and regression algorithms can be repurposed to perform edge/link prediction or node prediction. Measures such as the Adamic-Adar measure [37], similarity measures, and community resource allocation [38] provide the propensity of a pair of nodes to be connected by an edge based on the local neighborhood structure between them. Network features can also be used for engineering input features to support tabular predictive modeling tasks. Features from network or node attributes are extracted to predict external outcomes that are not inherent to the network structure. For example, patient visit costs have been shown to be tied to the co-occurrence history of admitting diagnoses [9], and the length of stay of a visit has been shown to be related to the multimorbidity of a patient's reported diagnoses [6]. Graph embedding methods can generate low-dimensional vectors for nodes and edges, following which graph neural networks can be used for modeling outcomes of interest [39]. In this study, we focused on examining the temporal pattern of multimorbidity and hence did not develop a prediction model.
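Although no prediction model is developed here, the neighborhood-based link prediction scores mentioned above are simple to illustrate. The sketch below computes the Adamic-Adar index, which sums the inverse log-degree of the common neighbors of two nodes, on a hypothetical toy graph. The adjacency-dictionary representation and node labels are assumptions made for the example.

```python
from math import log

def adamic_adar(adj, u, v):
    """Adamic-Adar index: sum over common neighbours z of 1 / log(degree(z))."""
    common = adj[u] & adj[v]
    # In a simple graph, a common neighbour of two distinct nodes has
    # degree >= 2, so log(degree) is strictly positive.
    return sum(1.0 / log(len(adj[z])) for z in common)

# Hypothetical toy graph; node labels are placeholders for diagnosis codes.
adj = {
    "A": {"C", "D"},
    "B": {"C", "D"},
    "C": {"A", "B", "E"},
    "D": {"A", "B"},
    "E": {"C"},
}

print(round(adamic_adar(adj, "A", "B"), 4))  # common neighbours C and D
print(round(adamic_adar(adj, "A", "E"), 4))  # common neighbour C only
```

A higher score suggests that an unconnected pair is more likely to share an edge, so ranking unconnected disease pairs by this score is one possible starting point for link prediction on a binary DCN.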
Future studies can nonetheless refer to our framework when using disease co-occurrence information to develop predictive analytics solutions.

4 ANALYSIS

To demonstrate the application of our graph analytics framework and explanatory graph model, we consider the problem of analyzing disease multimorbidity patterns before and during the COVID-19 pandemic. In this section, we describe the data, perform exploratory analysis on DCNs trained over different time periods, and thereafter present results from our graph model.

4.1 Data

We consider five 6-month time periods of hospital discharge records curated by the Arizona Department of Health Services (AZDHS) as public use files, which can be requested via their website (https://azdhs.gov/preparedness/public-health-statistics/hospital-discharge-data/index.php#data-release). The time periods span from the first half of 2019 to the first half of 2021 and therefore contain the discharge records from the pre-pandemic (i.e., 2019), onset (first half of 2020), and peak (second half of 2020 and first half of 2021) periods. The five datasets are referred to as 2019-01, 2019-02, 2020-01, 2020-02, and 2021-01 hereafter. Up to 25 diagnoses are reported for a single patient visit. Each of the datasets contains more than 350,000 patient visits across more than 140 licensed hospitals spread across Arizona. All the diagnoses in the datasets are labeled using the ICD-10 nomenclature, which has more than 70,000 unique diagnosis codes as of 2021. Table 1 shows the visit count and the prevalence of diseases from the top four categories relevant to COVID-19 (respiratory system disorders, mental/behavioral disorders, circulatory system disorders, and endocrine/nutritional/metabolic diseases), along with counts for acute and chronic conditions during each time period.
Table 1: AZ hospitals discharge data summary

| Group | Input variable | 2019-01 | 2019-02 | 2020-01 | 2020-02 | 2021-01 |
|---|---|---|---|---|---|---|
| | Total visit count | 389,163 | 382,334 | 359,902 | 371,315 | 372,607 |
| | Total disease count | 4,481,787 | 4,432,026 | 4,299,795 | 4,595,654 | 4,792,069 |
| Disease Categories | Endocrine/Nutritional/Metabolic Diseases | 553,114 | 568,001 | 553,789 | 595,195 | 616,187 |
| Disease Categories | Circulatory System Disorders | 549,490 | 523,965 | 516,084 | 515,420 | 552,674 |
| Disease Categories | Mental/Behavioral Disorders | 307,055 | 309,316 | 295,437 | 309,737 | 319,150 |
| Disease Categories | Respiratory System Disorders | 236,252 | 195,517 | 229,660 | 233,519 | 239,968 |
| Chronic Indicator | Acute conditions | 1,496,527 | 1,469,982 | 1,433,713 | 1,550,081 | 1,546,755 |
| Chronic Indicator | Chronic conditions | 1,984,613 | 1,965,856 | 1,866,686 | 1,925,186 | 2,029,010 |

4.2 DCN

We trained a DCN for each of the time periods using the edge weight given by Equation (2). Out of more than 70,000 diagnosis codes, the majority are rarely reported in patient visits. We ignored diagnoses that were reported fewer than 10 times in each time period. For explanatory modeling, the weighted DCN was transformed to a binary network using a model-based heuristic, as follows. Assume a minimum cut-off value for the edge weight and train a binary ERGM for a given time period. Measure the Akaike Information Criterion (AIC) of the model at that cut-off. Thereafter, vary the cut-off over a range (say [0.05, 0.7]) and plot the AIC of the corresponding models. We considered 0.05 as the minimum value in order to filter out chance co-occurrences. Beyond 0.7, we noticed that the network is extremely sparse and constant across time periods. Lastly, visualize the elbow curve of AIC scores against the cut-off to choose an optimal threshold that is also contextually meaningful. Figure 2 shows the elbow curve used to select the optimal edge weight threshold for discretizing the network.

Figure 2. Elbow curve to select optimal threshold for discretizing edge weights

4.3 Exploratory analysis and network comparison

The node and edge counts for the weighted and binary DCNs, along with network summary statistics for the different time periods, are shown in Table 2.
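Two of the simpler steps above, discretizing a weighted DCN at a chosen threshold and computing network density from node and edge counts, can be sketched as follows. The toy edge weights and the threshold value are hypothetical, and treating weights equal to the threshold as retained edges is an assumption. The density call uses the weighted-DCN node and edge counts reported for 2019-01 in Table 2.

```python
def binarize(weights, threshold):
    """Keep the pairs whose weight is at least `threshold` (unweighted edges)."""
    return {pair for pair, w in weights.items() if w >= threshold}

def density(n_nodes, n_edges):
    """Density of a simple undirected graph: edges / possible node pairs."""
    return n_edges / (n_nodes * (n_nodes - 1) / 2)

# Hypothetical weighted edges and threshold, for illustration only.
weights = {("A", "B"): 0.62, ("A", "C"): 0.08, ("B", "C"): 0.31}
binary_edges = binarize(weights, threshold=0.3)

# Node and edge counts of the weighted DCN for 2019-01, as reported in Table 2.
d_2019_01 = density(1480, 14230)
print(sorted(binary_edges), round(d_2019_01, 4))
```

The computed value rounds to 0.0130, matching the density reported for 2019-01, which confirms that the density column describes the weighted DCN rather than the binary one.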
The node count for 2020-02 and 2021-01 is slightly higher; however, there is no significant trend visible across the network metrics.

Table 2: Network statistics of DCN before and during COVID-19 pandemic

           Weighted DCN                                                        Binary DCN
Term       Node count  Edge count  Clustering coefficient  Transitivity  Density  Node count  Edge count
2019-01    1480        14230       0.6823                  0.5668        0.0130   1253        1197
2019-02    1464        14298       0.6750                  0.5637        0.0134   1295        1255
2020-01    1475        14301       0.6670                  0.5768        0.0132   1288        1301
2020-02    1519        14917       0.6626                  0.5527        0.0129   1333        1257
2021-01    1561        15212       0.6623                  0.5612        0.0125   1374        1256

To detect any visual patterns in the structure of the DCNs across time periods, we generate 128-dimensional node embeddings and visualize a 2-dimensional representation of the embedding vectors using t-SNE. Diseases are colored by disease category. Because the embeddings are unsupervised, the plot for each time period may have a different orientation; we therefore align the plots by reordering the node embeddings in ascending order of the diagnosis codes. Figure 3 shows the t-SNE plots of the DCNs across time periods. Though we cannot make any confirmatory inferences from the visualizations alone, the nodes in the DCNs for the 2020-01 and 2020-02 time periods appear more spread out than those in the DCNs generated from the 2019 discharge records, while 2021-01 appears less spread out and resembles the DCNs for 2019-01 and 2019-02, hinting at a recovery period.

Figure 3. t-SNE plots of DCNs for time periods (a) 2019-01; (b) 2019-02; (c) 2020-01; (d) 2020-02; (e) 2021-01

Network comparison between 2019-01 and the other periods is shown in Table 3. While DegreeDivergence between the baseline period and the other periods does not vary significantly, higher NetLSD and NetSimile values show that the DCN for 2019-01 is significantly different from the DCNs trained over the peak pandemic periods.
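To make the degree-based comparison concrete, here is a minimal sketch of one common way to compare two degree distributions, namely the Jensen-Shannon divergence (base 2). That DegreeDivergence is defined this way in the study is an assumption; the exact definition used by the authors may differ, and the two toy graphs are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(edges, max_degree):
    """Fraction of nodes having each degree 0..max_degree.

    Only nodes that appear in at least one edge are counted.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    histogram = Counter(degree.values())
    n = len(degree)
    return [histogram.get(k, 0) / n for k in range(max_degree + 1)]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Two hypothetical three-node graphs: a path and a triangle.
path = [("a", "b"), ("b", "c")]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]

p = degree_distribution(path, max_degree=2)
q = degree_distribution(triangle, max_degree=2)
print(round(js_divergence(p, q), 4))
```

With base-2 logarithms the divergence lies between 0 (identical degree distributions) and 1, so values close to 0 indicate that two networks have nearly the same degree distribution even if they differ in other structural respects.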
This implies that though the network statistics listed in Table 2 as well as the degree distributions do not differ significantly before and during the pandemic, improved comparison metrics such as NetLSD and NetSimile are able to identify latent differences between the multimorbidity patterns of the pre-pandemic and peak pandemic periods.

Table 3: Difference between DCN before and during pandemic (baseline term is 2019-01)

Metric             2019-02   2020-01   2020-02   2021-01
NetLSD             193.25    56.32     380.79    815.80
DegreeDivergence   0.0438    0.0506    0.0484    0.0470
NetSimile          0.6803    0.5019    1.6213    2.1069

4.4 Explanatory modeling

The explanatory graph model can further characterize the global differences between DCNs across time periods identified earlier in the exploratory analysis section. The results of the graph models for each time period are shown in Table 4. The values indicate the odds of edge formation as a function of the respective input variables. All inputs are statistically significant in the models. The average goodness of fit for each parameter and the AIC values across models demonstrate good model fit [22]. We conducted ablation analysis by adding and removing input variables. We did not observe any major deviations in model fit or coefficient values, indicating that there is no severe collinearity and that the estimates are robust. The models were fit using the ergm package in the R programming language with 10,000 MCMC draws.
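Since the reported values are odds, a short sketch may help show how they relate to raw ERGM coefficients and how period-over-period changes can be computed. This is an illustrative calculation, not the authors' code: it assumes the usual ERGM reading in which a coefficient is a change in log-odds, so that the odds equal the exponential of the coefficient, and it uses the rounded mental/behavioral disorder odds that Table 4 reports, with 2019-01 as the baseline.

```python
import math


def odds_from_coefficient(theta):
    """Convert an ERGM coefficient (log-odds scale) to odds."""
    return math.exp(theta)


def percent_change(baseline, current):
    """Relative change of `current` with respect to `baseline`, in percent."""
    return 100.0 * (current - baseline) / baseline


# Rounded odds for mental/behavioral disorders as reported in Table 4.
MENTAL_ODDS = {
    "2019-01": 0.76,
    "2019-02": 0.77,
    "2020-01": 0.89,
    "2020-02": 1.02,
    "2021-01": 0.85,
}

baseline = MENTAL_ODDS["2019-01"]
for period, odds in MENTAL_ODDS.items():
    print(period, round(percent_change(baseline, odds), 1))
```

Because the table values are rounded to two decimals, percentages computed this way will only approximate figures derived from the unrounded estimates.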
Table 4: ERGM coefficients: determinants of multimorbidity before and during COVID-19 pandemic

Input variable                               2019-01  2019-02  2020-01  2020-02  2021-01  Average goodness of fit
Edges                                        0.00     0.00     0.00     0.00     0.00     0.95
Prevalence                                   1.00     1.00     1.00     1.00     1.00     0.88
Disease categories
  Endocrine/Nutritional/Metabolic Diseases   1.26     1.35     1.24     1.39     1.35     0.92
  Circulatory System Disorders               1.31     1.30     1.20     1.31     1.25     0.94
  Mental/Behavioral Disorders                0.76     0.77     0.89     1.02     0.85     0.98
  Respiratory System Disorders               0.81     0.74     0.77     1.14     1.05     0.98
Chronic indicator
  Acute conditions                           1.09     1.10     1.15     0.98     0.98     0.95
  Chronic conditions                         0.77     0.81     0.77     0.78     0.79     0.91
Model fit
  AIC                                        17728    18643    19182    18756    18908
  BIC                                        17820    18736    19275    18850    19002

For ease of comparison and inference, we plot the odds as a linear function of time for each input covariate in Figure 4. Out of the four disease categories, only mental disorders and respiratory disorders show a discernible pattern of increased multimorbidity during the peak pandemic, while the other two categories show either a slight increase or no change. While the odds of multimorbidity of acute conditions decreased by around 10% during the peak pandemic, there is no such pattern for chronic conditions.

Figure 4. Plot of multimorbidity odds over time for (a) Mental disorders; (b) Respiratory disorders; (c) Endocrine diseases; (d) Circulatory disorders; (e) Acute conditions; (f) Chronic conditions

5 DISCUSSION AND CONCLUSION

Severe pandemics like COVID-19 may change the health and hospital system dramatically. Remedial policies and pre-emptive interventions need timely inputs based on existing evidence. The enduring effects of the ongoing pandemic on global healthcare systems are not yet known. Hospital discharge records are hard evidence of population-wide disease incidence patterns and may hold crucial clues regarding future trajectories of existing health problems.
In this study, we focus on the phenomenon of multimorbidity, which is the simultaneous non-random occurrence of multiple diseases in patient visits. Multimorbidity has been shown to be a serious problem of modern healthcare and has been related to poor quality of life, psychological distress, and lower life expectancy, among other wellness indicators. We develop a graph analytics framework and an ERGM-based explanatory graph model to examine disease multimorbidity recorded in discharge records of hospitals in Arizona, U.S., before and during COVID-19. Our framework and graph model are design artifacts that can be adopted for any standardized electronic health records database containing ordered or unordered lists of diagnosis codes. Our study informs future studies that incorporate multimorbidity into problem scenarios such as disease risk prediction and feature engineering for modeling health outcomes (e.g., readmissions, treatment cost, mortality). Our explanatory graph model can also complement deep learning and data mining approaches as an interpretable surrogate model. Though we use six-month periods for training our model, it can also be trained on monthly or weekly curated datasets or on moving windows, which may help realize near real-time monitoring of multimorbidity. Training and testing our model on more data will also help determine the enduring effects of the COVID-19 pandemic on current healthcare systems.

5.1 Managerial implications

Our study has multiple implications for health analytics researchers as well as policy makers. The framework and graph model proposed in this study equip analysts with guidelines for developing tools to analyze disease multimorbidity patterns. These tools in turn can help take pre-emptive actions for averting impending public health crises and syndemics.
For example, to investigate the enduring effects of the COVID-19 pandemic on immuno-compromised patients, exploratory methods can be used to identify frequently co-occurring conditions among such patients as well as focal conditions leading to higher mortality. Thereafter, explanatory modeling can be used to investigate factors related to increasing multimorbidity and mortality among immuno-compromised patients, followed by predictive modeling for foreseeing future risks. Targeted case-management plans can then be developed across hospitals for specific groups of immuno-compromised patients at increased risk of developing fatal conditions. Similarly, multimorbidity-related readmission and mortality issues can be proactively addressed for other vulnerable and underserved communities. Furthermore, our framework highlights the importance of the explanatory modeling component in graph analytics, facilitating decisions about choosing one or more of these approaches for analyzing multimorbidity data of populations. Finally, our framework provides new tools and mechanisms for explaining and presenting the multimorbidity phenomenon. In this way, it contributes towards enhancing transparency in big data technologies and data science in healthcare.

Our graph model provides a template for using explanatory modeling to compare graph networks. While we apply it to study multimorbidity patterns in hospital discharge reporting, it can be generalized to other applications where it is valuable to generate and compare graphs across time periods and groups.

Our study informs medical researchers about distinct patterns in multimorbidity across disease categories. Our analysis and inferences draw the attention of decision makers to distinctive multimorbidity patterns before and during the COVID-19 pandemic.
In concurrence with clinical findings that mental disorders have risen during the COVID-19 pandemic due to lockdowns and societal distress [40], we find that multimorbidity related to mental disorders rose and peaked during 2020-02, with a 34.26% increase over the pre-pandemic period. By 2021-01, however, the increase in mental disorder multimorbidity relative to 2019-01 had fallen to 11.52%, indicating a tapering trend. Multimorbidity of respiratory disorder diagnoses increased during the peak pandemic period, by 41.04% in 2020-02 and 29.86% in 2021-01 in comparison with pre-pandemic levels. It is important to note here that the ICD-10 codes for COVID-19, including U07.1 (COVID-19 confirmed) and Z20.822 (Contact with and suspected exposure to COVID-19), belong to the categories for special purposes and other health services respectively, and not to respiratory disorders per se. Lastly, multimorbidity in acute conditions shows a slight downward trend during 2020-02 and 2021-01. A possible reason could be the sudden rise in incidence of acute conditions (see Table 1), which is consistent with news reports of higher mortality due to organ failures and biological system shutdown during the peak pandemic. Visits related to severe acute conditions may involve shorter hospitalization and can therefore be speculated to record fewer (follow-up) comorbid conditions. Since the percentage change in odds is less than 10%, the multimorbidity odds should be inspected over a longer period of time to identify a clinically relevant trend. Our findings suggest that policymakers should closely monitor aggregate cases of patients with respiratory and mental disorders over time, as the multimorbidity related to these disease categories may have decreased after the peak pandemic period but is still above pre-pandemic levels.
Patients suffering from conditions belonging to these categories may be more vulnerable to other health issues and may therefore be related to higher mortality and treatment costs. Special care should be taken towards such patients so that they are discharged in a timely manner with the fewest follow-up complications.

5.2 Relevance to IS

In the last decade, the number of design science studies focused on health IS has multiplied [5], [25], [41]. Several recent studies have focused on prediction modeling using electronic health records [25], [42], [43]. Few have explicitly included disease comorbidity in their problem formulation [5], [39], [43]. However, to the best of our knowledge, ours is one of the pioneering IS studies focusing on explaining disease multimorbidity through a graph model and characterizing temporal patterns in terms of its coefficients. Predictive modeling and statistical modeling in analytics go side by side: one predicts the future using existing data and answers the question "What will be", while the other explicates hidden patterns and tells us "What is" with respect to a phenomenon [44]. Both are important and require attention to optimize the utility of the generated data. While complex machine learning methods such as deep learning are invaluable for accurately predicting patient outcomes, explanatory models such as ours provide a statistical window to quantitatively explore a given phenomenon. Our contribution is timely in health IS research, as big data sources such as hospital discharge records are increasingly being used in design science studies. It provides a foundation for future IS research employing explanatory modeling using hospital discharge records.

5.3 Limitations and future research

Our study has a few limitations and caveats. Since the focus of our study is primarily on pattern identification and not on making predictions, we do not have benchmarks for prediction performance comparison and evaluation.
However, using our framework and novel adaptation of the ERGM approach to disease network modeling, we are able to derive novel clinical insights related to multimorbidity. The small odds ratio of the edge term (i.e., intercept) for all periods is indicative of the correspondingly low density of the DCNs. Though prevalence is a significant input, its odds are constant at 1 for each time period, implying that prevalence has no effect on multimorbidity. This may be because our measure of edge weight deliberately normalizes the impact of prevalence on evidence of co-occurrence. Before COVID-19 was declared a pandemic, the massive number of cases involving COVID-19 symptoms were initially coded as Z20.828, which stands for contact with and exposure to other viral communicable diseases. Also, since early 2020-02, the U07.1 diagnosis code was introduced into the ICD-10 system for coding COVID-19. This, among other reasons, is why we chose not to directly model the ego network of COVID-19 to see the direct effects on its neighbors. With more data, these special categories that include COVID-19 related diagnoses can be included in the graph model to examine patterns. Future research can extend our graph model as well as include more time periods to ascertain enduring patterns. DCNs trained over monthly or weekly aggregates of hospital discharge data, combined with external data sources such as social media and air quality, can also open new areas of research for disaster management and preventive care. Future research can examine plausible reasons and patterns related to the disproportionate increase in multimorbidity due to the COVID-19 pandemic across different disease categories and conditions.

REFERENCES

[1] J. D. Birkmeyer, A. Barnato, N. Birkmeyer, R. Bessler, and J. Skinner, The impact of the COVID-19 pandemic on hospital admissions in the United States, Health Aff, vol. 39, no. 11, 2020.
[2] T. V. Giannouchos, J. Biskupiak, M. J. Moss, D. Brixner, E. Andreyeva, and B. Ukert, Trends in outpatient emergency department visits during the COVID-19 pandemic at a large, urban, academic hospital system, American Journal of Emergency Medicine, vol. 40, 2021, doi: 10.1016/j.ajem.2020.12.009.
[3] M. M. Jeffery et al., Trends in Emergency Department Visits and Hospital Admissions in Health Care Systems in 5 States in the First Months of the COVID-19 Pandemic in the US, JAMA Intern Med, vol. 180, no. 10, pp. 1328–1333, Oct. 2020.
[4] M. Zokaeinikoo, P. Kazemian, P. Mitra, and S. Kumara, AIDCOV: An Interpretable Artificial Intelligence Model for Detection of COVID-19 from Chest Radiography Images, ACM Trans Manag Inf Syst, vol. 12, no. 4, 2021, doi: 10.1101/2020.05.24.20111922.
[5] Y.-K. K. Lin, H. Chen, R. A. Brown, S.-H. H. Li, and H.-J. J. Yang, Healthcare predictive analytics for risk profiling in chronic care: A Bayesian multitask learning approach, MIS Quarterly, vol. 41, no. 2, pp. 473–495, 2017.
[6] P. Kalgotra and R. Sharda, When will I get out of the hospital? Modeling Length of Stay using Comorbidity Networks, Journal of Management Information Systems, forthcoming, 2022.
[7] C. A. Hidalgo, N. Blumm, A. L. Barabási, and N. A. Christakis, A Dynamic Network Approach for the Study of Human Phenotypes, PLoS Comput Biol, vol. 5, no. 4, Apr. 2009.
[8] E. Choi, C. Xiao, J. Sun, and W. F. Stewart, Mime: Multilevel medical embedding of electronic health records for predictive healthcare, 2018.
[9] K. Srinivasan, F. Currim, and S. Ram, Predicting High-Cost Patients at Point of Admission Using Network Science, IEEE J Biomed Health Inform, vol. 22, no. 6, pp. 1970–1977, 2018, doi: 10.1109/JBHI.2017.2783049.
[10] A. Prados-Torres, A. Calderón-Larrañaga, J. Hancco-Saavedra, B. Poblador-Plou, and M. van den Akker, Multimorbidity patterns: A systematic review, Journal of Clinical Epidemiology, vol. 67, no. 3, 2014, doi: 10.1016/j.jclinepi.2013.09.021.
[11] B. Kim, K. Srinivasan, and S. Ram, Robust local explanations for healthcare predictive analytics: An application to fragility fracture risk modeling, 2020.
[12] R. Amarasingham, R. E. Patzer, M. Huesch, N. Q. Nguyen, and B. Xie, Implementing electronic health care predictive analytics: considerations and challenges, Health Aff (Millwood), vol. 33, no. 7, pp. 1148–54, Jul. 2014, doi: 10.1377/hlthaff.2014.0352.
[13] P. Kalgotra, R. Sharda, and J. M. Croff, Examining multimorbidity differences across racial groups: a network analysis of electronic medical records, Sci Rep, vol. 10, no. 1, 2020, doi: 10.1038/s41598-020-70470-8.
[14] K.-I. Goh, M. E. Cusick, D. Valle, B. Childs, M. Vidal, and A.-L. Barabási, The human disease network, Proc Natl Acad Sci U S A, vol. 104, no. 21, pp. 8685–8690, 2007, doi: 10.1073/pnas.0701361104.
[15] C. Zheng and R. Xu, Large-scale mining disease comorbidity relationships from post-market drug adverse events surveillance data, BMC Bioinformatics, vol. 19, 2018, doi: 10.1186/s12859-018-2468-8.
[16] K. Steinhaeuser and N. V. Chawla, A Network-Based Approach to Understanding and Predicting Diseases, in Social Computing and Behavioral Modeling, Springer US, 2009, pp. 1–8.
[17] M. E. Charlson, P. Pompei, K. L. Ales, and C. R. MacKenzie, A new method of classifying prognostic comorbidity in longitudinal studies: Development and validation, J Chronic Dis, vol. 40, no. 5, pp. 373–383, 1987.
[18] R. Kost, B. Littenberg, and E. S. Chen, Exploring Generalized Association Rule Mining for Disease Co-Occurrences, AMIA Annual Symposium Proceedings, vol. 2012, pp. 1284–1293, 2012.
[19] P. Klimek, A. Kautzky-Willer, A. Chmiel, I. Schiller-Frühwirth, and S. Thurner, Quantification of Diabetes Comorbidity Risks across Life Using Nation-Wide Big Claims Data, PLoS Comput Biol, vol. 11, no. 4, pp. 1–16, 2015.
[20] E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun, Doctor AI: Predicting Clinical Events via Recurrent Neural Networks, in Machine Learning for Healthcare, 2016, vol. 56, pp. 1–12.
[21] Y. Cui, F. Ahmed, Z. Sha, L. Wang, Y. Fu, and W. Chen, A weighted network modeling approach for analyzing product competition, in Proceedings of the ASME Design Engineering Technical Conference, 2020, vol. 11A-2020.
[22] D. R. Hunter, S. M. Goodreau, and M. S. Handcock, Goodness of fit of social network models, J Am Stat Assoc, vol. 103, no. 481, 2008, doi: 10.1198/016214507000000446.
[23] D. R. Hunter and M. S. Handcock, Inference in curved exponential family models for networks, Journal of Computational and Graphical Statistics, vol. 15, no. 3, 2006.
[24] J. Dell’Italia, M. A. Johnson, P. M. Vespa, and M. M. Monti, Network analysis in disorders of consciousness: Four problems and one proposed solution (exponential random graph models), Front Neurol, vol. 9, no. JUN, 2018, doi: 10.3389/fneur.2018.00439.
[25] I. Bardhan, J. Oh, Z. Zheng, and K. Kirksey, Predictive Analytics for Readmission of Patients with Congestive Heart Failure, Information Systems Research, vol. 26, no. 1, pp. 19–39, 2015.
[26] ADHS - Home. https://www.azdhs.gov/covid19/data/index.php (accessed Jan. 06, 2022).
[27] D. Koutra, N. Shah, J. T. Vogelstein, B. Gallagher, and C. Faloutsos, DELTACON: Principled massive-graph similarity function with attribution, ACM Trans Knowl Discov Data, vol. 10, no. 3, 2016, doi: 10.1145/2824443.
[28] M. Berlingerio, D. Koutra, T. Eliassi-Rad, and C. Faloutsos, NetSimile: A Scalable Approach to Size-Independent Network Similarity, Sep. 2012, Accessed: Dec. 29, 2021. [Online]. Available: https://arxiv.org/abs/1209.2684v1
[29] R. Albert and A. L. Barabási, Statistical mechanics of complex networks, Rev Mod Phys, vol. 74, no. 1, 2002.
[30] E. D. Kolaczyk and G. Csardi, Statistical Analysis of Network Data with R, vol. 65. 2020. doi: 10.1007/978-3-030-44129-6_2.
[31] F. E. Harrell, Regression modeling strategies: With applications to linear models, logistic regression, and survival analysis. New York, 2001.
[32] N. D. Monnig and F. G. Meyer, The Resistance Perturbation Distance: A Metric for the Analysis of Dynamic Networks, Discrete Appl Math (1979), vol. 236, pp. 347–386, May 2016, doi: 10.1016/j.dam.2017.10.007.
[33] A. Tsitsulin, D. Mottin, P. Karras, A. Bronstein, and E. Müller, NetLSD: Hearing the shape of a graph, 2018. doi: 10.1145/3219819.3219991.
[34] E. D. Kolaczyk and G. Csardi, Statistical Analysis of Network Data with R, vol. 65. 2020.
[35] J. van der Pol, Introduction to Network Modeling Using Exponential Random Graph Models (ERGM), Comput Econ, vol. 54, no. 3, 2019.
[36] D. Lusher, J. Koskinen, and G. Robins, Exponential Random Graph Models for Social Networks: Theory, Methods, and Applications. Cambridge University Press, 2013.
[37] L. A. Adamic and E. Adar, Friends and neighbors on the Web, vol. 25, pp. 211–230, 2003, doi: 10.1016/S0378-8733(03)00009-1.
[38] T. Zhou, L. Lü, and Y. C. Zhang, Predicting missing links via local information, European Physical Journal B, 2009, doi: 10.1140/epjb/e2009-00335-8.
[39] W. Wang, X. Luo, B. Li, and H. Wang, Nudge to Refill? Modeling Consumer Health Risk with Graph Convolutional Networks for Online Pharmaceutical Targeting, Dec. 2021.
[40] P. Winkler et al., Increase in prevalence of current mental disorders in the context of COVID-19: analysis of repeated nationwide cross-sectional surveys, Epidemiol Psychiatr Sci, vol. 29, 2020, doi: 10.1017/S2045796020000888.
[41] X. Liu, B. Zhang, A. Susarla, and R. Padman, Go to YouTube and See Me Tomorrow: The Role of Social Media in Managing Chronic Conditions, MIS Quarterly, 2019.
[42] J. Xie, B. Zhang, J. Ma, D. Zeng, and J. Lo-Ciganic, Readmission Prediction for Patients with Heterogeneous Medical History: A Trajectory-Based Deep Learning Approach, ACM Trans Manag Inf Syst, vol. 13, no. 2, pp. 1–27, Oct. 2022, doi: 10.1145/3468780.
[43] L. Qiu, V. Rajan, B. C. Y. Tan, and S. Gorantla, Multi-disease Predictive Analytics: A Clinical Knowledge-aware Approach, ACM Trans Manag Inf Syst, vol. 12, no. 3, 2021, doi: 10.1145/3447942.
[44] G. Shmueli, To Explain or to Predict?, Statistical Science, vol. 25, no. 3, pp. 289–310, 2010.
ACM Transactions on Management Information Systems (TMIS) – Association for Computing Machinery
Published: Jan 25, 2023
Keywords: Graph analytics