Longitudinal Data Analysis Based on Bayesian Semiparametric Method
Jiao, Guimei; Liang, Jiajuan; Wang, Fanjuan; Chen, Xiaoli; Chen, Shaokang; Li, Hao; Jin, Jing; Cai, Jiali; Zhang, Fangjie
2023-04-27
axioms Article Longitudinal Data Analysis Based on Bayesian Semiparametric Method 1, ,† 2,3, ,† 1, † 1, † 1 1 1 Guimei Jiao * , Jiajuan Liang * , Fanjuan Wang , Xiaoli Chen , Shaokang Chen , Hao Li , Jing Jin , 1 1 Jiali Cai and Fangjie Zhang School of Mathematics and Statistics, Lanzhou University, Lanzhou 730000, China; wangfj2021@lzu.edu.cn (F.W.); 18393919665@163.com (X.C.); chenshk21@lzu.edu.cn (S.C.); haoli21@lzu.edu.cn (H.L.); jinj21@lzu.edu.cn (J.J.); caijl21@lzu.edu.cn (J.C.); zhangfj21@lzu.edu.cn (F.Z.) Department of Statistics and Data Science, BNU-HKBU United International College, Zhuhai 519087, China Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College, Zhuhai 519087, China * Correspondence: gmjiao@lzu.edu.cn (G.J.); jiajuanliang@uic.edu.cn (J.L.) † These authors contributed equally to this work. Abstract: A Bayesian semiparametric model framework is proposed to analyze multivariate longitu- dinal data. The new framework leads to simple explicit posterior distributions of model parameters. It results in easy implementation of the MCMC algorithm for estimation of model parameters and demonstrates fast convergence. The proposed model framework associated with the MCMC al- gorithm is validated by four covariance structures and a real-life dataset. A simple Monte Carlo study of the model under four covariance structures and an analysis of the real dataset show that the new model framework and its associated Bayesian posterior inferential method through the MCMC algorithm perform fairly well in the sense of easy implementation, fast convergence, and smaller root mean square errors compared with the same model without the specified autoregression structure. Keywords: Bayesian semiparametric method; covariance structure; Dirichlet process; linear mixed model; longitudinal data; MCMC algorithm MSC: 62F10; 62F15; 62F40; 65C40 Citation: Jiao, G.; Liang, J.; Wang, F.; Chen, X.; Chen, S.; Li, H.; Jin, J.; Cai, J.; Zhang, F. Longitudinal Data 1. Introduction Analysis Based on Bayesian Semiparametric Method. Axioms Longitudinal data arise from repeated observations from the same individual or group 2023, 12, 431. https://doi.org/ of individuals at different time points. The structure of longitudinal data is shown in the 10.3390/axioms12050431 following Table 1. The basic tasks of analyzing longitudinal data can be summarized as (1) studying the trend in the covariance structure of the observed variables with respect Academic Editor: J. Alberto Conejero to time; (2) discovering the influence of covariates on the observable variables; and (3) Received: 31 December 2022 determining the within-group correlation structures [1]. Revised: 23 April 2023 Accepted: 24 April 2023 Table 1. Longitudinal data structures. Published: 27 April 2023 Number of Observations Observation Object 1 k 1 t , x , . . . , x , y t , x , . . . , x , y 11 1,11 p,11 11 1k 1,1k p,1k 1k Copyright: © 2023 by the authors. . . . . . . Licensee MDPI, Basel, Switzerland. . . . This article is an open access article i t , x , . . . , x , y . . . t , x , . . . , x , y i1 1,i1 p,i1 i1 ik 1,ik p,ik ik distributed under the terms and . . . . . . . . . conditions of the Creative Commons n t , x , . . . , x , y . . . t , x , . . . , x , y n1 1,n1 p,n1 n1 nk 1,nk p,nk nk Attribution (CC BY) license (https:// Note: In Table 1, t stands for the j-th observation time of the individual i; (x , . . . 
, x ) for the p-dimensional i j 1,i j p,i j creativecommons.org/licenses/by/ covariate of individual i at t ; and y for the j-th observation of individual i. i j i j 4.0/). Axioms 2023, 12, 431. https://doi.org/10.3390/axioms12050431 https://www.mdpi.com/journal/axioms Axioms 2023, 12, 431 2 of 37 Longitudinal data are often highly unbalanced. It is usually difficult to apply tradi- tional multiple regression techniques to analyze highly unbalanced data directly. Statisti- cians developed various Bayesian statistical inference models for longitudinal data analy- sis [2]. A parametric assumption may result in modeling bias and may relax the assumption about parametric structure. Nonparametric methods have the characteristics of robustness because they do not require model assumptions. Semiparametric models integrate the characteristics of parametric and nonparametric models and have the characteristics of flexibility and ease of interpretation. Nonparametric statistical methods and semiparamet- ric statistical methods are not only hot spots in current statistical research but also widely used in many practical applications. Xiang et al. [3] summarize some common outcomes of nonparametric regression analysis of longitudinal data. Bayesian methods for paramet- ric linear mixed models have been widely used in different areas. Assuming a normal random effect and using the standard Gibbs sampling to realize some simple posterior inference, Quintana et al. [4] extend the general class of mixed models for longitudinal data by generalizing the GP part to the nonparametric case. The GP is a probabilistic approach to learning nonparametric models. Cheng et al. [5] propose the so-called LonGP, which is a flexible and interpretable nonparametric modeling framework. It provides a versatile software implementation that can solve commonly faced challenges in longitudinal data analysis. It also develops a fully Bayesian, predictive inference for LonGP, which can be employed to carry out model selection. Kleinman and Ibrahim [6] relax the normal assumption by assuming a Dirichlet process prior with a Gaussian measure of zero mean, semiparametrically modeling the random effects distribution [7]. Although a parametric model may have limitations in some applications, it is simpler than a nonparametric model whose scope may be too wide to draw a concise conclusion. Nonparametric regression has a fatal weakness, known as the “curse of dimensionality”, which refers to the fact that when the independent variable X is multivariate, the estimation accuracy of the regression function becomes very poor as the dimension of X increases. To solve this problem, the semiparametric model is a good compromise maintaining some excellent characteristics of parametric and nonparametric models [8]. There is a lot of literature on the semiparametric modeling of longitudinal data. Most of the existing literature employs random effects to model within-group correlations [9]. Semiparametric linear mixed models generalize traditional linear mixed models by modeling a covariate effect with a nonparametric function and parametrically modeling other covariate effects. Semiparametric linear mixed models mainly use frequentist meth- ods for statistical inference and assume normal random effects [10]. In this paper, we employ a Bayesian semiparametric framework for linear mixed models given by Quin- tana et al. 
[4] by imposing stronger conditions on the random effect to obtain an explicit solution to posterior distributions of the model parameters and fast convergence of the MCMC algorithm in the posterior inference of model parameters. The proposed framework generalizes the default priori of variance components and adjusts the inference of fixed effects associated with nonparametric random effects. The latter includes the extrapolation of nonparametric mean functions over time [11,12]. The stochastic process approach in [4] is a good choice for characterizing the intragroup correlation. The Gaussian process (GP) covariance has an exponential form and is uniquely determined by two parameters. GP can specify autoregressive correlation (the AR covariance structure, [13]). A nonparametric Dirichlet process (DP) priori assigned to the covariance parameter results in an Ornstein– Uhlenbeck process (OUP). A partial Dirichlet process mixture (DPM) is performed on the OUP. By imposing stronger conditions on the random effect and decomposing the OUP into some single Gaussian variables, we can substantially simplify the MCMC sampling in the posterior inference. The framework of our proposed semiparametric autoregression model (SPAR) is demonstrated in Figure 1. This paper is organized as follows. Section 2 briefly introduces some basic theoretical background and principles. Section 3 introduces the model setup using a partial Dirichlet process mixture in terms of the OUP, and the Bayesian semiparametric autoregressive model Axioms 2023, 12, 431 3 of 37 is proposed with a recommended solution. In Section 4, the Dirichlet process and Dirichlet process mixture (DPM) are simply introduced. The formulation of the semiparametric autoregressive model is given in Section 5. Section 6 derives the marginal likelihood, prior determination, and posteriori inference. Section 7 gives a simple Monte Carlo study and a real dataset application based on the proposed model. Some concluding remarks are given in the last section. Figure 1. Bayesian semiparametric autoregression model (SPAR model). 2. Theoretical Basis 2.1. A General Linear Hybrid Model Containing an AR Structure For an observation object i (i = 1, 2, . . . , n), we denote the observation time points by t , t , . . . , t . Let y = (y , y , . . . , y ) (n 1) be the observation vector from i1 i2 in i1 i2 in i i i i object i. At time point t , we consider the possible time-dependent covariate vector i j 0 0 x = (1, x (t ), x (t ), . . . , x (t )). Let y = (y , y , . . . , y ) (n 1) be the observa- i1 i j i2 i j i p i j i1 i2 in i i j i i tion vector from object i. At time point t , we consider the possible time-dependent i j 0 0 covariate vector x = (1, x (t ), x (t ), . . . , x (t )). Let E(y ) = x b. Define the i1 i j i2 i j i p i j i j i j i j n ( p + 1)-dimensional fixed-effect design matrix X = (x , x , . . . , x ) . Assume i i i1 i2 in E(y ) = X b. Define the corresponding n q(q p)-dimensional random effects design i i 0 0 matrix Z = (z , z , . . . , z ) , where z = (1, z (t ), z (t ), . . . , z (t )). The general i i1 i2 in i1 i j i2 i j iq i j i i j linear mixed model containing an AR covariance structure is S jq N(0, C (q)), y = X b + Z b + S + e , i i i i i i i (1) b jt N(0, D(t)), e N(0, s I ), i n where X b stands for the fixed effect and Z b for the random effect, S is a stochastic i i i i process characterizing the correlation among the n observations from object i, e is the error i i term, i = 1, 2, . . . , n . 
C (q) is an n n structural covariance matrix, and vectors t and i i i i Axioms 2023, 12, 431 4 of 37 q contain the variance–covariance parameters of the random vector b and the stochastic process S , respectively. I stands for the identity matrix of dimension n n . The AR i i i structure is usually specified by a GP with zero mean. The GP is uniquely determined by a covariance function containing parameter q = (s , r) . The vector S is a GP corresponding s i to the i-th object. S is generated sequentially by the GP fs (t) : t > 0g at the observation i i time points ft , t , . . . , t g, i.e., i1 i2 in S = s (t ), s (t ), . . . , s (t ) . i i i1 i i2 i in To specify the AR structure, it is assumed that the above GP possesses stationarity: Cov(s (t ), s (t )) = s r(jt t j). (2) i il i ik s il ik When the model has a structured covariance function, the covariance matrix of y is 0 2 Cov(y ) = Z D(t)Z + C (q) + s I . (3) i i i i i 2.2. The Principle for Bayesian Inference The Bayesian method assumes a piori distribution on the unknown parameter q and a joint distribution p(x, q) between an observable variable X and the parameter q. The Bayesian method is based on the posterior distribution p(qjx) of the unknown parameter q after observed data are available. The joint distribution, the priori distribution, and the posterior distribution are related to each other as follows: p(x, q) = p(qjx) p (x), Z Z p (x) = p(x, q)dq = p(xjq)p(q)dq, Q Q where p (x) stands for the sample marginal density of x. We use the posterior distribution p(qjx) to carry out statistical inference for the parameter q: p(x, q) p(xjq)p(q) p(qjx) = = (4) p (x) p(xjq)p(q)dq Equation (4) is the well-known Bayesian formula. Bayesian statistical inference assumes that the posterior distribution of the unknown parameter q, p(qjx) contains all the available information. As a result, the point estimation, interval estimation, hypothesis testing, and predicting inference of q can be implemented as usual. Because p(q) p(xjq) p(qjx) = µ p(q) p(xjq), p(q) p(xjq)dq we construct the Bayesian estimate for q after observing data x by their conditional expecta- tion: p(q) p(xjq)qdq q = E(qjx) = µ p(q) p(xjq)qdq. (5) p(q) p(xjq)dq In general, the Bayesian estimate g ˆ(q) for a function of q, g(q) can be obtained as follows: p(q) p(xjq)g(q)dq g(q) = E[g(q)jx] = µ p(q) p(xjq)g(q)dq. p(q) p(xjq)dq Q Obviously, the estimator g ˆ(q) is the usual expectation of g(q) with respect to the posterior distribution p(qjx): g ˆ(q) = g(q)p(qjx)dq = E g(q)jx . (6) [ ] Q Axioms 2023, 12, 431 5 of 37 2.3. The MCMC Sampling and Its Convergence When the posterior distribution p(qjx) in (6) is difficult to compute, estimating multi- ple parameters given by g(q) can be realized by the Markov Chain Monte Carlo (MCMC) method [14] following these steps: (1) establish a Markov chain, the stationary distribution for p(qjx); (2) use the Markov chain for the posterior distribution p(qjx) to carry out sampling to obtain an MCMC sample fq , q , . . . , q g; and (3) obtain the MCMC estimator 0 1 n g ˆ(q) for g(q) by g ˆ(q) = g ¯(q) = g(q ). å i i=1 The MCMC sample can estimate the parameter function g(q) more and more effec- tively when the sample size n becomes larger and larger. The MCMC samplefq , q , q , . . . ,g 0 1 2 possesses some properties, such as stability, normal recurrence, periodicity, and irreducibil- ity. The choice of the initial value q has little impact on the estimate of q . Gibbs sampling is 0 t one of the MCMC algorithms. 
It is used to construct a multivariate probability distribution of a random sample. The defect of the standard Gibbs sampling is that it cannot process the nonconjugated distribution. Because the prior distribution of parameter r in the model in this paper is nonconjugated, the improved version of the Gibbs algorithm is adopted [15]. With regard to the convergence of the MCMC sampling, we consider three convergence criteria: (1) Combine the posterior sampling trend graph with the energy graph of the MCMC sampling process for convergence assessment. By observing the posterior sampling trend graph of a random variable, we can determine if the information of the sampling tends to be stable as the number of iterations increases. (2) Compare the overall distribution of energy levels in real data with energy changes between successive samples. If the two distributions are similar to each other, we can conclude that the algorithm converges. (3) Draw the trajectory of the negative evidence lower bound (ELBO, [16]) obtained in the optimization process to verify the convergence of the algorithm. Minimizing the KL (Kullback–Leibler) divergence is equivalent to minimizing the negative ELBO [17,18]. 3. The OU Process The random process S in the general linear mixed model (1) is a Gaussian process (GP). A GP can be regarded as an infinite dimensional extension of the multivariate Gaus- sian distribution, and its probability characteristics are uniquely determined by the mean function and covariance function. In particular, a GP with zero mean is completely de- termined by its covariance function [18]. To give a simple review of GP and keep the notation S specially used in model (1), we return to a general notation X(t) for GP to avoid a mix-up with the model parameter in model (1). X(t) in this section can be con- sidered as a copy of S in model (1). If random process fX(t), t 2 Tg is a GP, any finite observation vector x = X(t ), . . . , X(t ) has a multivariate Gaussian distribution. Let 1 n E[X(t)] = m (t), var[X(t)] = s (t). The joint density function of the multivariate Gaus- sian random vector is n 1 0 1 f (x(t ), . . . x(t )) = (2p) jC j exp (x m ) C (x m ) , 1 n X X X where x = x(t ), . . . , x(t ) , mean vector m = (m (t ), . . . , m (t )) , and C stands for n n 1 X X 1 X X h i the covariance matrix. Let c = cov X(t ), X(t ) = E (X(t ) m (t ))(X(t ) m (t )) . i j i j i X i j X j When the Gaussian process fX = X(t), t 2 Tg has a zero mean and it is smooth, we have c = E[X(t )X(t )] and i j i j Axioms 2023, 12, 431 6 of 37 E[X(t)] = 0 var[X(t)] = s R (X , X ) = E[(X )(X )] = R (t s), t > s, t s t s X X C (X , X ) = s R (t s) = C (t s), t > s, t s X X X where R (X , X ) and C (X , X ) stand for the GP autocorrelation function and the au- X t s X t s tocovariance function, respectively, which only depend on the time interval t s. Thus, R (t s) = r(jt sj), and c = cov X(t ), X t = s r(jt t j). The correlation coeffi- i j i j i j jt t j i j cient between X(t ) and X(t ) is denoted by r(jt t j). It is assumed that r(jt t j) = r i j i j i j in this paper. A zero-mean smooth GP fX(t), t 2 Tg is an Ornstein–Uhlenbeck (OU) pro- cess [19]. An OU process can be regarded as a continuous analogy of the discrete-time first-order autoregressive AR(1) process. To understand some properties of the AR(1) process, we express the AR(1) process as X = f X + e , t 1, X = e , (7) t 1 t 1 t 0 0 where jf j < 1 is a weight parameter to ensure stability, and e , e , e , . . . 
are uncorrelated 1 0 1 2 random variables satisfying , t = 0, 1 f E(e ) = 0, t 0, Var(e ) = 1 . t t : 2 s , t 1. It is assumed that the random error satisfies cov(e , e ) = 0, k 6= 0. Therefore, t t+k t t+k t i t+k i cov(X , X ) = cov f e , f e t+k å i å i i=0 i=0 t i t+k i = f f cov(e , e ) å i i i=0 t 2 k s f 2 2t+k 2i 1 = s f + f = . 1 2 1 2 1 f 1 f 1 i=1 1 The autocorrelation function r is assumed to follow the first-order difference equation r = f r , k 1, r = 1, 1 0 k k 1 so that we have the following solution: r = f , k 0. As shown in Figure 2 below, when jf j < 1, the absolute correlation between X(t ) and X(t ) of X(t) at two different time points t and t approaches 0 when the time is increasing. j i j The above AR(1) process (7) is a special case of an OU process by taking f = 1 q and m = 0, that is, X = X + q(m X ) + e , X = e . t t t+1 t+1 0 0 Therefore, the exponential covariance matrix constructed by the OU process can be used to describe the correlation in longitudinal data. An OU process is showed in Figure 3. It is obvious that when X(t) moves toward its mean position with the increase in time t, the correlation between X(t ) and X(t ) at two different time points becomes weaker i j and weaker. Axioms 2023, 12, 431 7 of 37 Figure 2. Example of an AR(1) process. Figure 3. Example of an OU process. 4. Dirichlet Process and Dirichlet Process Mixture 4.1. Dirichlet Process A Dirichlet process (DP) is a class of stochastic processes whose trace is a probability distribution. DP is often used in Bayesian inference to describe the prior knowledge of random variables. Definition 1. Given a measurable set C , a basis distribution G , and a positive real number a, a Dirichlet process DP(G , a) is a random process whose realization is a probability distribution on C . For any measurable finite partition of C , B , if G DP(G , a), then f g i 0 i=1 (G(B ), . . . , G(B )) Dir(aG (B ), . . . , aG (B )). 1 n 0 1 0 n Axioms 2023, 12, 431 8 of 37 where Dir stands for the Dirichlet distribution. A DP is specified by the basis distribution G and the concentration parameter a. A DP can also be regarded as an infinite dimensional generalization of the n-dimensional Dirichlet distribution. A DP is the conjugate priori of an infinite, nonparametric discrete distribution. An important application of DP is to use it as a prior distribution for infinite mixture models. A statistically equivalent description of DP is based on the stick-breaking process, described as follows. Given a discrete distribution G(r) = w d (r) å k r i=1 where d is an indicator function, namely rk 1, q = r ; d = 0, q 6= r . iid iid k 1 r G , w = v (1 v ), v Beta(1, a), the probability distribution G defined in k 0 k k i i i=1 this way is said to obey the Dirichlet process, denoted as GDP(G , a). A DP can be constructed by a stochastic process. It possesses some advantages in its application in Bayesian model evaluation, density estimation, and mixed model clustering [20]. 4.2. Dirichlet Process Mixture We consider a set of i.i.d. sample data fy , . . . , y g as a part of an infinitely exchange- able sequence. y follows the probability distribution F(y; q) with parameter q 2 Q, that is, y jq F(y; q). It may be assumed that the prior distribution of q is an unknown random probability measure G, and G can be constructed by DP, that is, qjG G, G DP(G , a). 
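As an illustration of how draws from such a prior can be generated in practice, the following Python/NumPy sketch simulates a truncated stick-breaking approximation of G ~ DP(G_0, alpha). The truncation level K, the standard normal base measure G_0, and the subsequent draws of theta are chosen purely for illustration and are not part of the model specification in this paper; the concentration parameter alpha = 0.75 matches the value used later in Section 7.

```python
import numpy as np

def sample_dp_stick_breaking(alpha, base_sampler, K=200, rng=None):
    """Draw a truncated approximation of G ~ DP(G0, alpha).

    Returns atom locations (i.i.d. draws from G0) and stick-breaking weights
    w_k = v_k * prod_{i<k} (1 - v_i), with v_k ~ Beta(1, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.beta(1.0, alpha, size=K)                                # stick-breaking fractions
    w = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))       # mixture weights
    atoms = base_sampler(K, rng)                                    # atoms from the base measure G0
    return atoms, w

# Illustration: G0 = N(0, 1); draw theta_i | G ~ G for five subjects.
rng = np.random.default_rng(0)
atoms, w = sample_dp_stick_breaking(alpha=0.75,
                                    base_sampler=lambda k, r: r.normal(0.0, 1.0, k),
                                    rng=rng)
theta = rng.choice(atoms, size=5, p=w / w.sum())  # renormalize the truncated weights
print(theta)
```

Because the weights decay geometrically in expectation, a moderate truncation level already captures essentially all of the mass when alpha is small.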
Thus, a Dirichlet process can be obtained using the hybrid (DPM) model definition: y jq F(y; q), qjG G, G DP(G , a), i 0 where G and a are the basis distribution and model parameter, respectively. In general, the distributions F and G will depend on additional hyperparameters not mentioned above. Since the trace of DP is discretized with probability 1, the above model can be regarded as a countably infinite mixture. We can integrate the distribution G in the above DPM to obtain the prior distribution of q: i 1 1 a q j(q , . . . , q ) d + G i 1 i 1 å q 0 a + i 1 a + i 1 j=1 i 1 i 1 1 a = d + G , å q 0 a + i 1 i 1 a + i 1 j=1 where d (j = 1, . . . , i 1, i = 2, 3, . . .) represents the point quality at q . This is a mixture q j of two distributions with weights p = (i 1)/(a + i 1) and 1 p = a/(a + i 1). n o Let f (jq) : q 2 Q < be a family of density functions with parameter q. We assume that the density function of the observed data y follows the probability distribution family F = f f (jG) : G 2 Pg defined by f (y jG) = f (y jq)dG(q), (8) i i which is called the DPM density of y , where G(q) is the distribution of q. F is the nonpara- metric family of mixed distributions. The distribution G can be made random by assuming that G comes from a DP. In practice, to obtain the DPM density functions of y , . . . , y , it 1 n Axioms 2023, 12, 431 9 of 37 is necessary to introduce the latent variable q related to y . That is, after introducing q , i i i the joint density function of fy : i = 1, . . . , ng is f (y jq ), where fq : i = 1, . . . , ng is i i i i i=1 a sample from the distribution G. The joint density function of fy : i = 1, . . . , ng can be expressed as f (y jG). We assign the prior distribution DP(G , a) to the distribution G i 0 i=1 with G DP(G , a). The Bayesian model is set up as follows: k 1 y j(b, S , b , q, s) F(q, b, S , b , s), i i i i i w = n (1 n ), k k Õ i i=1 q jG G, k = 1, 2, . . . (9) iid n Beta(1, a), G() = w d (), å k q iid k=1 q G , k = 1, 2, . . . k 0 wherefy , . . . , y g is a set of observations andfq , . . . , q g is a set of latent variables. Based 1 n 1 n on the discretization of DP, we can obtain the DPM density function of S by p(S jG) = p(S jq)G(dq) = w p(S jq ). i i å k i k k=1 A semiparametric model can be set up by replacing the DP mixture on q with the DP mixture on any subset of w = fw : k 1g (given by (9)), depending on q. Let q = (w , h) k i i with a prior distribution h. The above DP mixture is only performed on the parameter w . The joint density function f (y jh, G) of fy : i = 1, . . . , ng can be obtained. The prior i i i=1 distribution of h and G are prespecified. As a result, the semiparametric model can be determined [21]. 5. Formulation of the Semiparametric Autoregressive Model 5.1. The Partial Dirichlet Process Mixture of Stochastic Process The stochastic process in (1) is semiparameterized to generalize the general linear mixed model for longitudinal data. A nonparametric DP prior is assigned to the covariance parameter (s , r) of S . To reduce the number of unknown parameters, a partial Dirichlet s i process mixture is performed on S , i.e., only a nonparametric DP prior is assigned to the parameter r. The semiparametric process is created as follows. First, we consider the OU process S associated with object i to have a covariance matrix C (q) = s C (r), where i i i 2 jt t j ˜ il ik ˜ q = (s , r). The (k, l)-element of the matrix C (r) is given by r . 
The matrix C (r) s i i associated with the random process S has the following form: 2 3 t t in i1 jt t j jt t j i2 i1 i3 i1 i 1 r r r 6 7 6 7 t t in i2 jt t j jt t j 6 i1 i2 i3 i2 i 7 r 1 r r 6 7 ; (10) 6 7 . . . . . . . . 6 7 . . . . 4 5 t t t t t t i1 in i2 in i3 in i i i r r r 1 Second, we give the parameter r a nonparametric DP prior: rjG G, G DP(G , a), (11) where G is a known probability distribution, a is a constant, and the probability distribu- tion G is generated by DP(G , a) satisfying G() = w d (), å k k iid n Beta(1, a), k=1 i (12) k 1 iid f G , k 0 w = n (1 n ), k k Õ i i=1 Axioms 2023, 12, 431 10 of 37 where d () is the point quality of f, and G is a random probability distribution. Using the discretization of DP, for any parameter f, we assume that f (jf) is a density function depending on the parameter f. Let fjG G, G DP(G , a). The DPM density function can be obtained: p(jG) = p(jf)G(df) = w p(jf ), (13) å k k k=1 iid where f G . After embedding the above conditional prior into the distribution of the k 0 random process S , we formulate the DPM model for S as i i 2 2 S j(s , r) N(0, s C (r)), rjG G, G DP(G , a). (14) i i 0 s s Using the discretization of the distribution function G, we obtain the DPM density of the random process S as follows. 2 2 2 ˜ ˜ p(S js , G) = N(S j0, s C (r))dG(r) = w N S j0, s C (r ) . (15) i s i s i å k i s i k k=1 This is an infinitely countable mixture of multivariate Gaussian density functions, where iid r ˜ G , N(j0, s C (r)) represents a multivariate Gaussian density function with a zero- 0 i k s mean vector and a covariance matrix of s C (r). It can be seen that after the semiparametric s i treatment of the stochastic process S , its distribution is a mixture of stochastic processes, which is a more general mixture of OU processes. Since G is discretized with probability 1, it provides an automatic clustering effect on the autocorrelation structure of the objects. After the OU process S is semiparameterized, the covariance between any two time points t , t can be obtained by il ik 2 2 cov(s (t ), s (t )js , G) = cov(s (t ), s (t )js , r) dG(r) i i i i il ik s il ik s Z (16) jt t j 2 jt t j 2 il ik il ik = s r dG(r) = w s r . s å k s k=1 After semiparameterization of the OU process S , it not only contains an AR structure but also has an automatic clustering effect. If the observations fy : i = 1, . . . , ng from model (1) are obtained at equal interval time points, the covariance matrix structure of y becomes AR(1). 5.2. The Framework of a Hierarchical Model We introduce potential parameters r , r , . . . , r (corresponding to n objects) and semi- 1 2 parameterize the general linear mixed model (1) with observations y , y , . . . , y satisfying 1 2 n the following model: y = X b + Z b + S + e , i i i i i r jG G, b jt N(0, D(t)), i G DP(G , a), (17) 2 2 2 S j(s , r ) N 0, s C (r ) , e N(0, s I ). i s i s i i i n This is converted to a hierarchical model as follows: Axioms 2023, 12, 431 11 of 37 ind 2 2 y j(b, b , S , s ) N(X b + Z b + S , s I), i i i i i i ind 2 2 S j(s , r ) N 0, s C (r ) , i s i s i i iid fr , r , . . . , r gjG G, 1 2 n (18) G DP(G , a), iid b jt N(0, D(t)), 2 2 2 2 (s , b, t, s ) p(s ) N(b , B) p(t) p(s ), s 0 s where “ind” means independent only, S and b (1 i n) are independent of each other, i i 2 2 2 and p(s ), N(b , B), p(t), and p(s ) are the prior distributions of parameters s , b, t, and s , respectively. If b is a scalar quantity corresponding to the random intercept of the model, we have t = s . 
This model generalizes the exponential covariance function of the OU process. It realizes the automatic clustering effect between objects through parameter r . We call model (18) the Bayesian semiparametric autoregressive model, or simply the SPAR (semiparametric autoregressive) model. Note that the SPAR model (18) is a parallel version of Quintana et al.’s [4] with additional conditions that the random effects fS : i = 1, . . . , ng have a common vari- ance component s , and the autocorrelation parameters (r , r , . . . , r ) are assumed to be 1 2 conditional i.i.d. with DP(G , a). This is different from model (3) in Quintana et al. [4], which assumes that the random-effect parameters (f , . . . , f ) are conditional i.i.d. with 1 n 2 2 DP(G , a), where f = (s , r ) with var(S ) = s C (r ) in our notation. The simpler as- i i i i i i i sumptions help simplify the posterior inference. Quintana et al.’s [4] model assumptions do not lead to explicit posterior distributions of model parameters. Their posterior inference on model parameters may have to be performed by nonparametric MCMC algorithms. Because no explicit expressions related to posterior inference on model parameters are given by [4], we are not able to conclude if Quintana et al.’s [4] Bayesian posterior inference is a complete semiparametric MCMC method or a combination of semiparametric and nonparametric MCMC methods. By imposing simpler assumptions on the parameters in model (18), we are able to obtain the explicit posterior distributions of all model parameters in the subsequent context and conclude that our approach to handling the SPAR model (18) is a complete semiparametric MCMC method. In the SPAR model (18), the random process part is an OU process mixture. The corre- lation matrix C (r ) possesses the following form: i i 2 3 jt t j jt t j jt t j jt t j in i1 i2 i1 i3 i1 il i1 i 1 r r r r i i i i 6 7 jt t j jt t j jt t j jt t j in i2 6 i1 i2 i3 i2 il i2 i 7 r 1 r r r 6 7 i i i i 6 7 . . . . . 6 7 . . . . . 6 . . . . . 7 C (r ) = 6 7. (19) i i jt t j jt t j jt t j jt t j jt t j in ik 6 7 i1 ik i2 ik k3 ik il ik i r r r r r 6 7 i i i i i 6 7 . . . . . 6 7 . . . . . . . . . . 4 5 jt t j jt t j jt t j jt t j i1 in i2 in i3 in il in i i i i r r r r 1 i i i i Using the property of the OU process structure, C (r ) can be analyzed backwards. The in- i i jt t j i,k+1 ik verse matrix of C (r ) is a tridiagonal. Denote by r = r (k = 1, 2, . . . , n 1). We i i ik i have the (k, l)-th element of C (r ) given by i i jt t j+jt t j++jt t j jt t j il ik i,k+1 ik i,k+2 i,k+1 il i,l 1 C (r ) = r = r = r r . . . r . (20) i i ik i,k+1 i,l 1 kl i i Axioms 2023, 12, 431 12 of 37 So, C (r ) can be rewritten as i i 2 3 1 r r r r . . . r r . . . r i1 i1 i2 i1 i1 i,n 1 i,l 1 6 7 r 1 r r . . . r r . . . r i1 i2 i2 i,l 1 i2 i,n 1 6 7 6 . . . . . . 7 . . . . . . 6 7 . . . . . . 6 7 6 7 r . . . r r . . . r r . . . r r . . . r r . . . r i1 i,k 1 i2 i,k 1 i3 i,k 1 ik i,l 1 ik i,n 1 6 i 7 6 7 . . . . . . . . . . . . 4 5 . . . . . . r . . . r r . . . r r . . . r r . . . r 1 i1 i,n 1 i2 i,n 1 i3 i,n 1 il i,n 1 i i i i Using the correlation theory of the anticorrelation random variables [22], we can ˜ ˜ compute the elements of the inverse matrix C (r ) of C (r ) as follows: i i i i n o C (r ) = , i i 1 r i1 2 2 n o 1 r r i,k 1 ik C (r ) = , k = 2, 3, . . . , n 1, i i i 2 2 2 2 kk 1 r r + r r i,k 1 ik i,k 1 ik (21) n o ik C (r ) = , k = 1, 2, . . . , n 1, i i i kk+1 1 r ik n o C (r ) = . i i n n 1 r i i i,n 1 C (r ) turns out to be a tridiagonal matrix. 
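The tridiagonal structure of the inverse can be checked numerically. The following NumPy sketch, which is only an illustration and not part of the derivation, builds the correlation matrix of Equation (19) for one subject at arbitrarily chosen observation times, verifies that all entries of its inverse outside the tridiagonal band vanish up to floating-point error, and compares the first diagonal entry with the closed form in Equation (21).

```python
import numpy as np

def ou_correlation(times, rho):
    """Correlation matrix of an OU process observed at the given times:
    the (k, l) entry is rho**|t_k - t_l|, as in Equation (19)."""
    t = np.asarray(times, dtype=float)
    return rho ** np.abs(t[:, None] - t[None, :])

# Irregularly spaced observation times for one subject (illustrative values).
times = [0.0, 0.8, 1.5, 3.1, 4.0, 6.2]
rho = 0.6
C = ou_correlation(times, rho)
C_inv = np.linalg.inv(C)

# Entries more than one position off the diagonal should be zero up to rounding.
off_band = np.abs(np.triu(C_inv, k=2))
print("max |entry| outside the tridiagonal band:", off_band.max())

# First diagonal entry of the inverse matches the closed form 1 / (1 - rho_{i1}^2),
# where rho_{i1} = rho**|t_{i2} - t_{i1}| (Equation (21)).
rho_1 = rho ** abs(times[1] - times[0])
print(C_inv[0, 0], 1.0 / (1.0 - rho_1 ** 2))
```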
i i The random process corresponding to the i-th object S = (s , s , . . . , s ) satisfies i i1 i2 in the conditional distribution: 2 2 S js , r N 0, s C (r ) . i s i s i i So, the conditional density function is 2 2 2 2 f (S js , r ) = f (s , . . . , s js , r ) = f (s js , r ) f (s js , . . . , s , s , r ). (22) i i i1 in i i1 i in i1 i,n 1 i s s s s i i i When the inverse matrix of the covariance matrix C (r ) of the random process S has the i i i above tridiagonal form, based on the property of the multivariate normal distribution, the conditional density function f (S js , r ) of S can be decomposed into the product of n i i i i univariate Gaussian density functions as follows: s N(0, s ), i1 2 2 ˜ ˜ s j(s = s ) N s r , s (1 r ) , 1 1 i2 i1 i1 s i1 (23) 2 2 s j(s = s ˜ , . . . , s = s ˜ ) N s ˜ r , s (1 r ) , ik i1 1 i,k 1 k 1 k 1 i,k 1 s i,k 1 2 2 s j(s = s ˜ . . . , s = s ˜ ) N s ˜ r , s (1 r ) . in i1 1 i,n 1 n 1 n 1 i,n 1 s i,n 1 i i i i i jt t j i,k+1 ik It is known that r = r , k = 1, 2, . . . , n 1. Hence, ik i i Axioms 2023, 12, 431 13 of 37 s N(0, s ), i1 s jt t j 2 2jt t j i2 i1 i2 i1 s j(s = s ˜ ) N s ˜ r , s 1 r , i2 i1 1 1 s jt t j 2jt t j (24) ik i,k 1 2 ik i,k 1 s j(s = s ˜ ) N s ˜ r , s 1 r , ik i,k 1 k 1 k 1 i i jt t j 2jt t j in i,n 1 in i,n 1 i i i i s j(s = s ˜ ) N s ˜ r , s 1 r . in i,n 1 n 1 n 1 s i i i i i i As a result, the conditional density function of S in the random process f (S js , r ) can be i i i decomposed as follows: jt t j 2jt t j 2 2 ik i,k 1 2 ik i,k 1 f (S s , r ) =N s 0, s N s s ˜ r , s 1 r i i i1 ik k 1 s s s i i (25) jt t j 2jt t j in i,n 1 in i,n 1 i i 2 i i N s s ˜ r , s 1 r . in n 1 s i i i i The above decomposition greatly simplifies the computation of the distribution of the random process S , which can be obtained by direct computation of the distribution of a single Gaussian variable. 6. The Marginal Likelihood, Prior Determination, and Posteriori Inference Let y = (y , y , . . . , y ), b = (b , b , . . . , b ), S = (S , S , . . . , S ), and r = (r , r , . . . , r ). n n n 1 2 1 2 1 2 1 2 n 2 2 Denote by Y = (b, s , s , r, t), which represents the parameter vector in model (18). A random process that specifies the AR structure in the SPAR model (18) is 2 2 S js , r N 0, s C (r ) , r jG G , G DP(G , a). (26) 0 0 i s i s i i i The marginal distribution of the covariance parameter r can be obtained from the joint distribution of all terms in (26) as follows: i 1 a 1 r G , r j(r , r , . . . , r ) G + d (27) 1 0 i 1 2 i 1 0 å r a + i 1 a + i 1 k=1 for i = 2, . . . , n. We assume thatfS , S , . . . , S g is a sample from the multivariate Gaussian 1 2 n distribution. It can be regarded as a part of an exchangeable sequence. By using the exchangeable property, the corresponding order i of the observation y can be considered as the last one in all n observations from n objects. Then, y is the corresponding vector S . Or we can consider it as the last one that gives us r . The conditional prior score of r is ( i) a 1 r jr G + d , i 0 å r ( i) k (28) a + i 1 a + i 1 k6=i where r represents all other r’s except r . According to the product formula, we have ( i) p(r) = p(r , r , . . . , r ) = p(r ) p(r jr ) p(r jr , r , . . . , r ). 1 2 n 1 2 1 n 1 2 n 1 Based on the above conditional prior distributions, we can obtain the prior distribution p(r) of parameter r. 
Compute the likelihood function of the SPAR model (18) as follows: Axioms 2023, 12, 431 14 of 37 L(Yjy, b, S) = f (y, b, SjY) = f (yjb, S,Y) f (bjY) f (SjY) (29) 2 2 = f (yjb, S, b, s ) f (bjt) f (Sjs , r). Let 2 2 2 2 p(Y) = p(b, s , s , r, t) = p(b) p(s ) p(s ) p(r) p(t), s s This implies that the prior distributions of each parameter are independent of each other. The joint posterior of the SPAR model is computed as follows: p(Y, b, Sjy) µ p(y, b, SjY) p(Y) ( ) = f (y , b , S jY) p(Y) Õ i i (30) i=1 ( ) 2 2 = f y jb, S , b , s f (b jt) f S js , r p(Y). Õ i i i i s i i=1 That is, the joint posterior distribution is the product of the likelihood function and the prior distribution. The first term of the likelihood function is the density function of the estimated n-dimensional Gaussian distribution N(X b + Z b + S , s I) at y . The second term is in i i i i b , the density function of the estimated q-dimensional Gaussian distribution N(0, D(t)). The third term is the product of the n univariate Gaussian distribution density functions. To estimate the joint posterior distribution of the SPAR model (18), it is necessary to use the Bayesian theorem to obtain the conditional distribution of parameters in the model: 2 2 p(s jy, b, b, S, s , r, t), 2 2 p(bjy, b, S, s , s , r, t), 2 2 p(s jy, b, b, S, s , r, t), 2 2 (31) p(b jy, b, b , S, s , s , r, t), i i s 2 2 p(r jy, b, b, S, s , s , r , t), i i 2 2 p(S jy, b, b, S , s , s , r, t), i i s 2 2 p(tjy, b, b, S, s , s , r). The MCMC algorithm is employed to estimate these conditional distributions. The condi- tional distribution of a parameters is denoted by p(j) in the subsequent context. 7. A Monte Carlo Study 7.1. Simulation Design To verify that the SPAR model (18) can effectively simulate the correlation structure in the longitudinal data, the empirical sample data were generated under the four situations of zero mean and covariance structure being compound symmetric (CS), autoregressive (AR), mixed CS and AR, and nonstructured, respectively. The MCMC method was employed to estimate the covariance matrix and the correlation matrix in the four different cases, respec- tively, and compared with the traditional Bayesian inverse-Wishart estimation method. Consider a brief form of the SPAR model (18): y = (b + b )e + S + e , (32) i i i where b represents a fixed intercept, e is an n 1 vector of ones, b N(0, s ) is a random i i intercept, and S is an OU process corresponding to y . Convert the above model into a hierarchical model: Axioms 2023, 12, 431 15 of 37 ind 2 2 iid y j(b, b , S , s ) N (b + b )e + S , s I , i i i i b N(0, s ), ind 2 2 2 S j(s , r ) N 0, s C(r ) , b N(m , s ), i i i 0 s s (33) iid s I G(a , b ), 0 0 fr , r , . . . , r gjG G, 1 2 n s I G(a , b ). s 1 1 G DP(G , a), For the special case of balanced sample design with n = m (i = 1, 2, . . . , n) and t = t for i i j j i = 1, 2, . . . , n, the joint posterior distribution of model (18) can be easily computed by 2 2 2 2 p(yjb, b, S, s ) p(b) p(Sjs , r) p(b, s , s , r) 2 2 s s p(b, s , s , r, b, Sjy) = p(y) 2 2 2 2 (34) µ p(yjb, b, S, s ) p(b) p(Sjs , r) p(b, s , s , r) s s ( ) 2 2 2 2 = f y jb, b , S , s f (b ) f S js , r p(b) p(s ) p(s ) p(r). Õ i i i i s i s i=1 To realize the Bayesian inference of the model, it is necessary to use the Bayesian theorem to obtain the conditional distribution of each parameter. The lengthy derivations of all conditional probability distributions are given in the Appendix A at the end of the paper. 
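To make the simulation design that follows concrete, the next Python sketch shows one way to generate balanced data from the reduced model (32). The intercept, variance components, and rho below are assumed illustrative values, not the settings used in the simulation study; each row combines a random intercept, an OU component observed at equally spaced times, and independent measurement error.

```python
import numpy as np

def simulate_spar_data(n=100, m=6, b=2.0, sigma_b=1.0, sigma_s=1.5,
                       rho=0.6, sigma_e=0.5, rng=None):
    """Generate data from the reduced model (32):
    y_i = (b + b_i) * 1_m + S_i + eps_i, with S_i a zero-mean OU process
    observed at equally spaced times and eps_i i.i.d. Gaussian noise.
    All parameter values here are illustrative only."""
    rng = np.random.default_rng() if rng is None else rng
    times = np.arange(m, dtype=float)
    C = rho ** np.abs(times[:, None] - times[None, :])   # OU correlation matrix
    L = np.linalg.cholesky(sigma_s ** 2 * C)             # Cholesky factor for sampling S_i
    y = np.empty((n, m))
    for i in range(n):
        b_i = rng.normal(0.0, sigma_b)                   # random intercept
        S_i = L @ rng.standard_normal(m)                 # OU realization for subject i
        eps = rng.normal(0.0, sigma_e, size=m)           # measurement error
        y[i] = (b + b_i) + S_i + eps
    return times, y

times, y = simulate_spar_data(rng=np.random.default_rng(1))
print(y.shape)                  # (100, 6): 100 subjects, 6 repeated measures each
print(np.cov(y.T).round(2))     # empirical 6 x 6 covariance across time points
```

With equally spaced time points, the OU component contributes an AR(1) block while the random intercept adds a compound-symmetry term, so the marginal covariance of each row has the form given in Equation (35).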
Performing a posteriori inference on the covariance matrix S = (s ) (m m) is i j equivalent to estimating the posteriori mean of S: n o 2 2 2 ˆ ˜ S = EfSjyg = E s A + s C(r) + s I y , m m (35) b s where A = ee . The posteriori estimate for the correlation matrix R can be obtained by ˆ ˆ using S to calculate the estimated R of the correlation matrix R: 1 1 2 2 ˆ ˆ ˆ R = R SR , R = diag s ˆ , . . . , s ˆ , S = s ˆ : m m. (36) 0 0 0 mm i j The mean square error loss function is used to evaluate the performance of the poste- rior mean: ! 1 m m MSE(S) = (s ˆ s ) . (37) i j i j å å i=1 j=1 Compared with the Bayesian estimates of the covariance matrix and correlation matrix under other loss functions [23], the most common one is the entropy loss function defined by 1 1 ˆ ˆ ˆ L (S,S) = tr(SS ) logjSS j m. (38) The quadratic loss function is given by 1 2 ˆ ˆ (39) L (S,S) = tr(SS I) . The Bayesian estimates of the covariance matrices based on the loss functions L (S,S) and L (S,S) are given by h i h i h i 1 1 1 1 1 1 ˆ ˆ S = E S jy and vec(S) = E S S jy vec E S jy , (40) respectively, where vec stands for the column vectorization of a matrix, and “ ” denotes the Kronecker product of matrices. Similarly, the Bayesian estimate of correlation moment R under various loss functions can be obtained. Axioms 2023, 12, 431 16 of 37 7.2. Simulation Specification and Display of Empirical Results In the simulation, we firstly set up the four different covariance structures as in the following Equations (41) and the priors as given in the following Equations (43)–(46). Then, we run the MCMC training trial 2000 times. After seeing the convergence trend approach relative stability after 2000 training trials, we generated 20 datasets consisting of 100 sample points of length 6 for each object. This is equivalent to the longitudinal data structure in Table 1, with n = 100, p = 6, and k = 1 for each generated longitudinal dataset. The four covariance matrices are designed as follows: S (i, j) = 10I(i = j) + 7 I(i 6= j) ji jj S (i, j) = 10 0.4 (41) S (i, j) = 0.3S (i, j) + 0.7S (i, j) 3 2 S (i, j) = p . 1 +ji jj The root mean square error (RMSE) for estimating S is computed by 20 6 6 1 1 t ˆ R MSE = MSE, MSE = (S S ) . (42) å å å i j i j 20 36 k=1 i=1 j=1 The RMSE for estimating the correlation matrix R is computed similarly. For each of the four different covariance structures, we used the R-package (called Pandas, available upon request) to perform a preliminary analysis on the simulated data with a specified prior distribution, respectively. Then, we employed the MCMC algorithm to perform the sampling estimation on each parameter in the model. We used the three methods mentioned before to perform the convergence assessment on the sampling results. Finally, we obtained the estimates for the four types of covariance and correlation matrices. The estimation error was computed based on three types of loss functions. The inverse- Wishart estimation error was also obtained for comparison. For the case of the CS covariance structure , the prior distribution of the parameter in the model is specified as follows: b N(0, 25), G = N(0, 10), a = 0.75, s I G(3, 2), (43) iid b N(0, 7), s I G(6, 10). Each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations were taken to draw the posterior sampling trend diagram, as shown in Figure 4. 
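For readers who wish to reproduce this kind of fit, the following PyMC3 sketch sets up a simplified version of the model with priors patterned after Equation (43). It is an illustration only: the Dirichlet process mixture over rho is collapsed to a single Uniform(0, 1) prior, the N(0, 25) and N(0, 7) priors are read as specifying variances, and `times` and `y` are assumed to come from the simulation sketch in Section 7.1 (or any balanced n x m longitudinal array).

```python
import numpy as np
import pymc3 as pm

n, m = y.shape                                       # y, times: see the sketch in Section 7.1
abs_dt = np.abs(times[:, None] - times[None, :])     # |t_k - t_l| for one subject

with pm.Model() as spar_cs:
    b = pm.Normal("b", mu=0.0, sigma=5.0)                         # b ~ N(0, 25)
    b_i = pm.Normal("b_i", mu=0.0, sigma=np.sqrt(7.0), shape=n)   # b_i ~ N(0, 7)
    sigma2_s = pm.InverseGamma("sigma2_s", alpha=3.0, beta=2.0)   # sigma_s^2 ~ IG(3, 2)
    sigma2_e = pm.InverseGamma("sigma2_e", alpha=6.0, beta=10.0)  # sigma^2 ~ IG(6, 10)
    rho = pm.Uniform("rho", lower=0.0, upper=1.0)                 # stands in for the DP mixture

    # Marginal covariance of one subject, cf. Equation (3) without the Z-part.
    cov = sigma2_s * rho ** abs_dt + sigma2_e * np.eye(m)
    mu = (b + b_i)[:, None] * np.ones((1, m))                     # (b + b_i) * 1_m for each subject
    pm.MvNormal("y_obs", mu=mu, cov=cov, observed=y)

    trace = pm.sample(2000, tune=1000, target_accept=0.9,
                      return_inferencedata=True)
```

The posterior mean of the implied covariance matrix can then be formed as in Equation (35) from the posterior draws of the variance components and rho.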
The negative ELBO loss histogram and energy graph estimated by the model with the CS structure are shown in Figures 5 and 6, respectively, where the energy graph in Figure 6 was generated by the Python package PyMC3 (https://www.pymc.io/projects/docs/en/ v3/pymc-examples/examples/getting_started.html) (accessed on 26 April 2023), which displays two simulated density curves: the blue one stands for the energy value at each MCMC sample point subtracted by the average energy (like conducting data centerization); the green one stands for the difference of the energy function (like deriving the derivative of a differential function). A normal energy distribution from an MCMC sampling indicates the sampling process tends to a stable point. It implies convergence of the MCMC sampling. More details on energy computation in MCMC sampling by PyMC3 can be found on this website (https://www.pymc.io/projects/docs/en/v3/api/inference.html#module- pymc3.sampling). Figures 5 and 6 incorporate both popular methods for evaluating the convergence of the MCMC sampling in our Monte Carlo study. All energy graphs in the Axioms 2023, 12, 431 17 of 37 subsequent context have the same interpretation as they do here. We skip the tedious interpretations for all other energy graphs to save space. Figure 4. Sampling results under the CS structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed for p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the CS structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (43)). The orange line stands for sampling distribution under the SPAR model (18), and the blue line for the inverse-Wishart model; the right panel gives the the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (different color for each). Each graph in the right column shows the relationship between each estimated parameter (the ordinate) versus the number of random samples in the abscissa, ranging from 1 to 1000. Each graph in the left column presents the kernel-estimated density function of the parameter from the last 1000 samples. As can be seen from Figures 5 and 6, after several iterations of the MCMC algorithm, the negative ELBO loss is stable between 0 and 25, and the sample energy conversion distribution is basically consistent with the true energy distribution. Based on the sampling distribution and the trend graph, we can conclude that the MCMC algorithm is convergent. Axioms 2023, 12, 431 18 of 37 Figure 5. Negative ELBO loss histogram: in Figure 5, the horizontal axis stands for the number of iterations in the MCMC sampling with size n = 100, the vertical axis for the negative ELBO loss. Figure 6. Energy graph: in Figure 6, the estimated distribution of energy is based on 1000 samples with size n = 100. For the case of the AR covariance structure, the prior distribution of each parameter in the model is specified as follows: b N(0, 10) G = N(0.4, 10), a = 0.75, s I G(1.25, 0.01) (44) iid b N(0, 0.01), s I G(3.76, 9.566) Each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations were taken to draw the posterior sampling trend diagram, as shown in Figure 7. 
Note that the DP-estimated density curves show different central locations from those in Figure 4, because they are generated from different prior distributions with different covariance structures (see the difference between Equations (43) and (44)). Axioms 2023, 12, 431 19 of 37 Figure 7. Sampling results under the AR structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the AR structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (44)). The orange line stands for sampling distribution under the SPAR model (18) and the blue line for the inverse-Wishart model; the right panel gives the the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (different color for each). Figures 4 and 7 have the same structure in both axes for the two columns of graphs. The histogram of negative ELBO loss and the energy graph for when the AR structure is estimated are shown in Figures 8 and 9, respectively. They show that the histogram of the negative ELBO loss tends to be stable after several iterations, which is basically below 50, and the sample energy conversion distribution is basically consistent with the true energy distribution. Based on the posteriori sampling and the trend graph, it can be concluded that the algorithm converges quickly. Axioms 2023, 12, 431 20 of 37 Figure 8. Negative ELBO loss histogram: in Figure 8, the horizontal axis stands for the number of iterations in the MCMC sampling with size n = 100, the vertical axis for the negative ELBO loss. Figure 9. Energy graph: In Figure 9, the estimated distribution of energy is based on 1000 samples with size n = 100. For the covariance of the mixed structure of CS and AR, the prior distribution of each parameter in the model is specified as follows: b N(0, 10), G = N(0.4, 10), a = 0.75, s I G(15.2, 14), (45) iid b N(0, 2.1), s I G(3.82, 6.867). Each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations were taken to draw the posterior sampling trend graph, as shown in Figure 10. The negative ELBO loss histogram and the model energy graph are shown in Figures 11 and 12, respectively. As can be seen from the histogram of the negative ELBO loss in Figure 11, the negative ELBO loss is basically under control between 0 and 30, after several iterations of the algorithm, and the sample energy conversion distribution is basically consistent with the true energy distribution. Based on the posteriori sampling and the trend graph, we can conclude that the algorithm is convergent. Axioms 2023, 12, 431 21 of 37 Figure 10. Sampling results under the mixed structure of CS and AR: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the mixed structure of CS and AR and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (45)). 
The orange line stands for sampling distribution under the SPAR model (18) and the blue line for the inverse-Wishart model; the right panel gives the the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (different color for each). Figures 4 and 10 have the same structure in both axes for the two columns of graphs. Figure 11. Negative ELBO loss histogram: in Figure 11, the horizontal axis stands for the number of iterations in the MCMC sampling with size n = 100, the vertical axis for the negative ELBO loss. Axioms 2023, 12, 431 22 of 37 Figure 12. Energy graph: in Figure 12, the estimated distribution of energy is based on 1000 samples with size n = 100. For the case of independent structure covariance, the prior distributions of the param- eters in the model are specified as follows: b N(0, 20), G = N(0.672, 10), a = 0.75, s I G(2.22, 7.48), (46) iid b N(0, 1.106), i 2 s I G(3.20, 3.443). The MCMC method was used to sample and estimate each parameter in the model. The last 1000 iterations were taken to draw the posterior sampling trend graph shown in Figure 13. The negative ELBO loss histogram and the model energy graph are shown in Figures 14 and 15, respectively. It can be seen that the negative ELBO loss approaches 0 quickly. Based on the posterior sampling and the trend diagram, we can conclude that the algorithm is convergent. Figure 13. Cont. Axioms 2023, 12, 431 23 of 37 Figure 13. Sampling results under the independent structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the independent structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (46)). The orange line stands for sampling distribution under the SPAR model (18) and the blue line for the inverse-Wishart model; the right panel gives the the graphs of two Markov chains under the SPAR model (orange) and the inverse-Wishart model (blue), respectively, as well as the Markov chain for each individual DP (different color for each). Figures 4 and 13 have the same structure in both axes for the two columns of graphs. Figure 14. Negative ELBO loss histogram: in Figure 14, the horizontal axis stands for the number of iterations in the MCMC sampling with size n = 100, the vertical axis for the negative ELBO loss. Figure 15. Energy graph: in Figure 15, the estimated distribution of energy is based on 1000 samples with size n = 100. Axioms 2023, 12, 431 24 of 37 The above model is applied to the estimation of covariance matrices, correlation matri- ces, and model errors. We compare it with with the inverse-Wishart method. The outcomes of the estimation errors are shown in Table 2: Table 2. Estimated RMSE, L , and L based on covariance matrix S. 
1 2 Loss Function Model C1 C2 C3 C4 RMSE SPAR 0.9960 0.1037 0.8995 1.5358 Inv-W 2.9691 2.9352 3.4804 3.3285 L SPAR 0.1810 0.1201 1.3217 0.9558 Inv-W 1.8389 1.4632 3.0516 1.2926 L SPAR 0.1289 0.1345 1.7575 0.6044 Inv-W 0.2405 0.9576 2.8083 1.7565 C1 represents the covariance of the complex symmetry (CS) structure; C2 represents the AR structure covariance; C3 represents the mixed structure covariance of CS and AR; C4 represents independent structure covariance; and Inv-W = inverse-Wishart. Similarly, the estimation results of the four corresponding correlation matrices are shown in Table 3: Table 3. Estimated RMSE, L , and L based on correlation matrix R. 1 2 Loss Function Model C1 C2 C3 C4 RMSE SPAR 0.1628 1.1417 0.3775 0.1542 Inv-W 0.2308 2.4152 0.9115 0.2771 L SPAR 0.0956 0.5482 0.0631 1.2354 Inv-W 0.1519 1.1424 0.1756 1.9137 L SPAR 0.1013 0.9809 0.4937 0.5854 Inv-W 0.1238 1.1935 0.5205 0.6484 C1 represents the covariance of the complex symmetry (CS) structure; C2 represents the AR structure covariance; C3 represents the mixed structure covariance of CS and AR; and C4 represents independent structure covariance. In Tables 2 and 3, the covariance models are compared with each other based on three types of loss functions. In estimating the four covariance structures, the SPAR model performs better than the traditional inverse-Wishart method for each covariance structure. The estimation error of the SPAR model is much smaller than that of the inverse-Wishart method. When estimating the correlation matrix, except for the relatively poor SPAR performance under the strict AR structure, all other models perform roughly the same based on the quadratic loss function L (S,S). Based on the comparison of the estimation errors in Tables 2 and 3, the SPAR model shows fairly good performance in estimating the covariance matrix and correlation matrix. 7.3. Analysis of a Real Wind Speed Dataset To verify the effectiveness of the SPAR model built in this paper in practical application, we employ the Hongming data, which contain the ground meteorological data of Dingxin Station, Jinta County, Jiuquan City, Gansu Province, China. We use the SPAR model and the MCMC method to estimate the covariance matrix and correlation matrix under four different covariance structures (CS, AR, the mixture of CS and AR, and unstructured) and compare the estimates with those from the traditional Bayesian estimation by the inverse- Wishart method. The covariance structure of the four cases is shown in Equation (41), and the mean square error is shown in Equation (42). The real data are arranged into a longitudinal data structure, as shown in Table 1 with n = 100, p = 6, and k = 1. The following graphs are plotted in the same way as in the corresponding graphs in the Monte Carlo study in Section 7.2 but with real data input in the same Python program. For the case of CS covariance structure , each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations are taken to draw the posterior sampling trend diagram, as shown in Figure 16. Axioms 2023, 12, 431 25 of 37 Figure 16. 
Sampling results under the CS structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the CS structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (43)). The real blue line stands for sampling distribution under the SPAR model (18) and the dotted blue line for the inverse-Wishart model; the right panel gives the the graphs of two Markov chains under the SPAR model (real line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (different color for each). Figures 4 and 16 have the same structure in both axes for the two columns of graphs. The negative ELBO loss histogram and the model energy graph are shown in Figures 17 and 18, respectively. It can be seen that the negative ELBO loss approaches 0 quickly, and the sample energy conversion distribution is basically consistent with the true energy distribution. Based on the posterior sampling and the trend diagram, we can conclude that the algorithm is convergent. For the case of the AR covariance structure, each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations are taken to draw the posterior sampling trend diagram, as shown in Figure 19: Axioms 2023, 12, 431 26 of 37 Figure 17. Negative ELBO loss histogram: in Figure 17, the horizontal axis stands for the number of iterations in the MCMC algorithm with size n = 100, the vertical axis for the negative ELBO loss. Figure 18. Energy graph: In Figure 18, the estimated distribution of energy is based on 1000 samples with size n = 100. Figure 19. Cont. Axioms 2023, 12, 431 27 of 37 Figure 19. Sampling results under the AR structure: The left panel gives the sampling distributions (smoothed by the kernel density estimation) for the parameters and the DP, which contain six pairs of DPs (each subject is observed p = 6 variables in the simulation), each pair consisting of a DP generated from model (18) with the AR structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (44)). The real blue line stands for sampling distribution under the SPAR model (18) and the dotted blue line for the inverse-Wishart model; the right panel gives the the graphs of two Markov chains under the SPAR model (real line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (different color for each). Figures 4 and 19 have the same structure in both axes for the two columns of graphs. The negative ELBO loss histogram and the energy graph estimated by the model under the AR structure are shown in Figures 20 and 21, respectively. It shows that the histogram of the negative ELBO loss approaches 0 quickly. Based on the posteriori sampling and the trend graph, it can be concluded that the algorithm converges quickly. Figure 20. Negative ELBO loss histogram: in Figure 20, the horizontal axis stands for the number of iterations in the MCMC algorithm with size n = 100, the vertical axis for the negative ELBO loss. Figure 21. Energy graph: in Figure 21, the estimated distribution of energy is based on 1000 samples with size n = 100. 
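The trace, energy, and negative-ELBO diagnostics reported throughout Sections 7.2 and 7.3 can be produced along the following lines with PyMC3 and ArviZ. This is only a sketch: `trace` refers to a fitted result such as the one from the sketch following Equation (43), and the listed variable names are the ones used there.

```python
import arviz as az
import matplotlib.pyplot as plt

# Posterior sampling trend plots (kernel-density and trace panels of the sampling figures).
az.plot_trace(trace, var_names=["b", "sigma2_s", "sigma2_e", "rho"])

# Marginal energy vs. energy-transition distributions (the energy graphs).
az.plot_energy(trace)
plt.show()

# If the model is instead fitted by variational inference, pm.fit() returns an
# approximation whose .hist attribute holds the negative-ELBO trajectory shown
# in the loss histograms, e.g.:
#   approx = pm.fit(n=20000, method="advi")
#   plt.plot(approx.hist)
```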
For the covariance of the mixed structure of CS and AR, each parameter in the model is sampled and estimated using the MCMC method. The last 1000 iterations are taken to draw the posterior sampling trend graph, as shown in Figure 22. The negative ELBO loss histogram and the model energy graph are shown in Figures 23 and 24, respectively.

Figure 22. Sampling results under the mixed structure of CS and AR: The left panel gives the sampling distributions (smoothed by kernel density estimation) for the parameters and the DPs, which contain six pairs of DPs (p = 6 variables are observed for each subject), each pair consisting of a DP generated from model (18) with the mixed structure of CS and AR and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (45)). The solid blue line stands for the sampling distribution under the SPAR model (18) and the dotted blue line for that under the inverse-Wishart model. The right panel gives the graphs of the two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (a different color for each). Figures 4 and 22 have the same structure in both axes for the two columns of graphs.

Figure 23. Negative ELBO loss histogram: the horizontal axis stands for the number of iterations of the MCMC algorithm with sample size n = 100, and the vertical axis for the negative ELBO loss.

Figure 24. Energy graph: the estimated distribution of energy is based on 1000 samples with sample size n = 100.

As can be seen from the histogram of the negative ELBO loss in Figure 23, the negative ELBO loss approaches 0 quickly after several iterations of the algorithm. Based on the posterior sampling and the trend graph, we can conclude that the algorithm is convergent. For the case of the independent structure covariance, the MCMC method is used to sample and estimate each parameter in the model. The last 1000 iterations are taken to draw the posterior sampling trend graph shown in Figure 25.

Figure 25. Sampling results under the independent structure: The left panel gives the sampling distributions (smoothed by kernel density estimation) for the parameters and the DPs, which contain six pairs of DPs (p = 6 variables are observed for each subject), each pair consisting of a DP generated from model (18) with the independent structure and a DP generated from the traditional model with the inverse-Wishart structure (sampling population specified by Equations (23), (24) and (46)). The solid blue line stands for the sampling distribution under the SPAR model (18) and the dotted blue line for that under the inverse-Wishart model. The right panel gives the graphs of the two Markov chains under the SPAR model (solid line) and the inverse-Wishart model (dotted line), respectively, as well as the Markov chain for each individual DP (a different color for each). Figures 4 and 25 have the same structure in both axes for the two columns of graphs.

The negative ELBO loss histogram and the model energy graph are shown in Figures 26 and 27, respectively. It can be seen that the negative ELBO loss approaches 0 quickly. Based on the posterior sampling and trend diagram, we can conclude that the algorithm is convergent.
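Equations (43)-(46), which specify the four sampling covariance structures, are not reproduced in this excerpt. The sketch below shows one standard way CS, AR(1), mixed, and independent covariance matrices could be built for p = 6 variables; the variance and correlation values are illustrative placeholders, not the values used in the paper.

import numpy as np

def cs_cov(p, sigma2=1.0, rho=0.5):
    # Compound symmetry: common variance, equal correlation between all pairs.
    return sigma2 * ((1.0 - rho) * np.eye(p) + rho * np.ones((p, p)))

def ar1_cov(p, sigma2=1.0, rho=0.5):
    # AR(1): correlation decays geometrically with the lag |j - k|.
    idx = np.arange(p)
    return sigma2 * rho ** np.abs(np.subtract.outer(idx, idx))

def mixed_cov(p, sigma2=1.0, rho_cs=0.3, rho_ar=0.5):
    # Mixture of the CS and AR(1) structures (equal weights here, for illustration).
    return 0.5 * cs_cov(p, sigma2, rho_cs) + 0.5 * ar1_cov(p, sigma2, rho_ar)

def indep_cov(p, sigma2=1.0):
    # Independent structure: diagonal covariance.
    return sigma2 * np.eye(p)

C1, C2, C3, C4 = cs_cov(6), ar1_cov(6), mixed_cov(6), indep_cov(6)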
Figure 26. Negative ELBO loss histogram: the horizontal axis stands for the number of iterations of the MCMC algorithm with sample size n = 100, and the vertical axis for the negative ELBO loss.

Figure 27. Energy graph: the estimated distribution of energy is based on 1000 samples with sample size n = 100.

In Tables 4 and 5, the covariance models are compared with each other based on the three types of loss functions RMSE, L1, and L2. Under the four covariance structures, based on either the covariance matrix or the correlation matrix, the SPAR model always performs better than the traditional inverse-Wishart method: it always has a smaller value of RMSE, L1, or L2 than the Inv-W model under the same covariance structure C1, C2, C3, or C4 in Tables 4 and 5.

Table 4. Estimated RMSE, L1, and L2 based on covariance matrix S.

Loss Function   Model   C1         C2         C3         C4
RMSE            SPAR    0.0831     0.0962     0.08837    0.0981
                Inv-W   0.4120     0.6826     0.3181     0.4064
L1              SPAR    0.1360     0.1996     0.1465     0.2508
                Inv-W   46.6743    16.8967    19.0260    14.3543
L2              SPAR    0.0870     0.0826     0.2084     0.3517
                Inv-W   661.4643   142.3157   450.6686   340.6994
C1 represents the compound symmetry (CS) structure covariance; C2 represents the AR structure covariance; C3 represents the mixed structure covariance of CS and AR; C4 represents the independent structure covariance; and Inv-W = inverse-Wishart.

Table 5. Estimated RMSE, L1, and L2 based on correlation matrix R.

Loss Function   Model   C1       C2       C3       C4
RMSE            SPAR    0.0187   0.5359   0.1976   0.1985
                Inv-W   0.6585   0.6486   0.6515   0.7580
L1              SPAR    0.0066   0.1670   0.0662   0.1846
                Inv-W   0.0968   4.8245   0.3174   0.8035
L2              SPAR    0.0036   0.0132   0.0080   0.0939
                Inv-W   0.0079   1.3683   0.1202   0.3488
C1 represents the compound symmetry (CS) structure covariance; C2 represents the AR structure covariance; C3 represents the mixed structure covariance of CS and AR; and C4 represents the independent structure covariance.

8. Concluding Remarks

The SPAR model (18) provides an explicit and complete semiparametric solution to the estimation of model parameters through the MCMC algorithm. Compared with the model formulation in Quintana et al. [4], the MCMC algorithm for posterior inference on model (18) is easier to implement and may converge faster because of the explicit, simple posterior distributions of the model parameters. An effective and fast-converging MCMC algorithm plays an important role in Bayesian statistical inference. The SPAR model (18) trades simpler assumptions on the model parameters for easy implementation and fast convergence of the MCMC algorithm.

With regard to choosing the initial values for estimating model parameters by the MCMC algorithm, we recommend using a numerical optimization method such as maximum a posteriori (MAP) estimation to obtain an estimate that serves as the initial value of a parameter. This is likely to speed up the convergence of the parameter sampling. We employ the Gibbs sampling algorithm when estimating the parameters in the model. The convergence of the Markov chains generated by the MCMC algorithm is assessed by the posterior sampling trend plot, the negative ELBO histogram, and the energy graph, which show fast convergence of the MCMC sampling process.
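If the sampler is implemented with PyMC (an assumption on our part; any MCMC toolkit with a MAP optimizer works the same way), the MAP-initialization recommendation above amounts to only a few lines. The tiny normal model below is a stand-in for model (18).

import numpy as np
import pymc as pm

# Toy stand-in model; the SPAR model (18) would be defined here instead.
y = np.random.default_rng(0).normal(size=50)

with pm.Model() as model:
    mu = pm.Normal("mu", 0.0, 10.0)
    sigma = pm.HalfNormal("sigma", 5.0)
    pm.Normal("y_obs", mu=mu, sigma=sigma, observed=y)

    start = pm.find_MAP()                  # numerical optimization of the joint posterior (MAP)
    idata = pm.sample(1000, tune=1000,
                      initvals=start)      # start the Markov chains at the MAP point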
By applying the SPAR model to four different covariance structures through both a Monte Carlo study and a real dataset, we illustrate its effectiveness in handling nonstationary forms of covariance structures and its superiority over the traditional inverse-Wishart method. It should be pointed out that the effectiveness and fast convergence of the MCMC algorithm depend on both the model assumptions and the priors of the model parameters. Our Monte Carlo study was carried out by choosing normal priors for the model parameters and inverse gamma priors for the variance components. This choice led to easy implementation of the MCMC algorithm. It will be an interesting future research direction to develop meaningful criteria for model and algorithm comparison in the area of Bayesian nonparametric longitudinal data analysis. The main purpose of our paper is to give an easily implementable approach to this area with a practical illustration. We can conclude that the complete semiparametric approach to Bayesian longitudinal data analysis in this paper is a significant complement to the area studied by some influential peers, such as Mukhopadhyay and Gelfand (1997, [21]), Quintana et al. (2016, [4]), and others.

Author Contributions: G.J. developed the Bayesian semiparametric method for longitudinal data. J.L. helped validate the methodology and finalize the manuscript. F.W. conducted the major data analysis. X.C. conducted the initial research in the Bayesian semiparametric modeling study on longitudinal data analysis under the guidance of G.J. in her master's thesis [24]. H.L. helped with the data analysis. J.C. found data and reference resources. S.C. and F.Z. finished the simulation study, helped with the data analysis, and edited all figures. G.J. and J.J. wrote the initial draft. G.J. and F.W. completed the final writing, review, and editing. All authors have read and agreed to the final version of the manuscript.

Funding: The research was supported by the National Natural Science Foundation of China under Grant No. 41271038. Jiajuan Liang's research is supported by a UIC New Faculty Start-up Research Fund (R72021106) and in part by the Guangdong Provincial Key Laboratory of Interdisciplinary Research and Application for Data Science, BNU-HKBU United International College (UIC), project code 2022B1212010006.

Data Availability Statement: The real data and Python code presented in the present study are available on request from the corresponding author.

Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Derivations of Conditional Probability Distributions in Section 7.1

The full conditional distribution of \beta is

p(\beta \mid \cdot) \propto p(y \mid \beta, b, S, \sigma^2)\, p(\beta) = \prod_{i=1}^{n} \prod_{j=1}^{m} f(y_{ij} \mid \beta, b_i, s_{ij}, \sigma^2)\, p(\beta)
  = (2\pi)^{-(nm+1)/2} \sigma^{-nm} \sigma_0^{-1} \exp\Big\{ -\sum_{i=1}^{n}\sum_{j=1}^{m} \frac{(y_{ij}-\beta-b_i-s_{ij})^2}{2\sigma^2} - \frac{(\beta-\mu_0)^2}{2\sigma_0^2} \Big\}        (A1)
  \propto \exp\Big\{ -\frac{1}{2}\Big[ \frac{1}{\sigma^2}\sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij}-\beta-b_i-s_{ij})^2 + \frac{(\beta-\mu_0)^2}{\sigma_0^2} \Big] \Big\},

where

\frac{1}{\sigma^2}\sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij}-\beta-b_i-s_{ij})^2 + \frac{(\beta-\mu_0)^2}{\sigma_0^2}
  = \Big( \frac{nm}{\sigma^2} + \frac{1}{\sigma_0^2} \Big) \beta^2 - 2\Big[ \frac{1}{\sigma^2}\sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij}-b_i-s_{ij}) + \frac{\mu_0}{\sigma_0^2} \Big] \beta
    + \frac{1}{\sigma^2}\sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij}-b_i-s_{ij})^2 + \frac{\mu_0^2}{\sigma_0^2}.        (A2)

Combining (A1) with (A2), we have

p(\beta \mid \cdot) \propto \exp\Big\{ -\frac{1}{2}\Big[ \Big( \frac{nm}{\sigma^2} + \frac{1}{\sigma_0^2} \Big) \beta^2 - 2\Big( \frac{1}{\sigma^2}\sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij}-b_i-s_{ij}) + \frac{\mu_0}{\sigma_0^2} \Big) \beta \Big] \Big\}.        (A3)

Let

\sigma_\beta^2 = \Big( \frac{nm}{\sigma^2} + \frac{1}{\sigma_0^2} \Big)^{-1}, \qquad \mu_\beta = \sigma_\beta^2 \Big[ \frac{1}{\sigma^2}\sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij}-b_i-s_{ij}) + \frac{\mu_0}{\sigma_0^2} \Big].        (A4)

The complete posterior distribution of \beta can then be expressed as

\beta \mid \cdot \sim N(\mu_\beta, \sigma_\beta^2).        (A5)
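Because (A4) and (A5) give a closed-form normal full conditional, the corresponding Gibbs update for \beta is a single draw. The sketch below illustrates only that draw; the array names and shapes are our assumptions, not the authors' code.

import numpy as np

rng = np.random.default_rng(1)

def draw_beta(y, b, s, sigma2, mu0, sigma0_2):
    # One Gibbs draw of beta from the normal full conditional (A4)-(A5).
    # y, s: (n, m) arrays of responses and process values; b: (n,) random effects.
    n, m = y.shape
    resid = y - b[:, None] - s                        # y_ij - b_i - s_ij
    var_beta = 1.0 / (n * m / sigma2 + 1.0 / sigma0_2)
    mean_beta = var_beta * (resid.sum() / sigma2 + mu0 / sigma0_2)
    return rng.normal(mean_beta, np.sqrt(var_beta))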
The conditional distribution of \sigma^2 is computed as follows:

p(\sigma^2 \mid \cdot) \propto p(y \mid \beta, b, S, \sigma^2)\, p(\sigma^2)
  = \exp\Big\{ -\frac{1}{\sigma^2}\Big[ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij}-\beta-b_i-s_{ij})^2 + b_0 \Big] \Big\} (2\pi)^{-nm/2} \frac{b_0^{a_0}}{\Gamma(a_0)} (\sigma^2)^{-(nm+2a_0+2)/2}        (A6)
  \propto (\sigma^2)^{-(nm+2a_0+2)/2} \exp\Big\{ -\frac{1}{\sigma^2}\Big[ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij}-\beta-b_i-s_{ij})^2 + b_0 \Big] \Big\}.

This shows that the posterior distribution of \sigma^2 is an inverse gamma distribution. We use the following notation:

\sigma^2 \mid \cdot \sim IG\Big( \frac{nm}{2}+a_0,\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{m} (y_{ij}-\beta-b_i-s_{ij})^2 + b_0 \Big).        (A7)

The conditional distribution of b_i is computed as follows:

p(b_i \mid \cdot) \propto f(y_i \mid \beta, b_i, S_i, \sigma^2)\, p(b_i)
  = (2\pi)^{-(m+1)/2} \sigma^{-m} \sigma_b^{-1} \exp\Big\{ -\frac{1}{2\sigma^2}\sum_{j=1}^{m} (y_{ij}-\beta-b_i-s_{ij})^2 - \frac{b_i^2}{2\sigma_b^2} \Big\}        (A8)
  \propto \exp\Big\{ -\frac{1}{2}\Big[ \frac{1}{\sigma^2}\sum_{j=1}^{m} (y_{ij}-\beta-b_i-s_{ij})^2 + \frac{b_i^2}{\sigma_b^2} \Big] \Big\},

where

\frac{1}{\sigma^2}\sum_{j=1}^{m} (y_{ij}-\beta-b_i-s_{ij})^2 + \frac{b_i^2}{\sigma_b^2}
  = \frac{1}{\sigma^2}\sum_{j=1}^{m} (y_{ij}-\beta-s_{ij})^2 + \Big( \frac{m}{\sigma^2} + \frac{1}{\sigma_b^2} \Big) b_i^2 - \frac{2}{\sigma^2}\sum_{j=1}^{m} (y_{ij}-\beta-s_{ij})\, b_i.        (A9)

Combining (A8) with (A9), we have

p(b_i \mid \cdot) \propto \exp\Big\{ -\frac{1}{2}\Big[ \Big( \frac{m}{\sigma^2} + \frac{1}{\sigma_b^2} \Big) b_i^2 - \frac{2}{\sigma^2}\sum_{j=1}^{m} (y_{ij}-\beta-s_{ij})\, b_i \Big] \Big\}.        (A10)

Let

\sigma_{b_i}^2 = \Big( \frac{m}{\sigma^2} + \frac{1}{\sigma_b^2} \Big)^{-1}, \qquad \mu_{b_i} = \frac{\sigma_{b_i}^2}{\sigma^2} \sum_{j=1}^{m} (y_{ij}-\beta-s_{ij}).        (A11)

The posterior distribution of b_i is given by

b_i \mid \cdot \sim N(\mu_{b_i}, \sigma_{b_i}^2).        (A12)

The conditional distribution of S_i = (s_{i1}, \ldots, s_{im})' is obtained from

p(S_i \mid \cdot) \propto p(y_i \mid \beta, b_i, S_i, \sigma^2)\, p(S_i \mid \sigma_s^2, r_i).        (A13)

It is necessary to compute the conditional distribution of each component of S_i separately:

p(s_{i1} \mid y, \beta, b, \sigma^2, \sigma_s^2, r)
  = \frac{ p(y \mid \beta, b, s_{i1}, \sigma^2)\, p(\beta)\, p(s_{i1} \mid \sigma_s^2, r)\, p(b, \sigma_s^2, \sigma^2, r) }{ p(y \mid \beta, b, \sigma^2)\, p(\beta)\, p(b, \sigma_s^2, \sigma^2, r) }
  = \prod_{i=1}^{n} f(y_{i1} \mid \beta, b, s_{i1}, \sigma^2)\, p(s_{i1} \mid \sigma_s^2, r)
  = (2\pi)^{-(n+1)/2} \sigma^{-n} \sigma_s^{-1} \exp\Big\{ -\frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_{i1}-\beta-b_i-s_{i1})^2 - \frac{s_{i1}^2}{2\sigma_s^2} \Big\}        (A14)
  \propto \exp\Big\{ -\frac{1}{2}\Big[ \frac{1}{\sigma^2}\sum_{i=1}^{n} (y_{i1}-\beta-b_i-s_{i1})^2 + \frac{s_{i1}^2}{\sigma_s^2} \Big] \Big\}
  \propto \exp\Big\{ -\frac{1}{2}\Big[ \Big( \frac{n}{\sigma^2} + \frac{1}{\sigma_s^2} \Big) s_{i1}^2 - \frac{2}{\sigma^2}\sum_{i=1}^{n} (y_{i1}-\beta-b_i)\, s_{i1} \Big] \Big\}.

Let

\sigma_{s_{i1}}^2 = \Big( \frac{n}{\sigma^2} + \frac{1}{\sigma_s^2} \Big)^{-1}, \qquad \mu_{s_{i1}} = \frac{\sigma_{s_{i1}}^2}{\sigma^2} \sum_{i=1}^{n} (y_{i1}-\beta-b_i).        (A15)

We obtain the posterior distribution for s_{i1}:

s_{i1} \mid \cdot \sim N(\mu_{s_{i1}}, \sigma_{s_{i1}}^2).        (A16)

Similarly, the posterior distribution of s_{i2} is obtained from

p(s_{i2} \mid \cdot) \propto \prod_{i=1}^{n} p(y_{i2} \mid \beta, b, s_{i2}, \sigma^2)\, p(s_{i2} \mid \hat{s}_{i1}, \sigma_s^2, r)
  = (2\pi)^{-(n+1)/2} \sigma^{-n} \sigma_s^{-1} \big(1-r^{2|t_{i2}-t_{i1}|}\big)^{-1/2}
    \exp\Big\{ -\frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_{i2}-\beta-b_i-s_{i2})^2 \Big\}
    \exp\Big\{ -\frac{ \big(s_{i2}-\hat{s}_{i1} r^{|t_{i2}-t_{i1}|}\big)^2 }{ 2\sigma_s^2 \big(1-r^{2|t_{i2}-t_{i1}|}\big) } \Big\}        (A17)
  \propto \exp\Big\{ -\frac{1}{2\sigma^2}\sum_{i=1}^{n} (y_{i2}-\beta-b_i-s_{i2})^2 \Big\} \exp\Big\{ -\frac{ \big(s_{i2}-\hat{s}_{i1} r^{|t_{i2}-t_{i1}|}\big)^2 }{ 2\sigma_s^2 \big(1-r^{2|t_{i2}-t_{i1}|}\big) } \Big\}.

After removing the constant terms, the posterior distribution of s_{i2} is proportional to

\exp\Big\{ -\frac{1}{2}\Big[ \Big( \frac{n}{\sigma^2} + \frac{1}{\sigma_s^2 (1-r^{2|t_{i2}-t_{i1}|})} \Big) s_{i2}^2
  - 2\Big( \frac{1}{\sigma^2}\sum_{i=1}^{n} (y_{i2}-\beta-b_i) + \frac{ \hat{s}_{i1} r^{|t_{i2}-t_{i1}|} }{ \sigma_s^2 (1-r^{2|t_{i2}-t_{i1}|}) } \Big) s_{i2} \Big] \Big\}.

Let

\sigma_{s_{i2}}^2 = \Big( \frac{n}{\sigma^2} + \frac{1}{\sigma_s^2 (1-r^{2|t_{i2}-t_{i1}|})} \Big)^{-1}, \qquad
\mu_{s_{i2}} = \sigma_{s_{i2}}^2 \Big[ \frac{1}{\sigma^2}\sum_{i=1}^{n} (y_{i2}-\beta-b_i) + \frac{ \hat{s}_{i1} r^{|t_{i2}-t_{i1}|} }{ \sigma_s^2 (1-r^{2|t_{i2}-t_{i1}|}) } \Big].        (A18)

We obtain the posterior distribution for s_{i2}:

s_{i2} \mid \cdot \sim N(\mu_{s_{i2}}, \sigma_{s_{i2}}^2).        (A19)
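The inverse gamma conditional (A7) and the normal conditionals (A11)-(A12) likewise translate into one-line Gibbs updates for \sigma^2 and the random effects. The following sketch mirrors them under the same assumed array layout as above; it is an illustration, not the authors' code.

import numpy as np

rng = np.random.default_rng(2)

def draw_sigma2(y, beta, b, s, a0, b0):
    # Inverse gamma full conditional (A7) for the error variance.
    n, m = y.shape
    resid = y - beta - b[:, None] - s
    shape = n * m / 2.0 + a0
    scale = 0.5 * np.sum(resid ** 2) + b0
    return 1.0 / rng.gamma(shape, 1.0 / scale)      # inverse gamma via reciprocal of a gamma draw

def draw_b(y, beta, s, sigma2, sigma_b2):
    # Normal full conditionals (A11)-(A12), drawn jointly for b_1, ..., b_n.
    n, m = y.shape
    var_b = 1.0 / (m / sigma2 + 1.0 / sigma_b2)
    mean_b = var_b * np.sum(y - beta - s, axis=1) / sigma2
    return rng.normal(mean_b, np.sqrt(var_b))       # vectorized: one draw per subject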
Similarly, the posterior distribution of s_{ij} (j = 2, \ldots, m) can be obtained:

\mu_{s_{ij}} = \sigma_{s_{ij}}^2 \Big[ \frac{1}{\sigma^2}\sum_{i=1}^{n} (y_{ij}-\beta-b_i) + \frac{ \hat{s}_{i,j-1}\, r^{|t_{ij}-t_{i,j-1}|} }{ \sigma_s^2 \big(1-r^{2|t_{ij}-t_{i,j-1}|}\big) } \Big],
\sigma_{s_{ij}}^2 = \Big( \frac{n}{\sigma^2} + \frac{1}{\sigma_s^2 \big(1-r^{2|t_{ij}-t_{i,j-1}|}\big)} \Big)^{-1},        (A20)
s_{ij} \mid \cdot \sim N(\mu_{s_{ij}}, \sigma_{s_{ij}}^2).

Combining the above derivations, we can obtain the conditional distribution of \sigma_s^2 as follows:

p(\sigma_s^2 \mid \cdot) \propto f(S \mid \sigma_s^2, r)\, p(\sigma_s^2)
  = \prod_{i=1}^{n} f(s_{i1})\, f(s_{i2} \mid s_{i1}) \cdots f(s_{im} \mid s_{i1}, s_{i2}, \ldots, s_{i,m-1})\, p(\sigma_s^2)
  = (2\pi)^{-m/2} \sigma_s^{-m} \prod_{j=2}^{m} \big(1-r^{2|t_{ij}-t_{i,j-1}|}\big)^{-1/2} \frac{b_1^{a_1}}{\Gamma(a_1)} (\sigma_s^2)^{-a_1-1} \exp\Big( -\frac{b_1}{\sigma_s^2} \Big)
    \exp\Big\{ -\frac{1}{2\sigma_s^2}\Big[ s_{i1}^2 + \sum_{j=2}^{m} \frac{ \big(s_{ij}-\hat{s}_{i,j-1} r^{|t_{ij}-t_{i,j-1}|}\big)^2 }{ 1-r^{2|t_{ij}-t_{i,j-1}|} } \Big] \Big\}        (A21)
  \propto (\sigma_s^2)^{-m/2-a_1-1} \exp\Big\{ -\frac{1}{\sigma_s^2}\Big[ \frac{1}{2}\Big( s_{i1}^2 + \sum_{j=2}^{m} \frac{ \big(s_{ij}-\hat{s}_{i,j-1} r^{|t_{ij}-t_{i,j-1}|}\big)^2 }{ 1-r^{2|t_{ij}-t_{i,j-1}|} } \Big) + b_1 \Big] \Big\}.

The posterior distribution of \sigma_s^2 is thus an inverse gamma distribution:

\sigma_s^2 \mid \cdot \sim IG\Big( \frac{m}{2}+a_1,\ \frac{1}{2}\Big( s_{i1}^2 + \sum_{j=2}^{m} \frac{ \big(s_{ij}-\hat{s}_{i,j-1} r^{|t_{ij}-t_{i,j-1}|}\big)^2 }{ 1-r^{2|t_{ij}-t_{i,j-1}|} } \Big) + b_1 \Big).        (A22)

The posterior distribution of r_i is obtained as follows:

p(r_i \mid \cdot) = p(r_i \mid S, \sigma_s^2, r_{-i}) = p(r_i \mid S_i, \sigma_s^2, r_{-i})
  = \frac{ p(S_i \mid \sigma_s^2, r_i)\, p(r_i \mid r_{-i})\, p(\sigma_s^2, r_{-i}) }{ p(S_i \mid \sigma_s^2)\, p(\sigma_s^2, r_{-i}) }
  \propto f(S_i \mid \sigma_s^2, r_i)\, p(r_i \mid r_{-i}),        (A23)

which can be expressed as follows:

p(r_i \mid S_i, \sigma_s^2, r_{-i}) \sim \sum_{j \ne i} q_{ij}\, \delta_{r_j} + r_{i0}\, H_i, \qquad q_{ij} = b\, f(S_i \mid \sigma_s^2, r_j),
r_{i0} = b\, \alpha \int f(S_i \mid \sigma_s^2, r)\, g_0(r)\, dr, \qquad H_i(r) = \frac{ f(S_i \mid \sigma_s^2, r)\, g_0(r) }{ \int f(S_i \mid \sigma_s^2, r)\, g_0(r)\, dr },        (A24)

where g_0 is the probability density function of the base distribution G_0, b is the constant satisfying \sum_{j \ne i} q_{ij} + r_{i0} = 1, and H_i is the marginal distribution of the parameter r based on the prior G_0 and the variable S_i.
References
1. Pullenayegum, E.M.; Lim, L.S. Longitudinal data subject to irregular observation: A review of methods with a focus on visit processes, assumptions, and study design. Statist. Methods Med. Res. 2016, 25, 2992–3014. [CrossRef] [PubMed]
2. Chakraborty, R.; Banerjee, M.; Vemuri, B.C. Statistics on the space of trajectories for longitudinal data analysis. In Proceedings of the 2017 IEEE 14th International Symposium on Biomedical Imaging (ISBI 2017), Melbourne, VIC, Australia, 18–21 April 2017; pp. 999–1002.
3. Xiang, D.; Qiu, P.; Pu, X. Nonparametric regression analysis of multivariate longitudinal data. Stat. Sin. 2013, 23, 769–789.
4. Quintana, F.A.; Johnson, W.O.; Waetjen, L.E.; Gold, E.B. Bayesian nonparametric longitudinal data analysis. J. Am. Statist. Assoc. 2016, 111, 1168–1181. [CrossRef]
5. Cheng, L.; Ramchandran, S.; Vatanen, T.; Lietzén, N.; Lahesmaa, R.; Vehtari, A.; Lähdesmäki, H. An additive Gaussian process regression model for interpretable non-parametric analysis of longitudinal data. Nat. Commun. 2019, 10, 1798. [CrossRef]
6. Kleinman, K.P.; Ibrahim, J.G. A semiparametric Bayesian approach to the random effects model. Biometrics 1998, 54, 921–938. [CrossRef] [PubMed]
7. Gao, F.; Zeng, D.; Couper, D.; Lin, D.Y. Semiparametric regression analysis of multiple right- and interval-censored events. J. Am. Statist. Assoc. 2019, 114, 1232–1240. [CrossRef] [PubMed]
8. Lee, Y. Semiparametric regression. J. Am. Statist. Assoc. 2006, 101, 1722–1723. [CrossRef]
9. Sun, Y.; Sun, L.; Zhou, J. Profile local linear estimation of generalized semiparametric regression model for longitudinal data. Lifetime Data Anal. 2013, 19, 317–349. [CrossRef] [PubMed]
10. Zeger, S.L.; Diggle, P.J. Semiparametric models for longitudinal data with application to CD4 cell numbers in HIV seroconverters. Biometrics 1994, 50, 689–699. [CrossRef] [PubMed]
11. Li, Y.; Lin, X.; Müller, P. Bayesian inference in semiparametric mixed models for longitudinal data. Biometrics 2010, 66, 70–78. [CrossRef] [PubMed]
12. Huang, Y.X. Quantile regression-based Bayesian semiparametric mixed-effects models for longitudinal data with non-normal, missing and mismeasured covariate. J. Statist. Comput. Simul. 2016, 86, 1183–1202. [CrossRef]
13. Li, J.; Zhou, J.; Zhang, B.; Li, X.R. Estimation of high dimensional covariance matrices by shrinkage algorithms. In Proceedings of the 2017 20th International Conference on Information Fusion (Fusion), Xi'an, China, 10–13 July 2017; pp. 955–962.
14. Doss, H.; Park, Y. An MCMC approach to empirical Bayes inference and Bayesian sensitivity analysis via empirical processes. Ann. Statist. 2018, 46, 1630–1663. [CrossRef]
15. Neal, R.M. Markov chain sampling methods for Dirichlet process mixture models. J. Comput. Graph. Statist. 2000, 9, 249–265.
16. Kingma, D.P.; Welling, M. Auto-encoding variational Bayes. In Proceedings of the International Conference on Learning Representations (ICLR), Banff, AB, Canada, 14–16 April 2014.
17. Csiszar, I. I-Divergence geometry of probability distributions and minimization problems. Ann. Probab. 1975, 3, 146–158. [CrossRef]
18. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; pp. 63–71.
19. Barndorff-Nielsen, O.E.; Shephard, N. Non-Gaussian Ornstein–Uhlenbeck-based models and some of their uses in financial economics. J. Roy. Statist. Soc. (Ser. B) 2001, 63, 167–241. [CrossRef]
20. Griffin, J.E.; Steel, M.F.J. Order-based dependent Dirichlet processes. J. Am. Statist. Assoc. 2006, 101, 179–194. [CrossRef]
21. Mukhopadhyay, S.; Gelfand, A.E. Dirichlet process mixed generalized linear models. J. Am. Statist. Assoc. 1997, 92, 633–639. [CrossRef]
22. Zimmerman, D.L.; Núñez-Antón, V.A. Antedependence Models for Longitudinal Data; CRC Press: Boca Raton, FL, USA, 2009.
23. Hsu, C.W.; Sinay, M.S.; Hsu, J.S. Bayesian estimation of a covariance matrix with flexible prior specification. Ann. Inst. Statist. Math. 2012, 64, 319–342. [CrossRef]
24. Chen, X. Longitudinal Data Analysis Based on Bayesian Semiparametric Method. Master's Thesis, Lanzhou University, Lanzhou, China, 2019. (In Chinese)

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.