Access the full text.

Sign up today, get DeepDyve free for 14 days.

Statistics
, Volume 2021 (1809) – Sep 7, 2018

/lp/arxiv-cornell-university/additive-multivariate-gaussian-processes-for-joint-species-f8WYy0qDmt

- ISSN
- 1936-0975
- eISSN
- ARCH-3347
- DOI
- 10.1214/19-BA1158
- Publisher site
- See Article on Publisher Site

Bayesian Analysis (2019) Authors’ accepted version, Number , pp. Additive multivariate Gaussian processes for joint species distribution modeling with heterogeneous data Accepted for publication in Bayesian Analysis † ‡ § Jarno Vanhatalo , Marcelo Hartmann and Lari Veneranta Abstract. Species distribution models (SDM) are a key tool in ecology, conser- vation and management of natural resources. Two key components of the state-of- the-art SDMs are the description for species distribution response along environ- mental covariates and the spatial random eﬀect that captures deviations from the distribution patterns explained by environmental covariates. Joint species distri- bution models (JSDMs) additionally include interspeciﬁc correlations which have been shown to improve their descriptive and predictive performance compared to single species models. However, current JSDMs are restricted to hierarchical gen- eralized linear modeling framework. Their limitation is that parametric models have trouble in explaining changes in abundance due, for example, highly non- linear physical tolerance limits which is particularly important when predicting species distribution in new areas or under scenarios of environmental change. On the other hand, semi-parametric response functions have been shown to improve the predictive performance of SDMs in these tasks in single species models. Here, we propose JSDMs where the responses to environmental covariates are modeled with additive multivariate Gaussian processes coded as linear models of coregionalization. These allow inference for wide range of functional forms and interspeciﬁc correlations between the responses. We propose also an eﬃcient ap- proach for inference with Laplace approximation and parameterization of the in- terspeciﬁc covariance matrices on the euclidean space. We demonstrate the bene- ﬁts of our model with two small scale examples and one real world case study. We use cross-validation to compare the proposed model to analogous semi-parametric single species models and parametric single and joint species models in interpo- lation and extrapolation tasks. The proposed model outperforms the alternative models in all cases. We also show that the proposed model can be seen as an extension of the current state-of-the-art JSDMs to semi-parametric models. MSC 2010 subject classiﬁcations: Primary 60G15, 60K35; secondary 62P12. Keywords: linear model of coregionalization, hierarchical model, heterogeneous data, spatial prediction, model comparison, Laplace approximation, covariance transformation. First available in Project Euclid: 3 June 2019: Available in Project Euclid: https://projecteuclid.org/euclid.ba/1559548823 Department of Mathematics and Statistics and Organismal and Evolutionary Biology Research Program, University of Helsinki, jarno.vanhatalo@helsinki.ﬁ Department of Mathematics and Statistics and Department of Computer Science, University of Helsinki, marcelo.hartmann@helsinki.ﬁ Natural Resources Institute Finland, Finland, lari.veneranta@luke.ﬁ c 2019 International Society for Bayesian Analysis DOI: 10.1214/19-BA1158 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 arXiv:1809.02432v2 [stat.ME] 8 Aug 2019 2 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 1 Introduction Species distribution models (SDMs) are key tools in the ecologists’ toolbox. They have been widely used, among other applications, to study species habitat preferences (La- timer et al., 2006), to improve identiﬁcation and management of conservation areas and natural resources (Kallasvuo et al., 2017) and to evaluate species responses to environ- mental ﬁltering under climate change scenarios (Clark et al., 2014; Kotta et al., 2019). The main goals of statistical inference in these contexts are to use species observations and information on the associated environment to infer the relationship between these two attributes and to predict over regions of unsampled locations to build thematic species distribution maps (Gelfand et al., 2006; Elith and Leathwich, 2009). Species distribution modeling is, thus, directly related to inferring species’ responses to environmental factors (Latimer et al., 2006). This is traditionally done using gen- eralized linear or additive models as well as an array of machine learning approaches such as regression trees or maximum entropy modeling (Elith and Leathwich, 2009). These approaches model each species separately and cannot account for species interac- tions nor shared responses to the environment. However, species interaction with other species is potentially as important factor as its response to environment. Moreover, in many practical situations, data from species can be patchy or scarce in which case sharing information between species can signiﬁcantly improve models’ predictive per- formance (Ovaskainen and Soininen, 2011; Clark et al., 2014; Hui et al., 2013; Thorson et al., 2015; Hartmann et al., 2017). For these reasons, joint species distribution models (JSDM) have gained increasing attention in recent years (Warton et al., 2015). Dunstan et al. (2013) and Hui et al. (2013) introduced species archetype modeling where species’ responses to the environment are clustered into few archetype models. Pollock et al. (2014) proposed to use the multivariate probit regression model (Chib and Greenberg, 1998) to describe geographical co-occurrence patterns between frogs and eucalyptus trees in Australia and Clark et al. (2014) built a JSDM to infer richness and loss of species under climate change scenarios. Thorson et al. (2015) introduced a spatial latent factor model to predict spatial distribution of breeding birds and rock ﬁsh communities and recently, Ovaskainen et al. (2017) introduced the hierarchical modeling of species communities (HMSC) framework which includes detailed description of interspeciﬁc correlations in covariate responses and spatial random eﬀects. Current JSDMs rely on the classical framework of hierarchical generalized linear models (GLMs). Even though this approach allows ﬂexible modeling by describing the randomness of response variables with diﬀerent probabilistic models (Nelder and Wed- derburn, 1972), it is still limited by its parametric assumptions. Hence, it may fail to accurately describe a species’ response to environmental conditions (Vanhatalo et al., 2012; Kotta et al., 2019). Here, we propose a semi-parametric JSDM model represented with multivariate Gaussian processes (GPs). GPs are ﬂexible semi-parametric regres- sion models where the regression function is estimated without restrictive parametric assumptions about its form (O’Hagan, 1978; Rasmussen and Williams, 2006). Our model integrates the main strengths of semi-parametric single species models (Vanhatalo et al., 2012; Golding and Purse, 2016) and generalized linear model based JSDMs. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 3 First, we model the species responses to environmental covariates with additive mul- tivariate GPs. This allows us to capture wide range of nonlinear responses and share in- formation about these responses between species. Second, due to the hierarchical model structure, the model can simultaneously accommodate several kinds of outcome vari- ables (observations/measurements) by assigning diﬀerent types of probabilistic models for them. This is important, since it allows us to exploit diﬀerent types of measure- ments which commonly arise in real-case scenarios of multiple species surveys. Our presentation focuses in the mostly used probabilistic models in practice, the Bernoulli (Binomial) and Poisson with over-dispersion (Negative-Binomial). We also propose an eﬃcient computational approach build around Laplace method (Golding and Purse, 2016; Vanhatalo et al., 2010). Lastly, we present a structured cross-validation scheme to validate and compare models’ performance in diﬀerent kinds of prediction tasks. This paper is organized as follows. In Section 2 we introduce a motivating case study. In section 3, we introduce the additive multivariate GPs and in Section 4 we discuss its predictive properties and introduce the computational methods for inference. In Section 5, we illustrate the basic properties of the model through two simple examples and introduce the speciﬁc case study model. In Section 6 we present the case study results and we end by discussion and conclusions in Section 7. 2 Motivating case study: coastal species distributions in the Gulf of Bothnia As a motivating example, we study spring distribution of four ﬁsh species and three types of algae or macro-vegetation on the coastal region of the Gulf of Bothnia in the northern Baltic Sea. The Gulf of Bothnia is a brackish water basin between Sweden and Finland covering an area of approximately 600 × 120 km. Its coastal areas play a central role in the ecosystem and many Baltic ﬁsh stocks are dependent on the coastal regions for their reproduction (Veneranta et al., 2013; Kallasvuo et al., 2017). Coastal zones are also the most sensitive regions of the Baltic sea to both natural variation and anthropogenic pressures (Reusch et al., 2018). Hence, these areas are of central importance for conservation and there is need for detailed knowledge on the distribution of coastal species and predictions concerning their response to environmental changes. 2.1 Case study species and data The studied ﬁsh species are the juvenile or adult three-spined stickleback (Gasterosteus aculeatus ) and nine-spined stickleback (Pungitius pungitius ) as well as larvae of white- ﬁsh (Coregonus lavaretus ) and vendace (Coregonus albula ). Both whiteﬁsh and vendace are commercially important species and the sticklebacks are one key ﬁsh fauna in the Gulf of Bothnia ecosystem (Bergstr¨ om et al., 2015). We treat the studied vegetation and algae in functional group level comprising of diatomous algae, ﬁlamentous and macro- vegetation. These are the dominating groups of benthic vegetation in shores where larvae of Coregonids (whiteﬁsh and vendace) occur at early developmental stages. In the scale imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 4 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Figure 1: The study region, Gulf of Bothnia, and locations for species data. The region is divided into ﬁve subregions (I-V) to be used in cross-validation. The environmental conditions (described by the environmental covariates) are heterogeneous between these regions (Veneranta et al., 2013) which allows to test models’ extrapolation performance. of Gulf of Bothnia, their occurrence also reﬂects the salinity, nutrient balance and wind exposure of studied area (Veneranta et al., 2013). The case study species reﬂect the large scale environmental gradients and the changes in the Baltic Sea environment in last decades. Sticklebacks in the Baltic Sea use the shallow coastal zone for reproduction (Bergstr¨ om et al., 2015) and high abundances of sticklebacks in the Baltic Sea have been positively correlated with high occurrence of ﬁlamentous algae (Eriksson et al., 2009, 2012). Coregonids in coastal areas prefer more oligotrophic waters (Veneranta et al., 2013). Whiteﬁsh and vendace reproduce in stony reefs and islets of Baltic Sea in late autumn and the larvae hatch at ice break-up in spring. In sheltered areas the overwintering reeds (Phragmites australis ) dominate the macro-vegetation in spring. Diatomous algae consist of several epiphytic species that have an early spring bloom at ice break up and settle rapidly over bottom when light and water temperature increase (Busse and Snoeijs, 2002). Filamentous algae in this study consist mostly of Pilayella sp. It is a fast growing annual species that dominates the exposed shores at Baltic Sea in early spring (R¨ onnberg and Bonsdorﬀ, 2004). The whole study area was divided into sampling subareas (Figure 1). Within each sub-area, data were collected at several sampling sites distributed so that they covered the range of most important environmental covariates in that sub-area. Sampling was imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 5 Variable Description Resolution [m] Openness The average openness and exposure to winds 300 Distance to deep Distance to 20 m depth curve 200 Sandy bottom index Area weighted distance to the sandy shores 90 Ice breakup week The end date (weeks) of ice cover in 2009 1852 Chlorophyll-a Chlorophyll-a concentration 1852 River Distance to the nearest river mouth 150 Bottom class Bottom classiﬁcation to 6 categories 200 Winter salinity Length of ice winter 10,000 Table 1: Environmental covariates used in the case study and their original resolution. For this study, the resolution of all the environmental covariates was scaled to 300m. conducted in 2009-2011. At each site sampling was done approximately one week after the ice break. In the scale of Gulf of Bothnia, the ice break-up happens in a period of approximately one month. The species data used in this work comprise of 160 distinct sampling sites. In 70 sites (2010 data) all species were sampled and the rest of the sites comprise samples for the ﬁshes only (Figure 1). Fish samples comprise the number of caught ﬁsh together with information on the sampling eﬀort at each site. The eﬀort was measured as the volume of sampled water (m ). A more detailed description of the data collection is provided by Veneranta et al. (2013) For plant data, the bottom was photographed at 5-13 locations at depth of 30 cm within a distance of 2 m and parallel to shore line. The occurrence of a species was recorded at 16 uniformly distributed points in all these photographs. The case study plants can grow over each others so that presence of one plant at a spatial location does not exclude the presence of other plants. The whiteﬁsh and vendace data were previously analyzed by Veneranta et al. (2013) and Vanhatalo et al. (2012) but the data on other species are unpublished. 2.2 Environmental covariates Gulf of Bothnia hosts rich variety of environmental conditions. Coastal areas are aﬀected by inﬂows from land as well as shallow and complex topography. In the scale of Gulf of Bothnia, there is a gradient in river inﬂuence, salinity, temperature and length of ice cover period from north to south. We used seven real-valued and one categorical abiotic environmental covariates that were available throughout the study area as raster maps. Each species observation was combined with covariates from the raster cell where the sample location fell in. These are summarized in table 1 and described and motivated in detail by Veneranta et al. (2013). Due its fresh water origin vendace is a priori sensitive to even small changes in salinity levels experienced in the Gulf of Bothnia whereas the other species respond to salinity only in the Baltic Sea scale. Hence, we used all covariates only for vendace but excluded the winter salinity from other species. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 6 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 3 Hierarchical multivariate species distribution model We start our model building following the generic hierarchical structure similar to the one presented by, e.g., Wikle (2003), Cressie and Wikle (2011) and Banerjee et al. (2015) [data|process, parameters] : π (y(x, s)| f (x, s),η) [process|parameters] : π (f (x, s)|θ) [parameters] : π(η,θ) (3.1) where the ﬁrst layer of hierarchy is the probabilistic model which deﬁnes the conditional distribution for multivariate data Y (x, s), at spatial location s with associated covariates x, given the model parameters η and the multivariate latent process f (x, s). The second layer deﬁnes the prior distribution for the latent process given the process parameters θ, and the third layer deﬁnes the prior distribution for all unknown parameters. Let j ∈ J = {1,··· , J} be the species index set and J the total number of species in the study. Denote by Y (x, s) = [Y ··· Y ] the J -variate random vector with com- 1 J th T ponents Y = Y (x , s) related to the j species at spatial location s = [s s ] (coordi- j j j 1 2 nates) under the inﬂuence of environmental covariates x = [x ··· x ] where c is the j,1 j,c j j j number of environmental covariates for the species j. We denote by x the full vector of covariates and X the covariate space. The species speciﬁc covariates x are sub-vectors of x such that x ∈ X where X ⊂ X and X is c -dimensional. Here, we assume that j j j j j T 2 J given a multivariate latent process f (x, s) = [f (x , s)··· f (x , s)] : X × R → R , 1 1 J J the probabilistic model for Y = Y (x, s) factorizes as follows π (y(x, s)| f (x, s),η) = π (y | f , η ) (3.2) Y Y j j j j=1 where f = f (x , s) is ﬁxed but an unknown realization of the j’th process with covari- j j j ates x at the spatial location s. η is the vector of parameters of the probabilistic model j j π for the species j and η = [η ··· η ] . In general any probabilistic model for data Y 1 J could be used and the observation models should be chosen according to the assumed sampling process. Here, we consider in detail the Binomial and the Negative-Binomial (over-dispersed Poisson) models used in our case study (Section 2). These models can be seen as observation models resulting from inhomogeneous point process model for species abundance (Gelfand et al., 2006; Warton and Shepherd, 2010). In practice, each sampling site consists of a small area where the observations are made. In our case study, the area covered by a sampling site is so small compared to the whole study region that the latent function is practically constant in each sampling site. The model for the count of species presences at uniformly distributed points in a sampling site (plants in our case study) is then Binomial given by (Gelfand et al., 2006) z (s) y z (s)−y π (y|f (x, s), z (s)) = p(f (x, s)) [1− p(f (x, s))] I (y) (3.3) Y B {0,...,z (s)} B B where z (s) is the total number of observation points in the sampling site at location s and p(f (x, s)) is the success probability, which will be modeled through the logit here. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 7 The natural model for the number of individuals in a closed area or volume in a sampling site (ﬁsh in our case study) is Poisson which we extend to over-dispersed Poisson using the Negative-Binomial given by (see Liu and Vanhatalo, 2018, Section 7.1 for details) r y Γ(r + y) r m(x, s) π (y|f (x, s), z (s), r) = I (y) (3.4) Y N N N 0 y!Γ(r) r + m(x, s) r + m(x, s) where m(x, s) = z (s) exp[f (x, s)] = E(Y |f (x, s), z (x), r) and r is the over-dispersion N N N parameter. The latent process f (·) corresponds to the logarithm of a species (relative) density and z (s) is the sampled volume of water in the sampling site at location s. For this parameterization V (Y |f (x, s), z (x), r) = m(x, s) + m(x, s) /r. Increasing r N N decreases the variance and when r → ∞, the model approaches the Poisson distribution. 3.1 Univariate additive latent Gaussian process First, we assume that marginally for any species j, the process model is given by f (x , s) = β + h (x ) + ε (s) (3.5) j j j,0 j j j 2 2 where β is the oﬀset weight with distribution β |σ ∼ N (0, σ ) and h : X → R j,0 j,0 j j j,0 j,0 is a predictor function of environmental covariates. The predictor function is assumed to be additive over the covariates, h (x ) = h (x ), and each additive term is j j j,r j,r r=1 given an independent zero mean GP prior, so that h (x )|θ ∼ GP 0, k (x , x ;θ ) (3.6) j j j h j,r j,r j,r j,r r=1 0 0 where k (x , x ;θ ) = Cov(h (x ), h (x )|θ ) is a covariance function with h j,r j,r j,r j,r j,r j,r j,r j,r j,r T T T 0 parameter θ and θ = [θ ···θ ] . For example, with k (x , x ; θ ) = x j,r j h j,r j,r j,r j,1 j,c j,r j,r 0 2 2 x σ and θ = σ , the predictor function corresponds to linear model h (x ) = j,r j,r j,r j,r j,r j,r x β where β ∼ N (0, σ ) (Rasmussen and Williams, 2006). With other choices j,r j,r j,r j,r of covariance functions we can model non-linear responses in which case the model can be seen as an alternative to the traditional generalized additive models (GAMs, Hastie and Tibishirani, 1986). The general form of an additive GP prior would include also joint eﬀects of covariates (Duvenaud et al., 2011) but this is not considered here. The term ε (s) is a spatial GP, 2 0 2 ε (s)|σ ,` ∼ GP 0, k (s, s ;` , σ ) (3.7) j j j j j j 0 2 2 where k (s, s ;` , σ ) is a spatial covariance function with variance σ and range param- j j j eter ` . When we consider independent models for each species, that is the traditional single species SDMs (SSDMs), the processes h and are mutually independent among j,r j all species. The model outlined this far is similar to the GP based species distribution models used by, e.g., Vanhatalo et al. (2012), Kallasvuo et al. (2017) and Golding and Purse (2016). Next, we introduce models which consider dependency. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 8 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 3.2 Additive multivariate Gaussian process priors In order to account for possible species interdependence, we ﬁrst include spatial species interaction into the model through the linear model of coregionalization (LMC) (Mardia and Goodall, 1993; Gelfand et al., 2004). Write ε(s) = [ε (s)··· ε (s)] and assume that 1 J ε(s) has LMC covariance structure with species speciﬁc correlation functions k (·,· ;` ). The covariance structure of the LMC with interspeciﬁc spatial dependence is then, 0 0 0 Cov ε (s), ε 0 (s )|Σ ,` = u (j, j )k (s, s ;` ) (3.8) j j ,l l l=1 T T T 0 0 T with ` = [` ···` ] and u (j, j ) the entry (j, j ) of U = L L where L is ,l ,l ,l ,l 1 J ,l th the l column of the Cholesky decomposition L of the coregionalization matrix Σ , that is matrix of interspeciﬁc correlations between spatial GP. Hence, the vector-valued latent process f (x, s) = [f (x , s)··· f (x , s)] unconditional on β ,. . .,β follows a 1 1 J J 1,0 J,0 multivariate GP which we denote as c J X X 0 0 f (x, s)|Λ ∼ MGP 0, Σ + k (x , x ;θ) + k (s, s ;` )U (3.9) 1 0 h r j ,j r r j r=1 j=1 T T T 2 2 0 where Λ = (Σ , Σ ,θ,`), Σ = diag(σ , . . . , σ ), θ = [θ ···θ ] and k (x , x ;θ) = 1 0 0 h r 1,0 J,0 1 J r r 0 0 diag k (x , x ;θ ),··· , k (x , x ;θ ) . If the predictor functions, h , were h 1,r 1,r h J,r J,r j,r 1,r 1,r J,r J,r linear, the prior (3.9) would correspond to traditional multivariate spatial model with independent linear eﬀects and spatial LMC (Gelfand et al., 2004). We extend (3.9) to an additive multivariate GP prior where the dependence between the species speciﬁc additive predictor functions is modeled with LMC. We demonstrate this with a model where the set of covariates is equal for all species, that is, dim(X ) = c and x = x for all j. Then the model is written as c J J XX X 0 0 ˜ ˜ f (x, s)|Λ ∼ MGP 0, Σ + k (x , x ;θ )U + k (s, s ;` )U 2 0 h r r j,r h ,j j j ,j j,r r=1 j=1 j=1 (3.10) where Λ = (Λ , Σ , . . . , Σ ) and Σ is the interspeciﬁc covariance matrix between 2 1 h h h 1 c r the species speciﬁc predictor functions of r’th covariate, k (·,·;θ ) is a correlation hj,r j,r T th function related to the predictor function h , U = L L and L is the j j,r h ,j h ,j h ,j r r r h ,j column of the Cholesky decomposition L of Σ . When the set of covariates is not the h h r r same for all species, covariate speciﬁc predictive functions are correlated only between species that share those same covariates. A JSDM with multivariate GP prior (3.10) can be interpreted as extension of GP based SSDMs (Vanhatalo et al., 2012; Golding and Purse, 2016) to joint species mod- eling similarly as done with the generalized linear model based JSDMs (Pollock et al., 2014; Ovaskainen et al., 2017). The enhanced inferential ability of the multivariate ad- ditive GP compared to univariate GP models lies in its capability to infer similarity imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 9 (dissimilarity) in species speciﬁc responses to covariates and in the spatial random ef- fect. The similarity/dissimilarity of responses of two species j and j along a covariate r is indicated by a positive/negative value for correlation and hence, examining the LMC covariance matrices, Σ , can provide new insight to species to species associations. 3.3 Marginally uniform priors for correlation parameters Here, we deﬁne the prior for coregionalization covariance matrices but deﬁne the co- variance functions and rest of the priors in Section 5. A standard choice for prior for correlation matrices is the inverse Wishart distribution. However, if there is no prior information on interspeciﬁc correlations, we follow Hartmann et al. (2017) and suggest to use marginally uniform priors for the LMC covariance matrices Σ and Σ . These can be achieved by the distribution of Barnard et al. (2000) and Tokuda et al. (2012). Let P be a correlation matrix with elements ρ . A prior distribution that is marginally j,j noninformative for the correlations, that is, the marginal distribution for every ρ is j,j uniform over (−1, 1), is achieved with the distribution v J Γ( ) 1 v 2 (v−1)(J−1)−1 − 2 2 π(P|v) = |P| |P | 1 (detP ) (3.11) jj (0,∞) Γ ( ) j=1 when v = J − 1. Here, Γ (·) is the multivariate gamma function and matrix P is J jj th th obtained by removing the j column and j row ofP . When v increases, the probability density (3.11) becomes increasingly concentrated around the origin. 4 Posterior inference and predictive properties Given a set of species observations at locations s , i = 1, . . . , n , respectively for each i j j species j = 1, . . . , J , the likelihood can be written as J j Y Y π(y| f,η) = π (y | f , η ) (4.1) j j,i j,i j j j j=1 i =1 where y is the i ’th observation of species j at the i ’th spatial location s asso- j,i j j j i T T T ciated with a set of covariates x ∈ X . The observed vector y = [y ··· y ] with i j j 1 J T T T T y = [y ··· y ] collects all the species observations and f = [f . . . f ] , where j,1 j,n j j 1 J f = [f ··· f ], collects the corresponding latent variables. Hence, the likelihood j,1 j,n j j factorizes over the latent variables and the inference can be done similarly as with uni- variate GP models. Markov chain Monte Carlo (MCMC) would provide exact answers in the limit of large sample size but deterministic approximations such as Laplace approx- imation or expectation propagation algorithm have also been shown to provide accurate approximate inference for univariate GP models with much lower computational time (Nickisch and Rasmussen, 2008; Rue and Marino, 2009; Vanhatalo et al., 2010). In order to study the properties of the model, we will examine the Laplace approximation for the posterior of latent variables conditional on the hyperparameters. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 10 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 4.1 Posterior predictive inference conditional on hyperparameters Posterior of latent variables The Laplace’s method approximates the conditional posterior of the latent function values f = f (s , x ) at the spatial location s with covariates x as ∗ ∗ ∗ ∗ ∗ −1 −1 −1 π(f | y,η, Λ) ≈ N f |C C f, C − C (C + W) C (4.2) ∗ ∗ ∗,f ∗ ∗,f f,∗ where f is the (conditional) maximum a posterior (MAP) estimate of latent variables, J j X X f = arg max log π (y | f , η ) + logN (f | 0, C ) (4.3) j j,i j,i j j j j j f∈R j=1 i =1 and W is the Hessian matrix of the negative log-likelihood function evaluated at f . Here, C is the prior covariance matrix of f , C is the (prior) covariance matrix between ∗,f elements of f and f . C is the prior covariance of f . In case of multivariate additive ∗ ∗ ∗ GP (3.10) the prior covariance matrix is given by ˜ ˜ σ J ··· 0 u (1, 1){K } ··· u (1, J ){K } n h ,j h 1,1 h ,j h 1,J 0 1 j,r j,r c J r r XX . . . . . . . . . . . . C = + . . . . . . 2 r=1 j=1 ˜ ˜ 0 ··· σ J u (J, 1){K } ··· u (J, J ){K } 0 J h ,j h J,1 h ,j h J,J j,r j,r r r ˜ ˜ u (1, 1){K } ··· u (1, J ){K } ,j 1,1 ,j 1,J J j j . . . . . + (4.4) . . j=1 ˜ ˜ u (J, 1){K } ··· u (J, J ){K } ,j J,1 ,j J,J j j 0 0 0 where J is an n×n matrix full of ones u (j, j ) and u (j, j ) are the entry (j, j ) of U n ,l h ,l ,l ˜ ˜ 0 0 and U (see (3.8)), and{K } and{K } are correlation matrices with elements h ,l j,j h j,j r l l,r ˜ ˜ ˜ ˜ [{K } 0 ] = k (s , s ;` ) and [{K } 0 ] = k (x , x |θ ). The j,j i ,i 0 ,l i i 0 l h j,j i ,i 0 h i ,r i 0,r l,r l j j l,r j l,r j j j j j other correlation matrices are constructed similarly. If data on all species are available at all sampling locations, the covariance matrix reduces to Kronecker product similarly as in LMC models by Mardia and Goodall (1993) and Gelfand et al. (2004), so that P P P c J J ˜ ˜ C = Σ ⊗ J + U ⊗ K + U ⊗ K where n = n for all 0 n h ,j h ,j j r=1 j=1 j,r j=1 j j,r j = 1, . . . , J . Since, in general, the second and third term of C are full matrices, it can be seen from (4.2) and (4.3) that the posterior of f and the posterior predictive distribution of f are aﬀected by observations of all species at any spatial locations. The marginal posterior of additive terms Typically, we are also interested in species’ responses along covariates encoded by the additive predictor functions. Their marginal posteriors, conditional on hyperparame- ters, are analogous to (4.2). The matrices C and C are only constructed using the ∗ ∗,f In case of Gaussian observation model, this equals to the true posterior density of f | y,η, Λ. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 11 correlation functions corresponding to the latent function of interest. For example, to study the response of species j along covariate x we evaluate the posterior predictive distribution for h (x ) using (4.2) so that we replace C j,r r ∗,f h i ˜ ˜ C = u (j, 1){K } ,··· , u (j, J ){K } , (4.5) h ,f h ,l h j,1 h ,l h j,J j,r r l,r l,r l=1 and similarly for C . However, in case of the traditional LMC (3.9) only the spatial random eﬀects are correlated between species whereas the predictor functions are not. 0 2 0 Hence, Σ is diagonal for all r so that u (j, j ) = σ if l = j = j and zero otherwise. h h ,l r r l The second terms of (4.4) is then block diagonal and only the j’th block column in (4.5) is non-zero for which reason there is no information exchange between species speciﬁc predictor functions. In case of univariate GP prior all the terms in (4.4) are block diagonal and (4.2) reduces to J independent Gaussian distributions with no information exchange among species at all. Hence, the predictive performance of the univariate GP and the two multivariate GP models are very diﬀerent. This is illustrated in section 5. The (marginal) posterior expectation and variance for new observations When predicting species occurrence probability or abundance, we need to marginalize over the posterior of the latent variables. The posterior expectation and variance for a new outcome for species j at a location (x , s ) conditional on hyperparameters are ∗ ∗ E[Y (x , s )| y,η, Λ] =E[E(Y (x , s )|f (x , s ), y,η, Λ)] (4.6) j ∗ ∗ j ∗ ∗ j ∗ ∗ V[Y (x , s )| y,η, Λ] =E[V(Y (x , s )|f (x , s ), y,η, Λ)] + j ∗ ∗ j ∗ ∗ j ∗ ∗ V[E(Y (x , s )|f (x , s ), y,η, Λ)] j ∗ ∗ j ∗ ∗ When the probabilistic model for species j is assumed to be the Negative-Binomial or Binomial model with logistic link function (3.3) we can ﬁnd either an exact result or an analytical approximation for these expectations and variances. These are given in the Supplementary material. These solutions speed up the predictive calculations considerably compared to numerically integrating f over R. Conditional scenario predictions In some applications, one might be interested in scenario predictions conditional on changes in species composition. For example, one might be interested in how removing from or introducing speciﬁc species into an area would aﬀect other species. In our case study setting, we could be interested in, for example, eﬀects of management actions that would clean ﬁlamentous algae from the shoreline. Such scenario predictions would be naturally tackled with predictive causal inference (Lindley, 2002; Pearl, 2009). In predictive causal inference the parameters of the model are inferred with the available data so far and predictions made considering alternative scenarios. To illustrate this lets ﬁrst introduce a short notation y = y (x , s ) and f ∗ ∗ J ,∗ J J ,∗ 1 1 1 = f (x , s ) where J denotes the set of species to be predicted and J the scenario ∗ ∗ 1 2 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 12 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 species assumed to be “managed”. For brevity, we omit the conditioning on hyperpa- rameters and data. The conditional distribution of Y (x , s )| y is J ∗ ∗ 1 J π(y | y ) = π(y | y , f )π(y , f )df /π(y ) (4.7) J ,∗ J ,∗ J ,∗ J ,∗ J J ,∗ J 1 J 1 1 J 1 2 1 2 2 2 where Y (x , s )| y , f only depends on f . The Laplace approximation for this J ∗ ∗ J ,∗ J ,∗ 1 J 1 1 conditional predictive distribution is shown in Supplementary material. 4.2 Parameter inference We used Laplace approximation (Vanhatalo et al., 2010) to approximate the conditional posterior of latent variables π(f | y,η, Λ) and the marginal likelihood of the hyperparam- eters π(y|η, Λ) = π(y| f,η)π(f |Λ)d f . We searched for the (approximate) maximum a posterior (MAP) estimate for hyperparameters given by (η, Λ) = arg max log q(y|η, Λ) + log π(η, Λ). (4.8) η,Λ where log q(y|η, Λ) is the Laplace approximation for the log marginal likelihood for pa- rameters. This approach produces also good approximation for the posterior predictive distribution for latent variables (Vanhatalo et al., 2010; Vehtari et al., 2016), which are the main interest in this work. Hence, we ﬁxed hyperparameters to their MAP estimate. In order to avoid constrained optimization, all the parameters were transferred to unconstrained space. We used log transformation for covariance function parameters, and for the interspeciﬁc correlation matrices we used the transformation presented by Kurowicka and Cooke (2003) and Lewandowski et al. (2009), which is a bijective map- ( ) ping between the space of correlation matrices and R . This is summarized in Sup- plementary material. We used scaled conjugate gradient optimization for locating the MAP and checked carefully that a (local) mode had really been found by verifying that gradients along all hyperparameters were zero. The required gradients of log q(y|η, Λ) were solved analytically as described by Rasmussen and Williams (2006) and Vanhat- alo et al. (2010). The Supplementary material summarizes the derivatives w.r.t. to the covariance parameters in Σ , Σ , r = 1, . . . , c. 5 Experiments Here, we ﬁrst illustrate the properties of the hierarchical multivariate GP models with two simple examples. These examples highlight particular properties of the proposed model. After this we introduce the model and analysis for our case study (Section 2). We implemented all the models using the GPstuﬀ package (Vanhatalo et al., 2013). 5.1 Demonstration with simulated spatial data Figure 2 presents simulated data and posterior predictions for spatial modelling of two species. The model follows spatial LMC; that is model (3.9), where the covariate terms imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 13 Figure 2: Illustration of spatial multivariate GP prior for JSDM with two species and spatial only component. Crosses and dots are simulated data locations of species 1 (Binomial data) and species 2 (Negative-Binomial data). See Section 5.1 for discussion. are dropped out. The spatial correlation function used in this demo is the Gaussian 0 2 2 0 −|| s− s || /2l 0 k(s, s ) = e , where || s− s || is the Euclidean distance and l is the length- scale. The ﬁrst row of subplots shows the posterior predictive probability of observing species 1 (E[Y ], Y (s) ∼ Binomial) and the second row shows the expected number of 1 1 species 2 (E[Y ], Y (s) ∼ Negative-Binomial). Plots (a) and (e) show the predictions 2 2 when the species observations are from the same region but not from exactly the same locations. In this case, the prediction of multivariate GP is similar to predictions by univariate GPs. Plots (b) and (f ) show the predictions when data are available on both species from the lower left corner and additionally data on species 1 is available from upper right corner. There is positive correlation between species, which has been inferred from the data in the lower left region, so the prediction for species 2 in upper right corner is informed by data on species 1 in that region. The last two columns illustrate the conditional scenario predictions (4.7). In the third column, the training data is the same as in the ﬁrst column and the expected values in plots (c) and (g) show joint scenario prediction in new region of same size and shape. In this scenario, species in the new region were observed so that species 1 is known to be in the top right, and species 2 in the bottom left. The fourth column shows the conditional scenario prediction for both of the species separately so that the grey marks show the locations where the other species would be introduced in these scenarios. 5.2 Demonstration with time series of species abundances Next, we consider the classical predator-prey system containing annual population counts of hare and lynx in the northern Canada from 1845 to 1935 (Odum, 1953). The data is not spatially explicit since the observations are total counts over the region so we model it as time series. This illustrates the behavior of individual additive co- variate eﬀect in the multivariate GP model (3.10) with time being the covariate. Let’s denote by Y and Y the population sizes of hare and lynx at time t, and assume that 1,t 2,t imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 14 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Figure 3: The results from the analysis of hare and lynx interaction (predator-prey system) analyzed with single and joint SDM. Plots a) and c) show the posterior mean and 95% credible interval of the latent functions and plots b) and d) show the training data, test data and posterior expectation of observations. See Section 5.2 for explanation. Y |f (t) ∼ Poisson(exp(f (t)) and Y |f (t) ∼ Poisson(exp(f (t)). We compared two 1,t 1 1 2,t 2 2 alternative priors for the latent functions, one with independent GP priors and another with joint bivariate GP prior with the LMC structure. In order to compare models’ pre- dictive performances when some species have not been observed, we removed parts of the original data from the training set, the period of 1870-1900 for hare and 1850-1870 for lynx. Figure 3 displays the result of the data analysis. The model with bivariate GP prior clearly outperforms the independent model since it predicts better the test data points in periods where training data was removed. Moreover, its predictions have also smaller uncertainty than the predictions of independent GPs in those periods of time. 5.3 The case study on coastal species distribution Case study models The plant data are modeled with the Binomial observation model (3.3) and the ﬁsh data are modeled with the Negative-Binomial observation model (3.4). In order to test the eﬀect of diﬀerent model components and the eﬀect of semi-parametric response functions versus standard quadratic response functions we compare the following models: 1) additive univariate GP predictor functions and univ. spatial random eﬀects (3.5), 2) additive univariate GP predictor functions and LMC spatial random eﬀect (3.9), 3) additive multivariate GP predictor function and LMC spatial random eﬀect (3.10), 4) additive univariate GP predictor functions only (model (3.5) without (s)), 5) univariate spatial random eﬀects only (model (3.5) without h (x )), j j imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 15 6) additive univariate quadratic predictors and univariate spatial random eﬀects (model (3.5) with h (x ) = x β and x includes the covariates and their squares), j j j j j 7) additive univariate quadratic predictors and LMC spatial random eﬀect (model (3.9) with h (x ) = x β and x includes the covariates and their squares), j j j j j 8) additive multivariate quadratic predictors and LMC spatial random eﬀects (model (3.10) with h (x ) = x β and x includes the covariates and their squares). j j j j j 9) additive multivariate quadratic predictors only (model (3.10) without (s) with h (x ) = x β and x includes the covariates and their squares). j j j j j In each model, the spatial random eﬀects were given the M´ atern covariance function 0 2 2 0 − 3r (s,s ) with ν = 3/2 degrees of freedom k (s, s ; σ ,` ) = σ 1 + 3 r (s, s ) e j j j j j where σ is the variance parameter, ` = [` ` ] is the vector of length-scales j,1 j,2 j j 0 0 T 2 −1 0 1/2 and r (s, s ) = [(s− s ) (diag(` ) ) (s− s )] . The continuous covariate eﬀects in j j the additive GP models (models 1-4) were given the Gaussian correlation function 0 2 2 0 −||x −x || /2l ˜ r r j,r k (x , x ) = e . The additive linear models (models 6-8) were coded h ,r r j r 0 0 2 as additive GPs with k (x , x ) = x x σ . Even though this is not a proper correla- h r r j,r r r j,r tion function, when used in LMC it implies a generalized linear model with interspeciﬁc dependencies between weights, β , that are the key components in state-of-the-art para- metric JSDMs. See Discussion for details. For the categorical covariate (Bottom class, 0 2 0 0 table 1) we used a delta function so that k (x , x ;θ) = σ δ (x ), where δ (x ) = 1 h r x x j,r r δ r r r r if x = x and zero otherwise. This corresponds to having an own intercept for each category. We used the marginally uniform priors (section 4.1) for the between species correlations and weakly informative priors for the rest of the parameters. The variance parameters were given Half -Student-t (μ = 0, σ = 4, ν = 4) priors and the length-scale parameters were given Half -Inverse-Student-t (μ = 0, σ = 1, ν = 4). Hence, a priori more weight is given for smooth functions with small variability. Model validation and comparison We aim to provide models that give reliable posterior predictions. Hence, it is natural to compare models with the goodness of their posterior predictive distributions. This can be done eﬃciently with cross-validation (CV) using the log predictive density diagnostics (Vehtari and Ojanen, 2012). Let D denote the full data-set. Fix K disjoint sub-sets of D, say D , . . . ,D , such that their union is D. The K-fold CV log point-wise marginal 1 K predictive density is then L (D) = log π(y |D ) where D = {∪ D : K i ∼k(i) ∼k(i) r i=1 r=1 r 6= k(i)} and k(i) is such that y ∈ D . We compare the models with the leave-one- k(i) out CV (LOO-CV, K = n) and structured 5-fold CV. We used the MAP estimate for the hyperparameters. In 5-fold CV we found the MAP for each training set separately. The LOO-CV was conducted at the MAP found with full data and we used Laplace approximation to approximate the LOO-CV log predictive densities (Vehtari et al., 2016). Laplace approximation for LOO-CV is shown to work well in GP models and, since our data is rather large, leaving only one data point out of the training set has only negligible eﬀect on the posterior of the hyperparameters (Vehtari et al., 2016). The rationale for calculating both LOO-CV and 5-fold CV comparison is the follow- ing. Since multiple species were sampled at every sampling site and each of the sampling imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 16 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 GP models LOO Cross Validation 5-fold Cross validation 1) Univariate GP h(x), univ. (s) -2.230 (0.082) -2.465 (0.094) 2) Univariate GP h(x), multiv. (s) -2.081 (0.080) -2.480 (0.100) 3) Multiv. GP h(x), multiv. (s) -1.669 (0.080) -2.316 (0.086) 4) Multiv. GP h(x) only -2.228(0.089) -2.590(0.012) 5) Multiv. (s) only -2.351(0.087) -2.500(0.086) 6) Univ. quadr. h(x), univ. (s) -2.262 (0.083) -2.517 (0.095) 7) Univ. quadr. h(x), multiv. (s) -2.117 (0.084) -2.484 (0.109) 8) Multiv. quadr. h(x), multiv. (s) -2.105 (0.084) -2.416 (0.099) 9) Multiv. quadr. h(x) only -10.246(1.297) -9.305(1.008) Table 2: Model comparison with leave-one-out (LOO) and 5-fold cross validation using the mean point wise log marginal predictive density statistics (and its standard error) over all species. see Section 5.3 for model descriptions. sites has other sites spatially nearby it, the LOO-CV log predictive densities are aﬀected signiﬁcantly by the spatial random eﬀects. The LOO-CV, thus, measures models’ inter- polation performance which can be good even if the models were not able to represent well responses along covariates (predictor functions) (Vanhatalo et al., 2012; Veneranta et al., 2013). For this reason, we structured the 5-fold CV by dividing the data into ﬁve spatially distinct groups corresponding to regions I-V in Figure 1. The sampling sites in diﬀerent groups are spatially so far from each others that the spatial random eﬀects do not aﬀect the posterior predictive distributions for test groups. Hence, the 5-fold CV tests mostly the extrapolation performance of a model, which is governed by the goodness of the predictor functions, whereas the LOO-CV tests the interpolation performance, which is governed also by the spatial random eﬀects. 6 Results 6.1 Predictive performance of models Table 2 summarizes the CV comparisons over all species and tables 1-2 in Supplementary material show the models’ predictive performance for each species separately. Tables 3- 4 in Supplementary material show the pairwise comparison of the CV point wise log predictive densities between the best GP/parametric model (models 3 and 8) and the other GP/parametric models with both the covariate eﬀects and spatial random eﬀect (models 1-2 and 6-7). The results show that model 3, which includes multivariate GP predictors and multivariate spatial random eﬀects (3.10), is the best in both LOO-CV −2 and 5-fold CV with a diﬀerence of 10 or more to the other models. Since the mean log predictive density is an average of n = 850 point-wise predictions the diﬀerence −2 of 10 corresponds to a diﬀerence of 8.5 in log (point-wise) joint predictive densities. Analogously to Bayes factors, which compare log prior predictive distribution, this can be considered a signiﬁcant diﬀerence between two models (Kass and Raftery, 1995). Hence, the multivariate GP (model 3) has signiﬁcantly better overall predictive performance over all species than the other models. According to posterior predictive imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 17 Figure 4: The maximum a posterior estimate of the correlation matrices in the multi- variate additive GP model with Gaussian covariance function (model 3). checks (Gelman et al., 2013) there are no serious discrepancies between its predictions and observed data. In extrapolation (5-fold CV) model 3 performs best also for all species separately. However, in interpolation (LOO-CV) it is not the best for all species separately (tables 1-2 in Supplementary material). Moreover, the semi-parametric GP models (models 1-4) work better than the corresponding parametric models (quadratic environmental response, models 6-9) in both interpolation and extrapolation in general. Dropping either spatial random eﬀect or covariate eﬀect out from the model decreases its performance clearly. All models work better in interpolation than in extrapolation and compared to univariate models including interspeciﬁc correlations either in spatial random eﬀects or also in environmental predictors improves models’ performance in both of these tasks. Moreover, including interspeciﬁc correlations between environmen- tal responses, h (x ), improves the extrapolation performance relatively more than j,r r including interspeciﬁc correlations only for spatial random eﬀects. Hence, multivariate spatial random eﬀect improved more interpolation whereas multivariate predictors has larger eﬀect for extrapolation as would intuitively be expected. 6.2 Eﬀects of environmental covariates The interspeciﬁc correlations (Figure 4) show that the responses to environment are similar among Coregonids and among sticklebacks whereas there are clear diﬀerences among these groups. These ﬁsh groups respond diﬀerently to sandy bottom and distance to deep. Moreover, there is strong positive spatial correlation among Coregonids and among sticklebacks but not between these groups. All species have negative spatial correlation with ﬁlamentous algae whereas there is either weak positive or negligible spatial correlation between ﬁsh species and macrovegetation and diatomous algae. Figure 5 shows the marginal eﬀects of the predictor functions for both univariate (SSDM) and multivariate (JSDM) GP models calculated as described in section 4.1. In general, the responses in JSDM and SSDM are similar. In most of the cases the uncertainty in response function is smaller (narrower credible regions) in JSDM than in SSDM although the diﬀerences are not large. Most of the responses are smooth but imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 18 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Figure 5: Marginal latent responses along covariates, h . Grey corresponds to single j,r species GP model (model 1) and black to multivariate additive GP model (model 3). show patterns that would be hard to capture with a quadratic function. For example, three and nine spine stickleback, macrovegetation and ﬁlamentous algae show logistic style response to either distance to deep or openness. Moreover, the response of vendace on ice break up week is ﬁrst constant but has very steep increase after week 16. The responses to ice break up or chlorophyll-a show non-linear and non-quadratic responses also for white ﬁsh, macrovegetation and ﬁlamentous algae. The MAP estimates of other hyper-parameters are summarized in Supplementary material. 6.3 Spatial predictions Figure 6 shows the distribution maps as posterior median of intensity and expected count of individuals in replicate sampling produced by SSDM (model 1) and JSDM (model 3) for vendace. The distribution maps for other species together with maps on posterior predictive variance are shown in Supplementary material. In broad scale the posterior median of the intensity looks similar with SSDM and JSDM whereas SSDM predicts larger species counts than JSDM for all species throughout the study region and the uncertainty related to SSDM predictions is larger than that of JSDM predictions. Both SSDM and JSDM predict that vendace is distributed mostly in the northern Gulf of Bothnia. However, SSDM predicts somewhat higher median intensity than JSDM also in the southern Gulf of Bothnia. In relation to median intensity similar pattern that imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 19 Figure 6: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for vendace predicted by SSDM (model 1) and JSDM (model 3). JSDM predicts more restricted distribution range than the single species model is seen also in the predictions concerning sticklebacks and whiteﬁsh. In case of diatomous and ﬁlamentous algae SSDM and JSDM predict the distribution pattern similarly whereas for macrovegetation JSDM predicts slightly larger distribution ranges. The posterior distributions of spatial lenght-scale parameters were concentrated near one kilometer or less (see table 5 in Supplementary material). Hence, the diﬀerences in distribution predictions cannot be explained by the spatial random eﬀects over large areas but the spatial random eﬀect explains local deviations from the predictions based on covariates. 7 Discussion and concluding remarks 7.1 Case study results The GP based SDMs had better predictive performance than the parametric SDMs and JSDMs had better predictive performance than the corresponding SSDMs. The diﬀerences between SSDM and JSDM are most apparent in the distribution maps and predictor functions. The JSDM predicted in general smaller distribution ranges than SSDMs (macrovegetation being the only exception) and the posterior uncertainty in their predictions were smaller than in SSDMs. When interpreting the results, it should be remembered that the sampling was tar- imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 20 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 geted to larvae of sea-spawning Coregonids (whiteﬁsh and vendace) and planned to cover their plausible distribution range in terms of spatial (coastal areas of Gulf of Bothnia) and temporal extent (spring). Other species were sampled alongside Coregonids and the sampling area covers only limited portion of their full distribution range in the spring. Moreover, the abundance and distribution of all the studied species vary annually. Fish change their distribution areas seasonally and their larval stages last only few weeks. Vegetation and algal cover vary due changes in temperature and ice eﬀects. For these reasons the results are most representative for larvae of whiteﬁsh and vendace. For other species the results describe their spring distribution in the shallow coastal regions only. Our results correspond rather well to the earlier knowledge on the studied species. The responses to the environmental covariates and the interspeciﬁc correlations (Fig- ures 5 and 4) indicate that whiteﬁsh and vendace larvae are, in general, distributed in diﬀerent areas than sticklebacks. Most of the literature on Baltic Sea sticklebacks focus on three-spined sticklebacks whereas the nine-spined stickleback is not well stud- ied. In early spring, stickleback abundances have been found to be highest in sheltered archipelago areas, where part of the population overwinter. High abundance of stick- lebacks are typically thought to indicate structural complexity on the bottom; such as the presence of stones and boulders as well as reeds and other macrovegetation that function as shelter and provide food (Peltonen et al., 2004; Sieben et al., 2011). On contrary, highest densities of whiteﬁsh and vendace larvae are observed in open sandy shores near deep areas and in structurally simple shores, without macrovegetation, boul- ders and stones (Hudd et al., 1988; Leskel¨ a et al., 1991; Veneranta et al., 2013). The distribution of vendace is highly inﬂuenced by ice break up week and salinity so that long ice cover period and low salinity increase their abundance. Ice winter is longer and salinity is lower in the northern than in the southern Gulf of Bothnia and thus it correlates positively to Coregonid presence (Vanhatalo et al., 2012). In general, optimal habitats for Coregonid larvae are mainly located in the northernmost Gulf of Bothnia. Sticklebacks are known to prey mainly on mesozooplankton, but also on grazers (Pel- tonen et al., 2004; Sieben et al., 2011). Sticklebacks feed also on ﬁsh larvae if available (Bystr¨ om et al., 2015), but there are no studies on potential predation risk to Corego- nids. Based on our results, in the scale of coastal region of Gulf of Bothnia, sticklebacks are not thread to Coregonid larvae since their high density areas do not overlap. More- over, there was no spatial correlation between sticklebacks and coregonids (Figure 4) which could indicate interspeciﬁc interaction of any kind. The presence of sticklebacks has been connected to higher eutrophication status and stickleback reproduction and growth beneﬁt from increasing temperature and eutrophication (Lef´ ebure et al., 2011; Candolin et al., 2008; Meier et al., 2012) On contrary, these environmental characteris- tics are assumed to aﬀect negatively the Coregonid reproduction (M.Cingi et al., 2010; Muller ¨ , 1992; Veneranta et al., 2013; Vanhatalo et al., 2012). In our results whiteﬁsh and vendace larvae have positive response to Chlorophyl-a, which is a strong indicator for increasing eutrophication. However, in the scale of Gulf of Bothnia Chlorophyl-a concentration is typically higher near river mouths where salinity is lower and nutrient inﬂow high. Hence the result more likely reﬂects the high river inﬂuence through low salinity than preference for eutrophicated water. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 21 The vegetation and algae distribution in our results reﬂect the nutrient status during winter and early spring as well as eﬀect of ice cover and ice scraping in wind exposed shallow areas. Both the SSDMs and JSDMs indicate that ﬁlamentous algae occur in all coastal areas in high densities, except in some sheltered inner coastal areas. Macroveg- etation and diatomous algae had highest presence probabilities at areas with lower ﬁlament presence probability. This pattern agrees well with the general understand- ing of these species groups. Filamentous algae are typically distributed in wind and wave exposed shores, that epiphytic diatoms and reeds that require shelter cannot tol- erate. Filamentous algae can, however, grow over macrovegetation and it is possible that macrovegetation distribution is underestimated if ﬁlamentous algae growth over macrovegetation has hided macrovegetation from the sampling pictures. The higher nu- trient levels in sheltered areas have been found to aﬀect positively reed belt growth, and high abundances of macrovegetation species have been found from archipelago areas, lagoons, bays and river inlets (Pitk¨ anen et al., 2013; Altartouri et al., 2014). The JSDM model reﬂects these smaller scale occurrences better than SSDM. In our results, diato- mous algae are more common in estuaries and northern parts of the study area (see Supplementary material). This is likely explained by longer ice winter towards northern Gulf of Bothnia and longer distance to deep water. 7.2 Multivariate additive Gaussian process Next we discuss some of the similarities and diﬀerences between our model and the existing JSDMs and highlight the novel methodological contributions in this paper. Interspeciﬁc correlations From the predictive point of view, the interspeciﬁc correlations in predictive functions are attractive for many reasons. In many applications of SDMs the aim is to pre- dict species distribution over regions that include locations spatially far from the data (Record et al., 2013) or to conduct scenario predictions related to, for example, climate change or land use (Guisan et al., 2013). In these applications, predictions are based on the responses to environmental covariates. The inclusion of interspeciﬁc dependence allows information ﬂow between species which improves the estimates for predictive functions especially for species with only scarce data (Thorson et al., 2015; Ovaskainen et al., 2017; Hui et al., 2013; Clark et al., 2014). We used the marginally uniform priors of (Barnard et al., 2000; Tokuda et al., 2012) for the interspeciﬁc correlations. This is justiﬁed by prior ignorance on correlations. Prior information on interspeciﬁc correlations could, however, be added into model with, e.g., informative inverse Wishart prior. With many species inferring the full covariance matrix is hard in which case we could use spatially dependent latent factors which induce for the 0 0 0 spatial random eﬀects a covariance structure Cov (s), (s ) = λ λ k (s, s ), j qj qj q q=1 where k is the q’th spatial covariance function and λ the corresponding species speciﬁc q qj factor loading. Here, M < J so that this covariance structure is a low rank represen- tation of LMC (Lopez, 2000; Lopez et al., 2008). Latent factor representation was ﬁrst introduced to SDMs by Thorson et al. (2015) and it is used also in HMSC Ovaskainen imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 22 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 et al. (2017). Recently Taylor-Rodr´ ıguez et al. (2017) proposed a clustering scheme where the species are clustered to less than J factor loading vectors. Interspeciﬁc correlation between response functions has received less interest. Typi- cally the response functions are assumed to be independent among species. Exception are species archetype models (Dunstan et al., 2013) and the HMSC framework of Ovaskainen et al. (2017). In the latter, the response functions are deﬁned as h (x ) = x θ where j j j j· T T T θ = [θ , . . . , θ ] is a J × p matrix of regression weights with hierarchical prior. The 1· J· species speciﬁc weights are given Gaussian prior θ ∼ N (μ , V ) where μ is the ex- j· j· j j· pected response of a species that can be common for all species, common within groups of species or modelled through species traits μ = τ γ where τ is a vector of trait jr r j values of species j and γ ∼ N (0, V ) are the eﬀects of traits to response along covariate r γ 2 2 r. With V = σ I and common prior mean μ = μ ∼ N (0, σ ) for all j ∈ 1, . . . , J , J×J j· the induced covariance between species speciﬁc additive response functions is 0 0 0 Cov (h (x ), h 0 (x )) =E[Cov (x θ , x θ 0 )] + Cov (E[x θ ],E[x θ 0 ]) j,r r j ,r r jr j r r jr j r r r r 2 0 2 0 =(σ + δ (j )σ )x x j r μ r 0 T 0 2 0 0 0 and with trait dependent prior mean Cov (h (x ), h (x )) = (τ V τ +δ (j )σ )x x . j,r r j ,r γ j j r r j r Hence, these alternatives have the same covariance structure as an additive multivari- ate GP (3.10) with k = x x for all j = 1, . . . , J , and interspeciﬁc covariances hj,r r r 2 0 2 T 0 2 Σ = σ + δ (j )σ and Σ = τ V τ 0 + δ (j )σ . Hence, the hierarchical prior struc- h j h γ j j r μ r j ture of HMSC could easily be extended to semiparametric models as well if we had trait information from our species. The beneﬁt would be that these hierarchical model structures contain ecologically relevant prior information and, hence, using the induced interspeciﬁc covariance structure in the additive multivariate GP model would make it ecologically more realistic and potentially improve its predictive performance. The HMSC framework has many other features as well, such as phylogenetic relationships Ovaskainen et al. (2017), which could in principle be incorporated with our approach as well. The structure of interspeciﬁc correlations is likely to be especially important in scenario based predictive analyses, such as climate change predictions. Since our model does not make any mechanistic assumptions on species interde- pendencies the results concerning interspeciﬁc correlations have to be interpreted with care. For example, a positive correlation between spatial random eﬀects could arize due mutualism, predator prey association or similar response to unobserved environmental covariates. Hence, interpreting these correlations should be done in light of more gen- eral ecological knowledge on the species. This is, however, a common challenge with all current JSDMs (Thorson et al., 2015; Dunstan et al., 2013; Ovaskainen et al., 2017; Taylor-Rodr´ ıguez et al., 2017). Moreover, the property of our model, as well as most other JSDM, is that if the training data shows positive or negative spatial correlation between two species this positive correlation is assumed to hold everywhere in the study domain. In many cases this might be an unrealistic assumption for which reason one in- teresting future development would be to extend our models to allow species-to-species associations to depend on the environmental context(Fox and Dunson, 2015; Tikhonov et al., 2017). However, the colocalization patterns are explained by both environmental imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 23 covariates and spatial random eﬀects and regional diﬀerences in colocalization patterns can be explained by diﬀerent environmental covariates in diﬀerent regions. The interspeciﬁc correlation in our approach are modelled on the latent variable level and, hence, are not directly comparable to interspeciﬁc correlations in data. Some authors have argued that this is a conceptual weakness of latent variable models since it makes interpretating their results hard and thus less transparent. An alternative would be to model the correlations in the observation error model (see, e.g., Ovaskainen et al., 2010; Pollock et al., 2014). Clark et al. (2016) proposed generalized joint attribute model (GJAM) to ﬁx this interpretation challenge by aiming to model the correlations between species on the data scale. However, for example, in case of presence absence observations GJAM corresponds to multivariate probit model that is the marginal likelihood of our model in case of Bernoulli observation model. It is also unclear how to disentangle the process underlying the species occurrence and abundance from the observation process leading to the actual data in GJAM approach. Hence, despite the challenge of inter- pretating the correlations, we prefer the hierarchical latent variable modeling since it provides a coherent approach to separate these two processes. Response functions The response functions along environmental covariates provide basis to understand how environment aﬀects species distribution and to predict the species distribution in new areas. Linear and other parametric models are eﬃcient in ﬁnding overall trends in re- sponses but have trouble in locating abrupt changes and change points due, for example, physical tolerance limits. Such limits, for example, along temperature and salinity are typical for large variety of taxa in aquatic domains (MacKenzie et al., 2007; Kotta et al., 2019). With SSDMs Vanhatalo et al. (2012), Shelton et al. (2014), Golding and Purse (2016) and Kallasvuo et al. (2017) have demonstrated the beneﬁts of semiparametric models in such situations. The GP approach for modeling environmental responses is similar to generalized additive models (Guisan et al., 2002) but the latter have not been implemented as JSDMs. The ﬂexibility of semiparametric models provides also challenges compared to the more restricted parametric models. The prior for covariance function parameters has to be set with care (Golding and Purse, 2016) and we may also need to pose monotonicity constraints to the response functions Kotta et al. (2019) in order to prevent overﬁtting. Inferring the responses reliably requires typically more data compared to parametric models. The multivariate additive GP helps in these challenges as well since the interspeciﬁc correlations increase the eﬀectively amount of data to be used to infer each response function and hence decreases the uncertainty in them. Additive multivariate GP framework opens also new questions related to the choice of covariance functions and interspeciﬁc correlations. A typical choice for general pur- pose covariance function is a radial basis function. However, predictions using these covariance functions revert to prior predictive distributions when predicting beyond covariate range covered by data. Hence, in extrapolation tasks stationary covariance functions may not be the optimal choice (Vanhatalo et al., 2012; Kallasvuo et al., 2017) and combining the multivariate GP models with functional constraints (e.g. Kotta et al., 2019) could improve their predictive performance further. imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 24 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Spatial random eﬀects Our model comparison results show clearly that including spatial random eﬀects into model improves their predictive performance in both inter- and extrapolation. This re- sult is well in line with earlier works demonstrating that environmental conditions alone may not suﬃciently explain species distribution and spatial random eﬀects improve their predictive performance (Latimer et al., 2006; Vanhatalo et al., 2012; Clark et al., 2014; Thorson et al., 2015; Kallasvuo et al., 2017). This is reasonable, since the distribution of species is shaped by the interplay of environmental covariates, stochastic processes and species interaction (Ovaskainen et al., 2017). Hence, a justiﬁed model should account for random processes and as demonstrated also by our results also to species interac- tions in these processes. However, adding spatial random eﬀects into model may lead to problems in model identiﬁbiality which we discuss in the next section. Posterior inference and predictive performance The JSDM built with multivariate additive GP had clearly the best overall predictive performance. Moreover, the uncertainty in the predictions by JSDMs were smaller than in the SSDMs which, together with the best log predictive density performance, indicates that the predictions were also more accurate. These ﬁndings have important implications to practical use of SDMs for management and other purposes. For example Kallasvuo et al. (2017) used SSDMs to classify coastal areas of Finland to unsuitable, suitable and highly productive spawning regions for four commercially important ﬁsh species. They based their estimates on the expected numbers of larvae in the coastal region which, as shown by Figure 6 is highly sensitive to the uncertainty of the predictions. Moreover, by using SSDMs we can overestimate the total biomass of all species (Clark et al., 2014). An inherent challenge with the type of models considered here is the model identiﬁb- iality. Spatial random eﬀects are known to aﬀect the ﬁxed eﬀects estimates (Hodges and Reich, 2010) and in some cases the spatial random eﬀect can actually capture variability otherwise explainable with ﬁxed eﬀects (Paciorek, 2010; Bose et al., 2018). Moreover, with uniform (uninformative) priors the length-scale and variance hyperparameters of Mat´ ern class of covariance functions are non-identiﬁable (Zhang, 2004) in general. In order to alleviate these identiﬁbiality challenges we used weakly informative priors for the variance and length-scale parameters deﬁned through the principles proposed by Gelman (2006), Simpson et al. (2017) and Fuglstad et al. (2018). Our priors for the covariance function parameters favor small variance and large length-scale parameter values. The former penalizes for large magnitudes and the latter penalizes for wiggly responses along covariates or space. In case of covariate responses this is ecologically justiﬁed since according to ecological niche theory species’ responses to environmental covariates are typically either monotonic or have only one mode (Ovaskainen et al., 2016). Moreover, Fuglstad et al. (2018) show that these type of priors work well for spa- tial random eﬀects with Mat´ ern covariance functions in general. Favoring longer spatial length-scales is justiﬁed also from the model identiﬁbiality point of view. Identiﬁability between ﬁxed and spatial random eﬀects is of less concern if spatial random eﬀect varies with larger spatial range than the environmental covariates (Paciorek, 2010, Section 3). imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 25 Since the environmental covariates in Finnish coastal region vary with relatively small spatial range, our weakly informative prior eﬀectively favors models that explain this variation with covariate responses instead of with spatial random eﬀects. Our model comparison suggests that the models with both covariate eﬀects and spatial random eﬀects lead to most reliable inference on environmental responses and spatial random eﬀects. A model’s extrapolation performance is governed mainly by the environmental covariates so better extrapolation performance indicates more reliable response func- tions. Similarly more accurate interpolation performance indicates more reliable joint eﬀect of environmental covariates and spatial random eﬀect. An evident challenge with multivariate additive GP is the computation. The core element in the multivariate additive GP is the covariance matrix induced by the GP components. With many species and sampling sites the covariance matrix can become so large that it makes the implementation infeasible. Here we used Laplace method to speed up the computation by decreasing the number of costly posterior density calculations compared to full MCMC. However, in order to scale up the methods for large data sets involving thousands of sampling sites, the model would need to be implemented even more eﬃciently. One option to reduce the computational time could be to exploit the property that linear LMC can be parameterized through species speciﬁc conditional distributions (Gelfand et al., 2004). We could also implement multivariate GPs with sparse GP approximations (Vanhatalo et al., 2010; Alvarez et al., 2010) and replace the full rank LMC model with latent factor model (Thorson et al., 2015; Ovaskainen et al., 2017). However, we leave these considerations for future. 7.3 Summary Our JSDM based on multivariate additive GP combines the key ideas from semi- parametric SSDMs and state-of-the-art parametric JSDMs. In our case study, it showed superior predictive performance compared to existing GP based SSDMs and paramet- ric SSDMs and JSDMs in both interpolation and extrapolation. Hence, the multivariate additive GP can be seen as the ﬁrst step towards integration of the semi-parametric SSDMs and hierarchical JSDMs. We propose also an eﬃcient approach for inference by utilizing Laplace approximation and gradient based optimization for hyperparame- ters. The multivariate additive GP model is not restricted only to species distribution modeling but can be used in wide variety of other applications as well. Supplementary Material The supplementary material contains additional mathematical formulation of the method- ology proposed in this paper and additional ﬁgures and tables for the case study analysis. (http://www.some-url-address.org/dowload/0000.zip). Acknowledgments This work has bee funded by the Academy of Finland (grant 317255) and Research Funds of the University of Helsinki (decision No. 465/51/2014). imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 26 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 References Altartouri, A., Nurminen, L., and Jolma, A. (2014). “Modeling the role of the close- range eﬀect and environmental variables in the occurrence and spread of Phragmites australis in four sites on the Finnish coast of the Gulf of Finland and the Archipelago Sea.” Ecology and Evolution , 4: 987–1005. 21 Alvarez, M. A., Luengo, D., Titsias, M. K., and Lawrence., N. D. (2010). “Eﬃcient multi- output Gaussian processes through variational inducing kernels.” JMLR Workshop and conference proceedings , 9: 25–32. 25 Banerjee, S., Carlin, B. P., and Gelfand, A. E. (2015). Hierarchical Modelling and Analysis for Spatial Data . Chapman Hall/CRC, second edition. 6 Barnard, J., McCulloch, R., and Meng, X.-L. (2000). “Modelling covariance matrices in terms of standard deviations and correlations, with applications to shrinkage.” Statistical Sinica . 9, 21 Bergstr¨ om, U., Olsson, J., Casini, M., Eriksson, B. K., Fredriksson, R., Wennhage, H., and Appelberg, M. (2015). “Stickleback increase in the Baltic Sea – A thorny issue for coastal predatory ﬁsh.” Estuarine, Coastal and Shelf Science , 163: 134–142. 3, 4 Bose, M., Hodges, J. S., and Banerjee, S. (2018). “Toward a diagnostic toolkit for linear models with Gaussian-process distributed random eﬀects.” Biometrics , 74(3): 863–873. 24 Busse, S. and Snoeijs, P. (2002). “Gradient responses of diatom communities in the Bothnian Bay, northern Baltic Sea.” Nova Hedwigia , 3-4: 501–525. 4 Bystr¨ om, P., Bergstr¨ om, U., Hj¨ alten, A., St˚ ahl, S., Jonsson, D., and Olsson, J. (2015). “Declining coastal piscivore populations in the Baltic Sea: Where and when do stick- lebacks matter?” Ambio, 44: 462–471. 20 Candolin, U., Engstr¨ omOst, J., and Salesto, T. (2008). “Humaninduced eutrophication enhances reproductive success through eﬀects on parenting ability in sticklebacks.” Oikos , 117: 459–465. 20 Chib, S. and Greenberg, E. (1998). “Analysis of multivariate probit models.” Biometrika , 85(2): 347–361. 2 Clark, J. S., Gelfand, A., Woodall, C. W., and Zhu, K. (2014). “More than the sum of the parts: forest climate response from joint species distribution models.” Ecological Applications , 24(5): 990–999. 2, 21, 24 Clark, J. S., Nemergut, D., Seyednasrollah, B., Turner, P. J., and Zhang, S. (2016). “Generalized joint attribute modeling for biodiversity analysis: median-zero, multi- variate, multifarious data.” Ecological Monographs , 87(1): 34–56. 23 Cressie, N. and Wikle, C. K. (2011). Statistics for Spatial-Temporal Data . Wiley Series in Probability and Statistics. 6 Dunstan, P. K., Foster, S. D., Hui, F. K. C., and Warton, D. I. (2013). “Finite Mixture of Regression Modeling for High-Dimensional Count and Biomass Data in Ecology.” imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 27 Journal of Agricultural, Biological, and Environmental Statistics , 18(3): 357–375. 2, Duvenaud, D., Nickisch, H., and Rasmussen, C. (2011). “Additive Gaussian Processes.” Neural Information Processing Systems . 7 Elith, J. and Leathwich, J. R. (2009). “Species Distributions Models: Ecological Ex- planation and Predictions Across Space and Time.” The Annual Review of Ecology, Evolution and Systematics , 40(677-697). 2 Eriksson, B. K., Ljunggren, L., Sandstrom, A., Johansson, G., Mattila, J., Rubach, A. E., R˚ aberg, S., and Snickars, M. (2009). “Declines in predatory ﬁsh promote bloomforming macroalgae.” Ecological Applications , 19: 1975e1988. 4 Eriksson, B. K., Rubach, A., Batsleer, J., and Hillebrand, H. (2012). “Cascading preda- tor control interacts with productivity to determine the trophic level of biomass ac- cumulation in a benthic food web.” Ecological Research , 27: 203e210. 4 Fox, E. B. and Dunson, D. B. (2015). “Bayesian Nonparametric Covariance Regression.” Journal of Machine Learning research , 16(1): 2501–2542. 22 Fuglstad, G.-A., Simpson, D., Lindgren, F., and Rue, H. (2018). “Constructing priors that penalize the complexity of Gaussian random ﬁelds.” Journal of the American Statistical Association , 1(1): 1–8. 24 Gelfand, A. E., Jr, J. A. S., Wu, S., Latimer, A., Lewis, P. O., Rebelo, A. G., and Holder, M. (2006). “Explaining Species Distribution Patterns through Hierarchical Modelling.” Bayesian Analysis , 1(1): 41–92. 2, 6 Gelfand, A. E., Schmidt, A. M., Banerjee, S., and Sirmans, C. F. (2004). “Nonstationary multivariate process modeling through spatially varying coregionalization.” Test , 13(2): 263–312. 8, 10, 25 Gelman, A. (2006). “Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper).” Bayesian Analysis , 1(3): 515–534. 24 Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., and Rubin, D. B. (2013). Bayesian Data Analysis . Chapman & Hall/CRC, third edition. 17 Golding, N. and Purse, B. V. (2016). “Fast and ﬂexible Bayesian species distribution modelling using Gaussian processes.” Methods in Ecology and Evolution , 7: 598–608. 2, 3, 7, 8, 23 Guisan, A., Edwards, T. C., and Hastie, T. (2002). “Generalized linear and generalized additive models in studies of species distributions: setting the scene.” Ecological Modelling , 157(2-3): 89–100. 23 Guisan, A., Tingley, R., Baumgartner, J. B., Naujokaitis-Lewis, I., Sutcliﬀe, P. R., Tulloch, A. I. T., Regan, T. J., Brotons, L., McDonald-Madden, E., Mantyka-Pringle, C., Martin, T. G., Rhodes, J. R., Maggini, R., Setterﬁeld, S. A., Elith, J., Schwartz, M. W., Wintle, B. A., Broennimann, O., Austin, M., Ferrier, S., Kearney, M. R., Possingham, H. P., and Buckley, Y. M. (2013). “Predicting species distributions for conservation decisions.” Ecology Letters , 16(12): 1424–1435. 21 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 28 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Hartmann, M., Hosack, G. R., Hillary, R. M., and Vanhatalo, J. (2017). “Gaussian process framework for temporal dependence and discrepancy functions in Ricker-type population growth models.” Ann. Appl. Statist., 11(3): 1375–1402. 2, 9 Hastie, T. and Tibishirani, R. (1986). “Generalize additive models.” Statistical Science , 1(3): 297–318. 7 Hodges, J. S. and Reich, B. J. (2010). “Adding Spatially-Correlated Errors Can Mess Up the Fixed Eﬀect You Love.” The American Statistician , 64(4): 325–334. 24 Hudd, R., Lehtonen, H., and Kurttila, I. (1988). “Growth and abundance of fry; factors which inﬂuence the year-class strength of whiteﬁsh (Coregonus lavaretus widegreni ) in the southern Bothnian Bay (Baltic).” Finnish Fisheries Research , 9: 213–220. 20 Hui, F. K. C., Warton, D. I., Foster, S. D., and Dunstan, P. K. (2013). “To mix or not to mix: Comparing the predictive performance of mixture models vs. separate species distribution models.” Ecology , 94(9): 1913–1919. 2, 21 Kallasvuo, M., Vanhatalo, J., and Veneranta, L. (2017). “Modeling the spatial dis- tribution of larval ﬁsh abundance provides essential information for management.” Canadian Journal of Fisheries and Aquatic Sciences , 74: 636–649. 2, 3, 7, 23, 24 Kass, R. E. and Raftery, A. E. (1995). “Bayes Factors.” Journal of the American Statistical Association , 90(430): 773–795. 16 Kotta, J., Vanhatalo, J., J¨ anes, H., Orav-Kotta, H., Rugiu, L., Jormalainen, V., Bobsien, I., Viitasalo, M., Virtanen, E., Sandman, A. N., Isaeus, M., Leidenberger, S., Jonsson, P., and Johannesson, K. (2019). “Integrating experimental and distribution data to predict future species patterns.” Scientiﬁc Reports , 9(1): 1821. 2, 23 Kurowicka, D. and Cooke, R. (2003). “A parameterization of positive deﬁnite matrices in terms of partial correlation vines.” Linear Algebra and its Applications , 372(Sup- plement C): 225–251. 12 Latimer, A. M., Wu, S., Gelfand, A. E., and Silander, Jr., J. A. (2006). “Building Statistical Models to Analyze Species Distributions.” Ecological Applications , 16(1): 33–50. 2, 24 Lef´ ebure, R., Larsson, S., and Bystr¨ om, P. (2011). “A temperature dependent growth model for the threespined stickleback Gasterosteus aculeatus.” Journal of ﬁsh biology , 79: 1815–1827. 20 Leskel¨ a, A., Hudd, R., Lehtonen, H., Huhmarniemi, A., and Sandstr¨ om, O. (1991). “Habitats of whiteﬁsh (Coregonus lavaretus (L.) s.l.) larvae in the Gulf of Bothnia.” Aqua Fennica , 21: 145–151. 20 Lewandowski, D., Kurowicka, D., and Joe, H. (2009). “Generating random correla- tion matrices based on vines and extended onion method.” Journal of Multivariate Analysis , 100(9): 1989–2001. 12 Lindley, D. V. (2002). “Seeing and Doing: The Concept of Causation.” International Statistical Review , 70(2): 191–214. 11 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 29 Liu, J. and Vanhatalo, J. (2018). “Bayesian model-based spatiotemporal survey design for log-Gaussian Cox process.” arXiv e-prints , arXiv:1808.09200. 7 Lopez, H. F. (2000). “Bayesian Analysis in Latent Factor and Longitudinal Models.” Ph.D. thesis, Institute of Statistics and Decision Sciences, Duke University. 21 Lopez, H. F., Salazar, E., and Gamerman, D. (2008). “Spatial Dynamic Factor Analy- sis.” Bayesian Statistics , 3(4): 759–792. 21 MacKenzie, B. R., Gislason, H., M¨ ollmann, C., and K¨ oster, F. W. (2007). “Impact of 21st century climate change on the Baltic Sea ﬁsh community and ﬁsheries.” Global Change Biology , 13: 1348–1367. 23 Mardia, K. V. and Goodall, C. R. (1993). “Spatio-Temporal analysis of Multivariate Environmental Monitoring Data.” Multivariate Environmental Statistics , 347–386. 8, 10 M.Cingi, S., Kein¨ anen, and Vuorinen, P. J. (2010). “Elevated water temperature impairs fertilization and embryonic development of whiteﬁsh Coregonus lavaretus.” Journal of Fish Biology , 76: 502–521. 20 Meier, H. E. M., Hordoir, R., Andersson, H. C., Dieterich, C., Eilola, K., ..., B. G. G., and Schimanke, S. (2012). “Modeling the combined impact of changing climate and changing nutrient loads on the Baltic Sea environment in an ensemble of transient simulations for 19612099.” Climate Dynamics , 39: 2421–2441. 20 Muller, ¨ R. (1992). “Trophic state and its implications for natural reproduction of salmonid ﬁsh.” In Dynamics and Use of Lacustrine Ecosystems . Springer, Dordrecht. Nelder, J. A. and Wedderburn, R. W. M. (1972). “Generalized Linear Models.” Journal of the Royal Statistical Association , 135(3): 370–384. 2 Nickisch, H. and Rasmussen, C. E. (2008). “Approximations for binary Gaussian process classiﬁcation.” Journal of Machine learning , 9: 2035–2078. 9 Odum, E. P. (1953). Fundamentals of ecology . Wiley Subscription Services, Inc., A Wiley Company. 13 O’Hagan, A. (1978). “Curve Fitting and Optimal Design for Prediction.” Journal of Royal Statistical Society B , 40(1): 1–42. 2 Ovaskainen, O., de Knegt, H. J., and del Mar Delgado, M. (2016). Quantitative Ecology and Evolutionary Biology -Integrating models with data . Oxford University Press. 24 Ovaskainen, O., Hottola, J., and Siitonen, J. (2010). “Modeling species co-occurence by multivariate logistic regression generates new hypotheses on fungal interaction.” Ecology , 91(9): 2514–2521. 23 Ovaskainen, O. and Soininen, J. (2011). “Making more out of sparse data: Hierarchical modelling of species communities.” Ecology , 92(2): 289–295. 2 Ovaskainen, O., Tikhonov, G., Norberg, A., Guillaume Blanchet, F., Duan, L., Dunson, D., Roslin, T., and Abrego, N. (2017). “How to make more out of community data? imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 30 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 A conceptual framework and its implementation as models and software.” Ecology Letters , 20(5): 561–576. 2, 8, 21, 22, 24, 25 Paciorek, C. J. (2010). “The Importance of Scale for Spatial-Confounding Bias and Precision of Spatial Regression Estimators.” Statistical Science , 25(1): 107–125. 24 Pearl, J. (2009). “Causal inference in statistics: An overview.” Statistics Surveys , 3: 96–146. 11 Peltonen, H., Vinni, M., Lappalainen, A., and Pnni, J. (2004). “Spatial feeding patterns of herring (Clupea harengus L.), sprat (Sprattus sprattus L.), and the three-spined stickleback (emphGasterosteus aculeatus L.) in the Gulf of Finland, Baltic Sea.” ICES Journal of Marine Science , 61: 966–971. 20 Pitk¨ anen, H., Peuraniemi, M., Westerbom, M., Kilpi, M., and von Numers, M. (2013). “Longterm changes in distribution and frequency of aquatic vascular plants and charo- phytes in an estuary in the Baltic Sea.” Annales Botanica Fennica , 50: 1e54. 21 Pollock, L. J., Tingley, R., Morris, W. K., Golding, N., Hara, R. B. O., Parris, K. M., Vesk, P. A., and McCarthy, M. A. (2014). “Understanding co-occurence by modelling species simultaneously with joint species distribution model (JSDM).” Methods in Ecology and Evolution , 5: 397–406. 2, 8, 23 Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning . The MIT Press. 2, 7, 12 Record, S., Fitzpatrick, M. C., Finley, A. O., Veloz, S., and Ellison, A. M. (2013). “Should species distribution models account for spatial autocorrelation? A test of model projections across eight millennia of climate change.” Global Ecology and Biogeography , 22(6): 760–771. 21 Reusch, T. B. H., Dierking, J., Andersson, H. C., Bonsdorﬀ, E., Carstensen, J., Casini, M., Czajkowski, M., Hasler, B., Hinsby, K., Hyyti¨ ainen, K., Johannesson, K., Jomaa, S., Jormalainen, V., Kuosa, H., Kurland, S., Laikre, L., MacKenzie, B. R., Margon- ski, P., Melzner, F., Oesterwind, D., Ojaveer, H., Refsgaard, J. C., Sandstr¨ om, A., Schwarz, G., Tonderski, K., Winder, M., and Zandersen, M. (2018). “The Baltic Sea as a time machine for the future coastal ocean.” Science Advances , 4(5). 3 R¨ onnberg, C. and Bonsdorﬀ, E. (2004). “Baltic Sea eutrophication: area-speciﬁc eco- logical consequences.” Hydrobiologia , 514: 227–241. 4 Rue, H. and Marino, S. (2009). “Approximate Bayesian Inference for latent Gaussian models by using integrated Laplace approximantions.” Journal of the Royal Statistical Society , 71(2): 319–392. 9 Shelton, A. O., Thorson, J. T., Ward, E. J., and Feist, B. E. (2014). “Spatial semipara- metric models improve estimates of species abundance and distribution.” Canadian Journal of Fisheries and Aquatic Sciences , 71(July): 1655–1666. 23 Sieben, K., Ljunggren, L., Bergstr¨ om, U., and Eriksson, B. K. (2011). “A meso-predator release of stickleback promotes recruitment of macroalgae in the Baltic Sea.” Journal of Experimental Marine Biology and Ecology , 397: 79e84. 20 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 31 Simpson, D. P., Rue, H. v., Riebler, A., Martins, T. G., and Sørbye, S. H. (2017). “Penal- ising model component complexity: A principled, practical approach to constructing priors.” Statistical Science , 32(1): 1–28. 24 Taylor-Rodr´ ıguez, D., Kaufeld, K., Schliep, E. M., Clark, J. S., and Gelfand, A. E. (2017). “Joint Species Distribution Modeling: Dimension Reduction Using Dirichlet Processes.” Bayesian Analysis , 12(4): 939–967. 22 Thorson, J. T., Scheuerell, M. D., Shelton, A. O., See, K. E., Skaug, H. J., and Kris- tensen, K. (2015). “Spatial factor analysis: a new tool for estimating joint species distributions and correlations in species range.” Methods in Ecology and Evolution , 6(6): 627–637. 2, 21, 22, 24, 25 Tikhonov, G., Abrego, N., Dunson, D., and Ovaskainen, O. (2017). “Using joint species distribution models for evaluating how species-to-species associations depend on the environmental context.” Methods in Ecology and Evolution , 8(4): 443–452. 22 Tokuda, T., Goodrich, B., Mechelen, I. V., and Gelman, A. (2012). “Visualizing Distri- butions of Covariance Matrices.” 9, 21 Vanhatalo, J., Pietil¨ ainen, V., and Vehtari, A. (2010). “Approximate inference for disease mapping with sparse Gaussian processes.” Statistics in Medicine , 29(15): 1580–1607. 3, 9, 12, 25 Vanhatalo, J., Riihim¨ aki, J., Hartikainen, J., Jyl¨ anki, P., Tolvanen, V., and Vehtari, A. (2013). “GPstuﬀ : Bayesian Modeling with Gaussian Processes.” Journal of Machine Learning Research , 14: 1175–1179. 12 Vanhatalo, J., Veneranta, L., and Hudd, R. (2012). “Species distribution modelling with Gaussian processes: A case study with youngest stages of sea spawning whiteﬁsh (Coregonus lavatus L. s.l.) larvae.” Ecological Modelling , (228): 49–58. 2, 5, 7, 8, 16, 20, 23, 24 Vehtari, A., Mononen, T., Tolvanen, V., Sivula, T., and Winther, O. (2016). “Bayesian Leave-One-Out Cross-Validation Approximations for Gaussian Latent Variable Mod- els.” Journal of Machine Learning Research , 17: 1–38. 12, 15 Vehtari, A. and Ojanen, J. (2012). “A survey of Bayesian predictive methods for model assessment, selection and comparison.” Statistics Surveys , 6: 141–228. 15 Veneranta, L., Hudd, R., and Vanhatalo, J. (2013). “Reproduction areas of sea-spawning coregionids reﬂect the environment in shallow coastal areas.” Marine Ecology Progress Series , 477: 231–250. 3, 4, 5, 16, 20 Warton, D. I., Blanchet, F. G., O’Hara, R. B., Ovaskainen, O., Taskinen, S., Walker, S. C., and Hui, F. K. (2015). “So Many Variables: Joint Modeling in Community Ecology.” Trends in Ecology and Evolution , 30(12): 766–779. 2 Warton, D. I. and Shepherd, L. C. (2010). “Poisson point process models solve the ”pseudo-absence problem” for presence-only data in ecology.” Annals of Applied Statistics , 4(3): 1383–1402. 6 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 32 To appear in Bayesian Analysis, DOI: 10.1214/19-BA1158 Wikle, C. K. (2003). “Hierarchical Models in Environmental Science.” International Statistical Review , 71(2): 181–199. 6 Zhang, H. (2004). “Inconsistent Estimation and Asymptotically Equal Interpolations in Model-Based Geostatistics.” Journal of the American Statistical Association , 99(465): 250–261. 24 imsart-ba ver. 2014/10/16 file: JSDMfinal-ArXiv.tex date: August 9, 2019 Bayesian Analysis (2018) , Number , pp. 1 Supplementary Material: Additive multivariate Gaussian process for joint species distribution modeling with heterogeneous data ∗ † ‡ Jarno Vanhatalo , Marcelo Hartmann and Lari Veneranta 1 The posterior expectation and variance for new observations First we approximate the logistic function with p(f ) ≈ p ˜(f ) = aΦ(f/v )+(1−a)Φ(f/v ) 1 2 where a ∈ (0, 1) and v , v ∈ (0,∞) are parameters of the approximation (see Demi- 1 2 −4 denko, 2004). This approximation is used since it has small error bound (≤ |10 |, Demidenko (2004)) and can considerably speed-up marginalization over f for large datasets. Using Lemma 2 in the Section 4, we obtain, μ μ j,∗ j,∗ √ √ E[Y (x , s )| y,η, Λ] = z aΦ + z (1− a)Φ (1.1) j ∗ ∗ j,∗ j,∗ 2 2 Σ +v Σ +v j,∗ j,∗ 1 2 V[Y (x , s )| y,η, Λ] = E[Y (x , s )| y,η, Λ]− E[Y (x , s )| y,η, Λ] j ∗ ∗ j ∗ ∗ j ∗ ∗ 2 2 + (z − z )E[p ˜ (f )] j,∗ ∗ j,∗ where, Φ(·) is the cumulative distribution of the standard Gaussian random variable and 2 2 2 E[p ˜ (f )] = a F (μ |V ) + (1− a) F (μ |V ) + 2a(1− a)F (μ |V ) (1.2) ∗ 2 1,1 2 2,2 2 1,2 j,∗ j,∗ j,∗ F (μ |V ) is the zero-mean 2-dimensional Gaussian cumulative distribution function 2 m,n j,∗ evaluated at μ = [μ μ ] with covariance matrix given by j,∗ j,∗ j,∗ Σ + v Σ j,∗ j,∗ V = . (1.3) m,n Σ Σ + v j,∗ j,∗ In the case of Negative-Binomial model, we use the moment generating function of a Gaussian random variable to obtain the unconditional expectation of a future outcome w.r.t to latent variable, i.e. μ +Σ /2 j,∗ j,∗ E[Y (x , s )| y,η, Λ] = z e (1.4) j ∗ ∗ j,∗ h i μ +Σ /2 2 2μ +Σ Σ /2 j,∗ j,∗ j,∗ j,∗ j,∗ V[Y (x , s )| y,η, Λ] = z e + z e e (r ˆ + 1)/r ˆ − 1 . j ∗ ∗ j,∗ j j j,∗ Department of Mathematics and Statistics and Organismal and Evolutionary Biology Research Program, University of Helsinki, jarno.vanhatalo@helsinki.ﬁ Department of Mathematics and Statistics, University of Helsinki, marcelo.hartmann@helsinki.ﬁ Natural Resources Institute Finland, Finland c 2018 International Society for Bayesian Analysis imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 arXiv:1809.02432v2 [stat.ME] 8 Aug 2019 2 Supplement: JSDMs with additive multivariate Gaussian processes 2 Conditional predictions Recall that, in the main paper, the distribution of Y (x , s )| y conditioned on J ∗ ∗ 1 J hyperparameters (they were ommited from the notation for simplicity) is given by π(y | y ) = π(y | y , f )π(y , f )df /π(y ) (2.1) J ,∗ J ,∗ J ,∗ J ,∗ J J ,∗ J 1 J 1 1 J 1 2 1 2 2 2 The second term in the numerator of the integrand of (2.1) is the same as, π(y , f ) = π(f | f , y )π(f | y )π(y )df . (2.2) J ,∗ J ,∗ J J J J 1 1 2 J 2 J J 2 2 2 2 2 Now, f | f , y also only depends on f . Plugging (2.2) into (2.1) we get J ,∗ J J 1 2 J 2 π(y | y ) = π(y | f )π(f | f )π(f | y )df df (2.3) J ,∗ J ,∗ J J J J ,∗ J ,∗ J J ,∗ 1 1 2 2 J 2 1 1 2 1 2 where last two terms in (2.3) are Gaussians. The ﬁrst one is the conditional Gaus- sian density function w.r.t. the predictor function values f , that is, π(f | f ) = J J ,∗ J 2 1 2 −1 −1 T N (f |C C f , C − C C C ) and the second one is the J ,∗ J ,J ∗ J J ∗,∗ J ,J ∗ 1 1 2 2 1 1 2 J ,J J ,J J ,J ∗ 2 2 2 2 1 2 Laplace approximation using the scenario species data related to the set J , that is, −1 −1 π(f | y ) ≈ N f |C ∇ log π(y |f ), (C + W ) . Plugging this ap- J J J ,J J J 2 J 2 2 2 J 2 2 2 2 J ,J 2 2 proximate density function in the equation (2.3) gives π(y | y ) = π(y | f )N (f |μ , Σ )df (2.4) J ,∗ J J ,∗ J ,∗ J ,∗ |J |J J ,∗ 1 2 1 1 1 2 2 1 with μ = C ∇ log π(y |f ) J ,J ∗ J |J 1 2 J 2 2 2 −1 −1 T Σ = C − C (C + W ) C (2.5) |J J ∗,∗ J ,J ∗ J ,J 2 1 1 2 2 2 J J ,J ∗ 2 1 2 where C is the covariance between species in the setJ at the point (x , s ) and the J ,J ∗ 1 ∗ ∗ 1 2 species in the setJ in their respective covariates and spatial locations. ∇ log π(y |f ) 2 J J 2 correspond to the gradient of the log-likelihood function for species related to the set J . C is the covariance matrix of f at (x , s ). C is the covariance matrix of J ∗,∗ J ,∗ ∗ ∗ J ,J 1 1 2 2 f and W is the negative Hessian matrix of the negative logarithm of the likelihood J J 2 2 functions related to species in the set J . Now, to calculate the predictive mean vector and predictive covariance matrix we follow the steps used in deriving unconditional expectation and unconditional variances as in the main paper. However we exclude them for brevity. 3 Unconstrained parametrization of covariance matrices Let the matrix Σ be a covariance matrix of dimension J . In terms of variances and cor- 1 1 2 2 2 2 2 2 2 relations, we rewrite Σ = diag(σ , . . . , σ ) P diag(σ , . . . , σ ) , where σ , j = 1, . . . , J 1 J 1 J j imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 3 are variances and these are transformed as σ = exp(δ ). The correlation matrix P is written in terms of its Cholesky decomposition, that is P = LL , where L is the lower triangular Cholesky decomposition. In our case, the upper triangular Cholesk decomposition is given by 1 z z ··· z 1,2 1,3 1,J Q Q Q 1 1 1 1 1 1 2 2 2 2 2 2 0 (1− z ) z (1− z ) ··· z (1− z ) 2,3 2,J i,2 i,3 i=1 i=1 i=1 i,J L = (3.1) . . . . . . . . . . . . . J−1 0 0 0 ··· (1− z ) i=1 i,J 0 0 where each z ∈ (−1, 1). Now, since each z can freely vary in the interval (−1, 1) i,i i,i without violate the positive deﬁniteness property of P (see Kurowicka and Cooke, 0 0 2003; Lewandowski et al., 2009), we map each z to the real line as z = 2/[1 + i,i i,i exp(−aδ 0 )]− 1, where δ 0 ∈ R. i,i i,i The derivative of the log π(P ) (recall the prior for the correlation matrix in the main paper) w.r.t. each δ 0 is given by i,i J−1 J X X ∂ ∂ ∂ log π(P|v) = log π(P|v) ρ 0 (3.2) j,j ∂δ 0 ∂ρ 0 ∂δ 0 i,i j,j i,i j=1 j =j+1 and the partial derivatives of log π(P|v) w.r.t to ρ 0 are obtained as, j,j −1 −1 0 0 log π(P|v) = (v − 1)(J − 1)− 1 {P } − v {P } (3.3) j,j c,c rr ∂ρ j,j r=1 : r∈{ / j,j } 0 0 0 0 0 where c = j − 1 and c = j if r < j and, j = 1 or r > j . If r < j and r < j then −1 −1 −1 0 0 0 c = j− 1 and c = j − 1. {P } 0 is the entry (j, j ) of the matrix P and {P } 0 is j,j c,c rr 0 −1 the entry (c, c ) of the inverse matrix P , where P is a matrix obtained by removing rr rr the r’th row and column of P . The partial derivatives of each ρ 0 w.r.t to δ 0 are j,j i,i T T 0 0 0 obtained from ∂P/∂δ = (∂L/∂δ )L + L(∂L/∂δ ) . i,i i,i i,i Lastly, we ﬁnd the derivatives of the logarithm of the absolute value of the Jacobian determinant w.r.t each δ (see equation (11) in Lewandowski et al. 2009). i,i J−1 J Y Y ∂ ∂(ρ , . . . , ρ ) ∂ dz 0 1,2 J−1,J j,j 2 (J−j−1)/2 log = log (1− z 0 ) j,j 0 0 0 ∂δ ∂(δ , . . . , δ ) ∂δ dδ i,i 1,2 J−1,J i,i j,j j=1 j =j+1 2 2 d z 0/dδ 0 0 0 z dz i,i i,i i,i i,i = −(J − i− 1) + (3.4) 1− z dδ 0 dz 0/dδ 0 i,i i,i i,i i,i imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 4 Supplement: JSDMs with additive multivariate Gaussian processes 4 Gaussian integrals Lemma 1. Let’s denote by Φ(·) the standard-Gaussian cumulative distribution function 2 2 and N (·|μ, σ ) the Gaussian density function with parameters (μ, σ ) ∈ R× R . Then the following holds x− m N (x|μ, σ ) Φ dx = F (μ |m , V ) (4.1) N N N r=1 −∞ where F (·|c,C) is the N -dimensional Gaussian cumulative distribution function with parameters (c,C) ∈ R ×R, with R the space of positive-deﬁnite matrices (covariance T N N matrices). Furthermore, m = [m ··· m ] ∈ R , μ = μ1 ∈ R , v > 0 ∀r and N 1 N N r V is a covariance matrix given by, 2 2 2 v + σ . . . σ . . . . . V = (4.2) N . . 2 2 2 σ . . . v + σ Proof. To show (4.1), start writing the left-hand side of the equation in full. Rewrite the integrand as the product of non-standard Gaussian density functions as well as the regions of integration, i.e., ∞ x x Z Z Z 2 2 2 ··· N (y |m , v ) . . .N (y |m , v )N (x|μ, σ )dy ··· dy dx. (4.3) 1 1 N N 1 N 1 N −∞−∞ −∞ Rewrite again using the following transformation [x, y ,··· , y ] = [w + μ, z + w + 1 N 1 m ,··· , z + w + m ] and note that |∂(x, y ,··· , y )/∂(w, z ,··· , z )| = 1. After 1 N N 1 N 1 N changing variables, group the diﬀerent terms in the exponentials together to have ( " #) μ−m μ−m ∞ N 1 Z Z Z 1 2 (z +w) 1 r w ··· exp − + dz . . . dz dw (4.4) 2 2 1 N c v σ r=1 −∞ −∞ −∞ (N +1)/2 where c = σ(2π) v . Now, the expression inside the squared bracket is a r=1 quadratic form which is written with the following matrix form, N N N N X X X X (z +w) r w 2 1 1 z w z r r + = w + + w + z + 2 2 2 2 2 r 2 2 v σ v σ v v v r r r r r r=1 r=1 r=1 r=1 P P N N z 1 1 r w + + 2 2 2 r=1 v σ r=1 v r r w w z 1 z 1 2 2 v v = 1 1 . . w z N 2 2 v v N N imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 5 P N 1 1 1 1 + ··· 2 2 2 2 w r=1 v σ w v v 1 N 1 1 ··· 0 z z 1 2 2 1 v v 1 1 = (4.5) . . . . . . . . . . . . . . . . . . 1 1 z z N 0 ··· N 2 2 v v N N therefore (4.4) is the same as N 1 1 1 1 + ··· 2 2 2 2 w r=1 v σ w v v 1 N μ−m μ−m ∞ N 1 Z Z Z 1 1 ··· 0 z z 1 2 2 1 1 v v 1 1 1 ··· exp − dz dw . . . . . c . . . 2 . . . . . . . . . . −∞ −∞ −∞ 1 1 z z N 0 ··· N 2 2 v v N N (4.6) The integrand in (4.6) has the full form of the multivariate Gaussian density. To identify this we need to ﬁnd the closed-form covariance matrix from the precision matrix in (4.6) 2 N +1 and show that the determinant of the covariance matrix is given by c /(2π) . Write h i 1 1 1 1 the precision matrix as block matrix such that A = + , B = ··· , 2 2 2 2 r=1 v σ v v r 1 N 1 1 C = B and D = diag ,··· , . Use the partitioned matrix inversion lemma 2 2 v v 1 N −1 −1 2 (Strang and Borre, 1997, equation 17.44) to get the blocks, (A − BD C ) = σ , −1 −1 −1 2 −1 −1 −1 2 T −1 (BD C−A) BD = −σ [1··· 1], D C (BD C−A) = −σ [1··· 1] and D + −1 −1 −1 −1 2 2 2 2 D C (A−BD C ) BD where its main diagonal equals to [v +σ ,··· , v +σ ] and 1 N all oﬀ-diagonal elements are given by σ . Put everything together to have the covariance matrix 2 2 2 σ −σ ··· −σ 2 2 2 2 −σ v + σ ··· σ (4.7) . . . . . −σ . . 2 2 2 2 −σ σ ··· v + σ −1 2 2 2 N +1 whose determinant equals to 1/[det(D) det(A− BD C )] = σ v = c /(2π) r=1 r by the partitioned matrix determinant lemma (e.g., Rasmussen and Williams, 2006). Finally, in (4.6), interchange the order of integration with Fubini-Tonelli theorem (Fol- land, 2013) and integrate w.r.t. w to get 2 2 2 μ−m μ−m z 0 v + σ ··· σ N 1 Z Z 1 . . . . . . . . . ··· N , dz ··· dz (4.8) 1 N . . . . 2 2 2 −∞ −∞ z 0 σ ··· v + σ that equals to 2 2 2 μ m v + σ ··· σ . . . . . . . . . F , (4.9) . . . . 2 2 2 μ m σ ··· v + σ and therefore the equality (4.1) holds. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 6 Supplement: JSDMs with additive multivariate Gaussian processes Lemma 2. Let N (·|μ , Σ) be the N -dimensional Gaussian density function with mean parameter μ and covariance matrix Σ. Then the following holds true, F (x|m , V)N (x|μ , Σ)dx = F (μ |m , V + Σ) (4.10) N N N N N N where F (·|c,C) is the N -dimensional Gaussian cumulative distribution function with T N T N parameters (c,C). Furthermore, m = [m ··· m ] ∈ R , μ = [μ ··· μ ] ∈ R , N 1 N 1 N V and Σ are covariance matrices. Proof. Let’s rewrite the left-hand side of (4.10) in full, x x n 1 Z Z Z ··· N (y|m, V)N (x|μ , Σ)dydx (4.11) −∞ −∞ T T where y = [y ,··· , y ] . Let’s use the following transformation [x ,··· , x , y ,··· , y ] 1 n 1 N 1 N = [w + μ ,··· , w + μ , z + w + m ,··· , z + w + m ] . The Jacobian of this 1 1 N N 1 1 1 N N N transformation simpliﬁes to |∂(x ,··· , x , y ,··· , y )/ ∂(w ,··· , w , z ,··· , z )| = 1 N 1 N 1 N 1 N 1. Rewrite the above equation to get, μ −m μ −m N N 1 1 Z Z Z ··· N (w|− z, V)N (w| 0, Σ)dzdw (4.12) −∞ −∞ T T where z = [z ··· z ] and w = [w ··· w ] . Note that the product of two multivariate 1 N 1 N Gaussian density functions is another unnormalized multivariate Gaussian (see, e.g., Rasmussen and Williams, 2006, Appendix A). Therefore we write μ −m μ −m N N 1 1 Z Z Z ··· N (z| 0, V + Σ)N (w|c, C )dzdw (4.13) −∞ −∞ −1 −1 −1 −1 where c = −C V z and C = (V + Σ ) . Interchange the order of integration with Fubini-Tonelli theorem (Folland, 2013) and integrate w.r.t w to get that, μ −m μ −m N N 1 1 Z Z ··· N (z| 0, V + Σ)dz = F (μ |m , V + Σ) (4.14) N N −∞ −∞ which completes the proof. The above closed-form integrals extend many equalities in the table of Owen (1980). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 7 References Demidenko, E. (2004). Mixed Models: Theory and Application . John Wiley & Sons. Folland, G. (2013). Real Analysis: Modern Techniques and Their Applications . Pure and Applied Mathematics: A Wiley Series of Texts, Monographs and Tracts. Wiley. Kurowicka, D. and Cooke, R. (2003). “A parameterization of positive deﬁnite matrices in terms of partial correlation vines.” Linear Algebra and its Applications , 372(Sup- plement C): 225–251. Lewandowski, D., Kurowicka, D., and Joe, H. (2009). “Generating random correla- tion matrices based on vines and extended onion method.” Journal of Multivariate Analysis , 100(9): 1989–2001. Owen, D. B. (1980). “A table of normal integrals.” Communications in Statistics - Simulation and Computation , 9(4): 389–419. URL http://dx.doi.org/10.1080/03610918008812164 Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning . The MIT Press. Strang, G. and Borre, K. (1997). Linear Algebra, Geodesy and GPS . Wellesley- Cambridge Press. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 8 Supplement: JSDMs with additive multivariate Gaussian processes Appendix A: Extra results Standard models LOO Cross Validation 5-fold Cross validation 1) Univariate GP h(x), univ. (s) -2.230 (0.082) -2.465 (0.094) diatom.algae -3.942 (0.225) -4.002 (0.225) ﬁlame.algae -4.568 (0.141) -4.692 (0.146) macro-veg -1.909 (0.234) -2.092 (0.240) threespine-sb -1.477 (0.157) -1.624 (0.154) ninespine-sb -1.334 (0.151) -1.464 (0.149) white-ﬁsh -2.840 (0.200) -3.092 (0.216) vendance -1.634 (0.210) -2.196 (0.314) 2) Univariate GP h(x), multiv. (s) -2.081 (0.080) -2.480 (0.100) diatom.algae -3.578 (0.232) -4.010 (0.217) ﬁlame.algae -4.104 (0.159) -4.680 (0.135) macro-veg -1.733 (0.223) -2.148 (0.252) threespine-sb -1.385 (0.149) -1.637 (0.156) ninespine-sb -1.232 (0.139) -1.489 (0.151) white-ﬁsh -2.782 (0.203) -3.092 (0.213) vendance -1.537 (0.204) -2.215 (0.316) 3) Multiv. GP h(x), multiv. (s) -1.669 (0.080) -2.316 (0.086) diatom.algae -2.307 (0.112) -3.947 (0.217) ﬁlame.algae -1.244 (0.751) -4.623 (0.140) macro-veg -1.289 (0.150) -1.950 (0.248) threespine-sb -1.210 (0.140) -1.509 (0.156) ninespine-sb -1.184 (0.147) -1.389 (0.149) white-ﬁsh -3.399 (0.393) -2.964 (0.205) vendance -2.041 (0.451) -1.837 (0.247) 4) Multiv. GP h(x) only -2.228(0.089) -2.590(0.012) diatom.algae -4.033(0.235) -4.545(0.379) ﬁlame.algae -4.697(0.156) -4.865(0.187) macro-veg -1.754(0.226) -2.514(0.379) threespine-sb -1.418(0.173) -1.494(0.172) ninespine-sb -1.355(0.183) -1.393(0.168) white-ﬁsh -2.866(0.212) -3.192(0.271) vendance -1.608(0.222) -2.458(0.400) 5) Multiv. (s) only -2.351(0.087) -2.500(0.086) diatom.algae -4.129(0.200) -4.231(0.182) ﬁlame.algae -4.607(0.119) -4.729(0.109) macro-veg -2.255(0.254) -2.800(0.210) threespine-sb -1.471(0.174) -1.567(0.168) ninespine-sb -1.375(0.178) -1.473(0.152) white-ﬁsh -3.147(0.181) -3.226(0.191) vendance -1.675(0.235) -1.864(0.260) Table 1: Model comparison with leave-one-out (LOO) and 5-fold cross validation using the mean point wise log marginal predictive density statistics (and its standard error) for models 1-5 (see Section 5.3 in the main text). The bolded numbers show the overall performance over all species and the indented text shows the species speciﬁc predictions. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 9 Standard models LOO Cross Validation 5-fold Cross validation 6) Univ. quadr. h(x), univ. (s) -2.262 (0.083) -2.517 (0.095) diatom.algae -3.941(0.228) -4.035(0.237) ﬁlame.algae -4.608(0.138) -4.741(0.150) macro-veg -1.955(0.239) -2.233(0.300) threespine-sb -1.490(0.158) -1.598(0.156) ninespine-sb -1.337(0.151) -1.501(0.140) white-ﬁsh -2.885(0.204) -3.137(0.215) vendance -1.705(0.215) -2.274(0.315) 7) Univ. quadr. h(x), multiv. (s) -2.117 (0.084) -2.484 (0.109) diatom.algae -3.568(0.243) -3.994(0.225) ﬁlame.algae -4.104(0.170) -4.717(0.137) macro-veg -1.763(0.229) -2.269(0.294) threespine-sb -1.360(0.165) -1.517(0.171) ninespine-sb -1.233(0.143) -1.484(0.142) white-ﬁsh -2.853(0.210) -3.098(0.251) vendance -1.668(0.226) -2.208(0.340) 8) Multiv. quadr. h(x), multiv. (s) -2.105 (0.084) -2.416 (0.099) diatom.algae -3.561(0.242) -4.009(0.242) ﬁlame.algae -4.090(0.177) -4.745(0.170) macro-veg -1.742(0.229) -2.197(0.280) threespine-sb -1.346(0.165) -1.476(0.173) ninespine-sb -1.217(0.149) -1.441(0.146) white-ﬁsh -2.845(0.201) -3.045(0.241) vendance -1.663(0.226) -2.080(0.311) 9) Multiv. quadr. h(x) only -10.246(1.297) -9.305(1.008) diatom.algae -48.063(10.943) -32.673(6.132) ﬁlame.algae -52.309(8.539) -42.283(6.498) macro-veg -7.139(1.439) -19.895(6.421) threespine-sb -1.441(0.178) -1.494(0.177) ninespine-sb -1.375(0.192) -1.400(0.158) white-ﬁsh -2.899(0.215) -3.209(0.275) vendance -1.677(0.227) -2.311(0.364) Table 2: Model comparison with leave-one-out (LOO) and 5-fold cross validation using the mean point wise log marginal predictive density statistics (and its standard error) for models 6-9 (see Section 5.3 in the main text). The bolded numbers show the overall performance over all species and the indented text shows the species speciﬁc predictions. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 10 Supplement: JSDMs with additive multivariate Gaussian processes Pairwise model comparison LOO Cross Validation 5-fold Cross validation Model 8 - Model 6 0.156(0.019) 0.101(0.030) diatom.algae 0.379(0.106) 0.025(0.020) ﬁlame.algae 0.537(0.097) -0.003(0.037) macro-veg 0.214(0.214) 0.135(0.065) threespine-sb 0.144(0.032) 0.012(0.023) ninespine-sb 0.120(0.026) 0.059(0.017) white-ﬁsh 0.036(0.039) 0.091(0.045) vendance 0.043(0.032) 0.194(0.031) Model 8 - Model 7 0.012(0.004) 0.052(0.010) diatom.algae 0.006(0.016) -0.015(0.028) ﬁlame.algae 0.014(0.021) -0.027(0.046) macro-veg 0.021(0.012) 0.072(0.028) threespine-sb 0.015(0.008) 0.041(0.015) ninespine-sb 0.016(0.011) 0.043(0.017) white-ﬁsh 0.008(0.011) 0.052(0.019) vendance 0.006(0.006) 0.128(0.033) Table 3: Pairwise comparison of leave-one-out (LOO) and 5-fold cross validation point wise log predictive densities of multivariate parametric (quadratic environmental re- sponses) model (model 8) and parametric models without any interspeciﬁc correlations (model 6) or with interspeciﬁc correlations only in spatial random eﬀect (model 7). We report the mean (and its standard error) of the diﬀerences in point wise log predic- tive densities at test locations. The bolded numbers show the overall performance and normal text shows the species speciﬁc predictions. Pairwise model comparison LOO Cross Validation 5-fold Cross validation Model 3 - model 1 0.561(0.126) 0.150(0.123) diatom.algae 1.637(0.151) 0.054(0.029) ﬁlame.algae 3.974(0.495) 0.070(0.027) macro-veg 0.620(0.129) 0.141(0.029) threespine-sb 0.267(0.097) 0.115(0.017) ninespine-sb 0.151(0.097) 0.075(0.019) white-ﬁsh -0.300(0.127) 0.129(0.026) vendance 0.051(0.073) 0.359(0.077) Model 3 - model 2 0.413(0.123) 0.164(0.017) diatom.algae 1.271(0.136) 0.065(0.029) ﬁlame.algae 3.496(0.504) 0.057(0.020) macro-veg 0.444(0.011) 0.198(0.031) threespine-sb 0.175(0.087) 0.128(0.198) ninespine-sb 0.049(0.085) 0.100(0.021) white-ﬁsh -0.357(0.011) 0.128(0.026) vendance -0.041(0.055) 0.377(0.078) Table 4: Pairwise comparison of leave-one-out (LOO) and 5-fold cross validation point wise log predictive densities of multivariate GP model (model 3) and GP models without any interspeciﬁc correlations (model 1) or with interspeciﬁc correlations only in spatial random eﬀect (model 2). We report the mean (and its standard error) of the diﬀerences in point wise log predictive densities at test locations. The bolded numbers show the overall performance and normal text shows the species speciﬁc predictions. imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 11 imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Table 5: Hyperparameters estimates from covariates covariance functions Diatom-algae Filam-algae Macro-veg Threespine-sb Ninespine-sb White-ﬁsh Vendance SDM | JSDM SDM | JSDM SDM | JSDM SDM | JSDM SDM | JSDM SDM | JSDM SDM | JSDM σ 1.92 | 2.19 1.96 | 1.87 1.63 | 1.62 1.70 | 1.64 2.18 | 1.78 1.57 | 1.30 2.61 | 2.47 Opennes ` 1.56 | 1.59 1.29 | 1.31 1.51 | 1.26 1.90 | 2.23 2.12 | 1.86 1.98 | 1.61 0.64 | 0.92 Distance to σ 2.24 | 1.84 1.55 | 1.48 2.87 | 2.64 2.54 | 2.23 1.98 | 1.43 1.66 | 1.32 1.79 | 1.37 deep ` 1.09 | 1.18 1.16 | 1.06 1.39 | 1.87 1.04 | 1.39 1.48 | 1.80 2.02 | 1.33 1.32 | 1.09 σ 1.89 | 1.72 1.51 | 1.63 2.14 | 1.85 2.11 | 1.49 1.89 | 1.79 3.37 | 2.26 2.59 | 2.37 Sandy bottom ` 1.49 | 1.43 1.48 | 1.24 1.30 | 0.84 0.71 | 0.46 1.93 | 1.60 1.15 | 0.95 0.91 | 1.06 index σ 2.36 | 2.37 2.16 | 1.74 3.26 | 2.54 1.70 | 1.63 1.99 | 1.86 1.64 | 1.09 6.69 | 7.52 Ice breakup ` 1.11 | 1.18 0.75 | 1.06 0.62 | 1.87 1.32 | 1.39 2.22 | 1.80 1.87 | 1.33 0.70 | 1.09 week σ 1.62 | 1.53 1.55 | 1.64 2.93 | 2.68 1.91 | 1.46 1.72 | 1.42 1.95 | 1.75 2.53 | 2.48 Chlorophyl-a ` 1.11 | 1.46 0.75 | 0.78 0.62 | 0.49 1.32 | 1.76 2.22 | 1.71 1.87 | 1.30 0.70 | 0.66 σ 1.68 | 1.42 1.56 | 1.30 1.69 | 2.26 1.96 | 1.71 1.90 | 2.34 1.67 | 2.67 1.65 | 1.55 Distance to ` 1.59 | 1.76 1.63 | 1.42 1.10 | 0.80 0.76 | 0.65 1.23 | 0.73 0.36 | 0.41 1.45 | 1.63 nearest river σ 0.81 | 1.01 0.69 | 1.07 0.69 | 0.81 1.77 | 0.87 1.08 | 1.00 2.01 | 1.00 0.75 | 1.00 Bottom type | | | | | | | σ | | | | | | 2.87 | 2.17 Salinity ` | | | | | | 1.24 | 1.37 σ 10.9 | 9.35 7.31 | 6.00 4.62 | 4.67 6.87 | 6.42 5.66 | 5.74 8.11 | 7.39 9.32 | 6.85 Spatial ` 0.47 | 0.45 0.74 | 1.06 0.61 | 0.44 0.44 | 0.43 0.78 | 0.95 0.15 | 0.22 2.14 | 2.45 12 Supplement: JSDMs with additive multivariate Gaussian processes Figure 1: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for diatomo algae predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 13 Figure 2: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for ﬁlamentous algae predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 14 Supplement: JSDMs with additive multivariate Gaussian processes Figure 3: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for macrovegetation predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 15 Figure 4: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for whiteﬁsh predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 16 Supplement: JSDMs with additive multivariate Gaussian processes Figure 5: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for ninespine-stickleback predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 17 Figure 6: Posterior predictive median of intensity and the expected count of individuals in replicate sampling for threespine-stickleback predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 18 Supplement: JSDMs with additive multivariate Gaussian processes Figure 7: Posterior predictive variance for the latent ﬁeld f (x, s)| y, η ˆ, λ diatomo algae predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 19 Figure 8: Posterior predictive variance for the latent ﬁeld f (x, s)| y, η ˆ, λ vendace pre- dicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 20 Supplement: JSDMs with additive multivariate Gaussian processes Figure 9: Posterior predictive variance for the latent ﬁeld f (x, s)| y, η ˆ, λ for ﬁlamentous algae predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 21 Figure 10: Posterior predictive variance for the latent ﬁeld f (x, s)| y, η ˆ, λ for macroveg- etation predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 22 Supplement: JSDMs with additive multivariate Gaussian processes Figure 11: Posterior predictive variance for the latent ﬁeld f (x, s)| y, η ˆ, λ for whiteﬁsh predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 Vanhatalo, Hartmann and Veneranta 23 Figure 12: Posterior predictive variance for the latent ﬁeld f (x, s)| y, η ˆ, λ ninespine- stickleback predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019 24 Supplement: JSDMs with additive multivariate Gaussian processes Figure 13: Posterior predictive variance for the latent ﬁeld f (x, s)| y, η ˆ, λ for threespine- stickleback predicted by SSDM (model 1) and JSDM (model 3). imsart-ba ver. 2014/10/16 file: supp-JSDMfinal-ArXiv.tex date: August 9, 2019

Statistics – arXiv (Cornell University)

**Published: ** Sep 7, 2018

Loading...

You can share this free article with as many people as you like with the url below! We hope you enjoy this feature!

Read and print from thousands of top scholarly journals.

System error. Please try again!

Already have an account? Log in

Bookmark this article. You can see your Bookmarks on your DeepDyve Library.

To save an article, **log in** first, or **sign up** for a DeepDyve account if you don’t already have one.

Copy and paste the desired citation format or use the link below to download a file formatted for EndNote

Access the full text.

Sign up today, get DeepDyve free for 14 days.

All DeepDyve websites use cookies to improve your online experience. They were placed on your computer when you launched this website. You can change your cookie settings through your browser.