A combination of supervised dimensionality reduction and learning methods to forecast solar radiation

Machine learning is routinely used to forecast solar radiation from inputs, which are forecasts of meteorological variables provided by numerical weather prediction (NWP) models on a spatially distributed grid. However, the number of features resulting from these grids is usually large, especially if several vertical levels are included. Principal Components Analysis (PCA) is one of the simplest and most widely used methods to extract features and reduce dimensionality in renewable energy forecasting, although it has some limitations: first, it performs a global linear analysis, and second, it is an unsupervised method. Locality Preserving Projection (LPP) overcomes the locality problem, and recently the Linear Optimal Low-Rank (LOL) method has extended Linear Discriminant Analysis (LDA) to be applicable when the number of features is larger than the number of samples. Supervised Nonnegative Matrix Factorization (SNMF) also achieves this goal by extending the Nonnegative Matrix Factorization (NMF) framework to integrate the logistic regression loss function. In this article we try to overcome all these issues together by proposing Supervised Local Maximum Variance Preserving (SLMVP), a supervised non-linear method for feature extraction and dimensionality reduction. PCA, LPP, LOL, SNMF, and SLMVP have been compared on Global Horizontal Irradiance (GHI) and Direct Normal Irradiance (DNI) radiation data at two Iberian locations: Seville and Lisbon. Results show that for both kinds of radiation (GHI and DNI) and both locations, SLMVP produces smaller MAE errors than PCA, LPP, LOL, and SNMF: around 4.92% better for Seville and 3.12% better for Lisbon.
It has also been shown that, although SLMVP, PCA, and LPP benefit from using a non-linear regression method (Gradient Boosting in this work), this benefit is larger for PCA and LPP because SLMVP is able to perform non-linear transformations of the inputs.

Keywords: Dimensionality reduction · Hybrid learning · Solar radiation forecast · Data mining

Esteban García-Cuesta (esteban.garcia@fi.upm.es). Extended author information available on the last page of the article.

1 Introduction

Considerable efforts have been made in the past decades to make solar energy a real alternative to the conventional energy generation system. There are two main technologies, solar thermal electricity (STE) and solar photovoltaic (PV) energy, and many countries have already reached a notable solar share in their energy mixes. Moreover, important growth is expected in the near future (International Energy Agency, 2018).

Contrary to conventional generation, solar electricity generation is conditioned by the weather, and thus it is highly intermittent. Transient clouds and aerosol intermittency lead to considerable variability in solar power plant yield on a wide range of temporal scales, particularly on minutes-to-hours time scales. This presents serious issues regarding solar power plant management and their integration into the electricity grid [1]. Currently, in addition to expensive storage-based solutions, the use of solar radiation forecasts is the only plausible way to mitigate the intermittency. Therefore, the development of accurate solar radiation forecasting methods has become an essential research topic [2].

Solar forecasting methods can be classified depending on the forecasting horizon. Nowcasting methods are mostly related to one-hour-ahead forecasts, short-term forecasting to up to 6-hours-ahead forecasts, and forecasting methods are aimed at producing days-ahead forecasts. The techniques associated with these methods are essentially different [3–5].
In recent years, there has been increasing interest, particularly, in short-term forecasting, fostered by the expected massive deployment of solar PV energy. Accurate short-term solar forecasts are important to ensure the quality of the PV power delivered to the electricity network and, thus, to reduce the ancillary costs [6, 7]. Short-term forecasting has also been successfully used for the management of STE plants [8, 9] and for the participation of PV and STE plants in the energy market [8, 10]. Short-term forecasts can be derived either from satellite imagery [11, 12] or from Numerical Weather Prediction (NWP) models [13–15]. As measured solar radiation datasets have become progressively available, data-driven methods have become increasingly popular [16]. In [15, 17, 18] the performance of different methods is compared.

The use of NWP models for short-term solar forecasting has some important advantages, such as the global and easy availability of the forecasts. Because of that, this approach was extensively evaluated during the past decade [14, 15, 19]. Nevertheless, the reliability is far from optimal, and machine-learning methods play an important role in providing enhanced solar forecasts derived from NWP models [20, 21]. In this context, the inputs for the machine learning techniques are forecasts of several meteorological variables provided by NWP physical models such as those of the European Centre for Medium-Range Weather Forecasts (ECMWF) and the Global Ensemble Forecast System (GEFS). Meteorological variables are forecast for the points of a grid over the area of interest. However, the number of features resulting from these grids is usually large, especially if several vertical levels are included in the grid. This may result in models that do not generalize well, and techniques to reduce the dimensionality of the data are required.

Dimensionality reduction techniques can be divided into feature selection and feature extraction. Feature selection methods select the most relevant variables in the grid, while feature extraction summarizes information from the whole grid into fewer features. Both approaches have been used in the context of renewable energy forecasting with machine learning [22, 23]. Feature selection techniques have been used in [24], where methods such as Linear Correlation, ReliefF, and Local Information Analysis were explored to study the influence of the number of NWP grid nodes used as input on the accuracy of the solar forecasting model.

In [25], feature extraction (PCA) is compared with feature selection (a minimal-redundancy maximal-relevance method) to reduce the dimensionality of the variables in a grid for wind power forecasting in the east of China. The authors conclude that PCA is a good choice to simplify the feature set while obtaining competitive results. PCA has also been used in [26], together with domain knowledge, to extract features from a NWP grid to improve renewable energy forecasting. Advanced machine learning methods, such as convolutional neural networks, have also been used as a feature extraction scheme for wind power prediction using NWPs, showing competitive results compared to a PCA baseline [27]. García-Hinde et al. [28] present a study on feature selection and extraction methods for solar radiation forecasting. The study includes classical methods, such as PCA or variance and correlation filters, and novel methods based on the adaptation of support vector machines and deep Boltzmann machines to the task of feature selection. Results show that one of the novel methods (the adaptation of the support vector machine) and PCA select high-relevance features. Verbois et al. [29] combine feature extraction (PCA) and stepwise feature selection of NWP variables for solar irradiance forecasting, comparing favorably with other benchmark methods. In [30] a hybrid approach that combines PCA and deep learning is presented to forecast wind power from hours to years, showing good performance. A recent study on solar irradiance forecasting has compared many methods on different datasets, where PCA was used as the main method for feature extraction and dimensionality reduction [31]. In general, it is observed that PCA, even in recent works, is one of the most widely used methods to extract features in renewable energy forecasting.

PCA is a multivariate statistical analysis that transforms a number of correlated variables into a smaller group of uncorrelated variables called principal components [32]. PCA has two main limitations. First, it performs a global linear analysis through an axis transformation that best represents the mean and variance of the given data, but it lacks the ability to provide a local information representation. Second, PCA is an unsupervised method; that is, the target output is not used to extract the new features, which may be a drawback for finding the best low-dimensional representation whenever labels are available.

In this article we propose Supervised Local Maximum Variance Preserving (SLMVP), a kernel method for supervised feature extraction and dimensionality reduction. The method considers both characteristics: it preserves the maximum local variance and distribution of the data, but it also considers the distribution of the data with respect to the response variable to find an embedding that best represents the given data structure. This method can be applied to multiclass and regression problems when the sample size m is small and the dimensionality p is relatively large or very large, as opposed to Fisher's Linear Discriminant Analysis (LDA) [33], one of the foundational and most important approaches to classification. In summary, SLMVP uses the full or partially labeled dataset to extract new features that maximize the variance of the embedding that best represents the common local distances [34], and computationally it is based on weighted graphs [35]. Additionally, the method is able to perform a linear or non-linear transformation of the original space by using different kernels as the similarity metric.

To validate the SLMVP method, it has been tested to extract features in order to improve solar radiation forecasting (both Global Horizontal Irradiance (GHI) and Direct Normal Irradiance (DNI)) for a 3-hour forecasting horizon, and compared to PCA (the most popular workhorse in the area), but also to other state-of-the-art methods that have not previously been used in the context of solar radiation forecasting. These methods are (1) Locality Preserving Projection (LPP), an unsupervised local dimensionality reduction method that finds linear projective maps by solving a variational problem that optimally preserves the neighborhood structure of the dataset [36]; (2) Linear Optimal Low-Rank (LOL), a supervised dimensionality reduction method that learns a lower-dimensional representation in a high-dimensional, low-sample-size setting, extending PCA by incorporating class-conditional moment estimates into the low-dimensional projection [37]; and (3) Supervised Non-negative Matrix Factorization (SNMF), which extends Non-negative Matrix Factorization (NMF) to be supervised [38, 39]. SNMF integrates the logistic regression loss function into the NMF framework and solves it with an alternating optimization procedure. All of these methods are able to solve the "large p, small m" problem, as opposed to many classical statistical approaches that were designed with a "small p, large m" situation in mind (e.g., LDA).

Features have been extracted from meteorological forecasts (obtained from the GEFS) at the points of a grid around two locations in the Iberian Peninsula: Seville and Lisbon. Two grid sizes have been tested, small and large. The performance of SLMVP has been compared with PCA, LPP, LOL, and SNMF using two different regressors: a linear one (standard Linear Regression (LR)) and a non-linear technique (Gradient Boosting). Thus, the main contributions of this work are:

- A new local and supervised dimensionality reduction method capable of solving the "large p, small m" problem.
- The application of SLMVP to reduce the dimensionality of the NWP variables in a grid for the solar radiation forecasting problem.
- The comparison with PCA (one of the most widely used methods for feature extraction in the context of renewable energy), with LPP, and with two recent state-of-the-art supervised methods, LOL and SNMF, showing the usefulness of the proposed method.

The structure of the article is as follows. Section 2 explains the SLMVP method, which is tested using the data described in Section 3 and the experimental design included in Section 4. The Conclusions section summarizes the main results.

2 Supervised dimensionality reduction method: Kernel-SLMVP

As mentioned in Section 1, PCA is an unsupervised method that performs a global analysis of the whole dataset. As opposed to global data projection techniques like PCA, other methods based on local structure preservation, i.e., ISOMAP [40], LPP [36], Laplacian Eigenmaps [41], and Locally Linear Embedding [42], have been proposed to overcome this global character. Although these techniques use linear optimization solutions, they are also able to represent nonlinear geometric features through a local linear modeling representation that lies in a low-dimensional manifold [43]. Note that these non-linear methods still do not consider labeled data; that is, they are unsupervised methods. Recently, Linear Optimal Low-Rank (LOL) projection has been proposed, incorporating class-conditional means. The key intuition behind LOL is that it can jointly use the means and variances from each class (like LDA), but without requiring more dimensions than samples [37]. Another recent method is Supervised Non-negative Matrix Factorization (SNMF), which extends Non-negative Matrix Factorization (NMF) to be supervised [38]. SNMF integrates the logistic regression loss function into the NMF framework and solves it with an alternating optimization procedure. For both methods, regression can be done by projecting the data onto a lower-dimensional subspace, followed by the application of linear or non-linear regression techniques. This mitigates the curse of high dimensions.

The Supervised Local Maximum Variance Preserving (SLMVP) dimensionality reduction method solves LPP's difficulty with "large p, small m" problems, as well as the global character of LOL and SNMF (which are global methods despite being supervised). Therefore, SLMVP preserves the maximum local variance of the data, being able to represent non-linear properties, but it also considers the output information (in a supervised mode) to preserve the local patterns between inputs and outputs. In summary, it uses the full or partially labeled dataset to extract new features that best represent the local maximum joint variance.

SLMVP is based on a graph representation for a given set of inputs x_1, x_2, ..., x_m ∈ R^p and a set of outputs y_1, y_2, ..., y_m ∈ R^l, with m being the number of sample data points and p and l the number of input and output features; in our case, l = 1, and p = 342 for the small grid and p = 12274 for the large grid.
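As a concrete illustration of the graph/kernel machinery used in this section, the following minimal NumPy sketch (illustrative only; the function and variable names are our assumptions, not the authors' released implementation) builds Gaussian similarity matrices for inputs and outputs, solves the resulting eigenproblem, and projects the data onto the learned latent space:

```python
import numpy as np

def gaussian_kernel(A, sigma=1.0):
    """Pairwise Gaussian similarity matrix K_ij = exp(-||a_i - a_j||^2 / (2 sigma^2))."""
    sq = ((A[:, None, :] - A[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def slmvp_fit(X, y, k, sigma_x=1.0, sigma_y=1.0):
    """Sketch of Kernel-SLMVP: solve the eigenproblem on X^T Kx Ky X and
    return the top-k latent directions B and the projected features P = X B."""
    Kx = gaussian_kernel(X, sigma_x)                 # input similarity graph (m x m)
    Ky = gaussian_kernel(y.reshape(-1, 1), sigma_y)  # output similarity graph (m x m)
    M = X.T @ Kx @ Ky @ X                            # p x p matrix of the eigenproblem
    M = 0.5 * (M + M.T)                              # symmetrize: Kx Ky need not be
                                                     # symmetric; a practical shortcut
    w, V = np.linalg.eigh(M)                         # eigenpairs, ascending eigenvalues
    B = V[:, np.argsort(w)[::-1][:k]]                # top-k directions (p x k)
    return B, X @ B                                  # latent basis, projected features

# toy usage: m=40 samples, p=6 features, reduce to k=2 supervised components
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = X[:, 0] + 0.1 * rng.normal(size=40)
B, P = slmvp_fit(X, y, k=2)
print(P.shape)  # (40, 2)
```

The projected features P would then feed the downstream regressor, as described in the experimental section.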
The application of similarity functions on the inputs and outputs, S_x(X) with X ∈ R^{m×p} and S_y(Y) with Y ∈ R^{m×l}, defines an input weighted graph {H, U} and an output weighted graph {I, V}, with H and I being the nodes and U and V the edge weights, respectively. The graphs are not constrained: they can be fully connected, or some weights can have a zero value, meaning that the connection between those points disappears. The weight of a link represents the similarity between two data points. These characteristics give the method the capability of being local. Following [41] and [35], a graph embedding viewpoint can be used to reduce the dimensionality, mapping a weighted connected graph G = (V, E) to a line so that connected points stay as close together as possible.

The unsupervised dimensionality reduction problem aims to choose the mapping y_i = A^T x_i, with y_i ∈ R^k and k ≪ p, which minimizes the distance of each point to its neighbors in the multidimensional data, and can be expressed by the following cost function:

J_ns = Σ_ij ‖y_i − y_j‖² w_ij    (1)

where W ∈ R^{m×m} is the similarity matrix S_x(X).

Following this graph embedding approach, SLMVP solves the supervised version: we wish to choose the mapping y_i = A^T x_i, with y_i ∈ R^k and k ≪ p, which minimizes the distance of each point to its neighbors in the multidimensional data but preserves only those distances that are shared in the input and output spaces, given the similarity functions S_x and S_y. The cost function is then expressed by:

J_s = Σ_ij ‖y_i − y_j‖² z_ij    (2)

where Z ∈ R^{m×m} represents the joint similarity matrix between the input S_x(X) and output S_y(Y) similarity matrices, with z_ij = Σ_{k=1}^{m} u_ik v_kj. Note the difference between (1), which is unsupervised, and (2), which defines a supervised manifold learning problem using the similarity matrix between inputs and outputs.

The minimization of the cost function (2) can be expressed in its kernelized form (Kernel-SLMVP), after some transformations, as the following maximization problem:

max tr(Y^T K_x K_y Y)    (3)

where K_x = S_x(X) and K_y = S_y(Y) are the input and output similarity graphs expressed as kernel functions (e.g., polynomial K(a, b) = (1 + a · b) or Gaussian K(a, b) = e^{−‖a−b‖²/(2σ²)}). Finally, (3) can be solved as an eigenvector problem on B as follows:

X K_x K_y X^T B = λB    (4)

where B is the learned latent space. The projection of the input-space data X onto this space, P = B^T X, gives the new extracted features to be used by the machine learning model. The Python code of SLMVP has been released publicly at [44].

3 Data description

The dataset used in this study concerns GHI and DNI measurements at two radiometric solar stations in the Iberian Peninsula: Seville and Lisbon. GHI and DNI have been acquired with a Kipp & Zonen CMP6 pyranometer, with a 15-minute resolution.

The set of inputs is a collection of forecasted meteorological variables obtained from GEFS at different levels of the atmosphere and at different latitudes and longitudes. More specifically, 9 meteorological variables at different levels are used (see Table 1), making a total of 38 attributes at each latitude–longitude pair. Latitudes go from 32 to 51 and longitudes go from −18 to 6, with a resolution of 0.5 degrees. In this work, two grids of different sizes have been used: a small grid with 3 × 3 = 9 points around the solar station (Seville and Lisbon) and a larger one with 17 × 19 = 323 points. For Seville, the larger grid covers the Iberian Peninsula (latitudes: 36 to 44, longitudes: 350 to 359.5, both with a resolution of 0.5 degrees). In the case of the Lisbon solar station, the larger grid has been shifted to cover part of the Atlantic Ocean (latitudes also go from 36 to 44 and longitudes go from 346 to 355.5). Figure 1 shows both the wide and narrow grids, centered around Seville and Lisbon (in blue). Since each point in the grid contains 38 attributes, the small grid results in 3 × 3 × 38 = 342 input variables, and the larger one in 17 × 19 × 38 = 12274 inputs.

Table 1: Meteorological variables

Variable | Description | Levels
CLWMR | Cloud mixing ratio | 300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 925, 950, 975, 1000 mb
HGT | Geopotential height | 500, 850, 925, 1000 mb
RH | Relative humidity | 500, 850, 925 mb
UGRD | U component of wind | 500, 850, 925 mb
VGRD | V component of wind | 500, 850, 925 mb
SOILW | Soil temperature | 0.0–0.1 m
TMP | 2-meter temperature | 2 m
CAPE | Convective available potential energy | surface, 255 mb
PRMSL | Pressure reduced to MSL | surface

Fig. 1: The 17 × 19 (black) and 3 × 3 (red) grids. (a) Seville. (b) Lisbon.

GEFS provides predictions of the meteorological variables for a 3-hour forecasting horizon every 6 hours each day (00:00, 06:00, 12:00, and 18:00). The corresponding GHI and DNI measurements are also used. To select the relevant hours of the day for GHI and DNI, samples with a zenithal angle larger than 75 degrees have been removed. Given this restriction, data times range from 9:15am to 6:00pm. The total input–output data covers from March 2015 to March 2017.

In this study, GHI and DNI are normalized by the clear-sky irradiance according to (5):

I_kt(t) = I(t) / I_cs(t)    (5)

where I(t) stands for GHI or DNI at time t and I_cs(t) is the clear-sky irradiance at time t.

4 Experimental validation

In this section, first the methodology employed is described. Then, the results comparing SLMVP with PCA, LPP, LOL, and SNMF for different GEFS grid sizes are presented.

4.1 Methodology

In order to study the performance of the SLMVP algorithm, it has to be combined with a regression method to predict normalized GHI and DNI for a 3-hour forecasting horizon and to be compared with the other above-mentioned methods: PCA, LPP, LOL, and SNMF. The regression technique uses as inputs the attributes/features from the input-space transformation obtained by the SLMVP, PCA, LPP, LOL, and SNMF methods. As suggested in [37], to learn the projection matrix for the LOL method, we partition the data into K partitions (we select K = 10) equally separated over the target variable range [0, 1] to obtain a K-class classification problem. In this work, linear and non-linear regression methods have been tested. As a non-linear method, a state-of-the-art machine learning technique has been used: Gradient Boosting Regression (GBR) [45, 46]. This technique has shown considerable success in predictive accuracy in recent years (see for instance [24, 47–49]).

Cross-validation (CV) has been applied to study the performance of SLMVP, PCA, LPP, LOL, and SNMF. In standard CV, instances are distributed randomly into CV partitions. But our study involves time series data, and therefore there are temporal dependencies between consecutive samples (in other words, consecutive samples can be highly correlated). Hence, in this study, group 4-fold CV has been used, as explained next. The data has been split into 4 groups, one for each week of every month. Fold 1 thus contains the first week of each month (January, February, ...), Fold 2 the second week of every month, and so on. This guarantees that training and testing partitions will never contain instances belonging to the same week, which allows a more realistic analysis of the performance of the methods. Since in this work the optimal number of features must be selected, a validation set strategy has been used.
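The week-of-month fold assignment described above can be sketched as follows (a minimal illustration of the splitting rule, not the authors' exact code; folding any fifth partial week into the fourth fold is our assumption):

```python
from datetime import date

def week_of_month_fold(d):
    """Assign a sample taken on date d to one of 4 folds by its week of the month.
    Days 1-7 -> fold 0, 8-14 -> fold 1, 15-21 -> fold 2, 22 onwards -> fold 3."""
    return min((d.day - 1) // 7, 3)

def group_4fold_splits(dates):
    """Yield (train_idx, test_idx) pairs; each fold holds one week of every month,
    so train and test never share samples from the same week."""
    folds = [week_of_month_fold(d) for d in dates]
    for k in range(4):
        test = [i for i, f in enumerate(folds) if f == k]
        train = [i for i, f in enumerate(folds) if f != k]
        yield train, test

# toy usage: one month of daily samples
dates = [date(2015, 3, day) for day in range(1, 32)]
splits = list(group_4fold_splits(dates))
print(len(splits))  # 4
```

Within each training partition, one further week per month would be held out as the validation set, as the text explains next.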
For this purpose, each training partition (which contains 3 folds) is again divided into training and validation sets. The validation set contains one week of each month out of the three weeks of data available in the training partition. The remaining two weeks (the ones not used for validation) are used for training.

Mean Absolute Error (MAE) has been used as the performance measure (6). Given that a 4-fold CV has been employed, results are the CV average of the MAE:

MAE = (1/N) Σ_{i=1}^{N} |y_i − o_i|    (6)

where N is the number of samples and y_i and o_i are the actual value and the output of the model, respectively. Note that the number of samples for training is 480, which is smaller than the number of dimensions for the large grid (480 ≪ 12274) and within the same scale factor for the small grid (342 ≲ 480).

The performance of the methods is evaluated as follows. Recall that the number of selected projected features is very relevant, and that the features obtained by the different methods are ordered by their importance. Then, in order to analyze the optimal number of dimensions, the performance of both the linear and the GBR regression methods is evaluated for 5, 10, 20, 50, 100, and 150 projected features. Given that 4-fold CV is used for performance evaluation, in each of the 4 CV iterations there is a training, validation, and testing partition. For each iteration, the regression models are trained with the training partition and then the validation and test errors are obtained. The averages over the 4 iterations are computed for the three errors (train, validation, and test). The validation error is used to select the optimal number of features.

SLMVP, SNMF, and GBR have some hyper-parameters that require tuning in order to improve results. Five hyper-parameters were fitted: the gamma parameter γ = 1/(2σ²) that defines the Gaussian kernel function of the SLMVP method; α, β, and θ, which define the weight of each term of the SNMF method; and the number of estimators and the tree depth (which belong to GBR). The following ranges of values were tested for each hyper-parameter:

- γ (SLMVP): from 0 to 2 in steps of 0.1
- α, β, θ (SNMF): 0, 0.1, 0.01, 0.001
- Number of estimators (GBR): from 10 to 200 in steps of 10
- Tree depth (GBR): from 1 to 10 in steps of 1

In order to tune the hyper-parameters, a systematic procedure known as grid search was used. This method tries all possible combinations of hyper-parameter values. Models for each hyper-parameter combination are trained with the training partition and evaluated with the validation partition. The best combination on the validation set is selected.

4.2 Results

Table 2 shows the average GHI MAE for the best number of components for the different methods and grid sizes. Table 3 displays the same information for DNI. The best number of components has been selected using the MAE on the validation set.

Table 2: Average test MAE and number of selected components for the different methods, for GHI at the Seville and Lisbon locations. The bold entries in the original are the best model for each case (Seville GHI and Lisbon GHI) independently of the grid size (small or large).

Seville
Method | Small-grid MAE | Components | Large-grid MAE | Components
SLMVP - LR | 0.1673 | 20 | 0.1845 | 50
PCA - LR | 0.8417 | 20 | 0.7929 | 20
LPP - LR | 0.3655 | 3 | >10 | 20
LOL - LR | 0.1660 | 50 | 0.1949 | 50
SNMF - LR | 0.1699 | 100 | 0.1890 | 50
SLMVP - GBR | 0.1562 | 20 | 0.1653 | 50
PCA - GBR | 0.1688 | 20 | 0.1808 | 50
LPP - GBR | 0.2008 | 150 | 0.2605 | 20
LOL - GBR | 0.1875 | 150 | 0.1813 | 100
SNMF - GBR | 0.1653 | 50 | 0.1885 | 50

Lisbon
Method | Small-grid MAE | Components | Large-grid MAE | Components
SLMVP - LR | 0.2035 | 50 | 0.2217 | 50
PCA - LR | 0.7734 | 100 | 0.7706 | 10
LPP - LR | >5 | 100 | >10 | 10
LOL - LR | 0.2029 | 50 | 0.2233 | 100
SNMF - LR | 0.2055 | 50 | 0.2209 | 50
SLMVP - GBR | 0.1974 | 20 | 0.2084 | 100
PCA - GBR | 0.2008 | 10 | 0.2167 | 50
LPP - GBR | 0.2269 | 150 | 0.2548 | 5
LOL - GBR | 0.2272 | 100 | 0.2254 | 150
SNMF - GBR | 0.2023 | 20 | 0.2278 | 100

In all cases, it is observed that the use of the nonlinear regression technique (GBR) considerably improves the errors for PCA and LPP, in some cases for LOL and SNMF, and always yields only minor improvements for SLMVP. For instance, in the case of the small grid for GHI in Seville (Table 2, top left), the use of GBR with PCA improves the MAE considerably (from 0.6467 with LR to 0.0126 with GBR, accountable for a 6.8% improvement). Similar improvements of the PCA, LPP, and LOL MAE can be observed for the large grid, from 0.6084 to 0.0155, accountable for an 8.51% improvement (Table 2, top right). Lisbon GHI (Table 2, bottom) behaves in a similar way. For SNMF the differences are almost nonexistent. In the case of SLMVP, although GBR obtains better errors than LR, the difference between linear and non-linear is smaller than for the PCA and LPP cases. For instance, observing the GHI results for Seville (top of Table 2), it can be seen that for the small grid (top left) the difference between GBR and LR (when using SLMVP) is only 0.1562 vs. 0.1673, and for the large grid (top right) it is 0.1653 vs. 0.1845. Similar differences can be observed for GHI at Lisbon (bottom of Table 2). This is reasonable because SLMVP uses a non-linear kernel, so even when using LR, some of the non-linearity of the problem has already been captured by the SLMVP feature extraction process. Conclusions for DNI (Table 3) follow a similar trend: PCA and LPP benefit more from using a non-linear method (GBR) than SLMVP and SNMF, but LOL benefits more from using a regularized linear regressor; LOL includes linear class prior information, which is beneficial for LR.

Analyzing the results depending on the size of the grid (small vs. large), it is observed that the use of a large grid does not result in better MAE values. The best errors are always obtained with the small grid in all cases of Tables 2 (GHI) and 3 (DNI). When the large grid is used, more components are used by SLMVP but, as already mentioned, this does not improve the results.

Table 3: Average test MAE and number of selected components for the different methods, for DNI at the Seville and Lisbon locations. The bold entries in the original are the best model for each case (Seville DNI and Lisbon DNI) independently of the grid size (small or large).

Seville
Method | Small-grid MAE | Components | Large-grid MAE | Components
SLMVP - LR | 0.2580 | 50 | 0.2787 | 50
PCA - LR | 0.6628 | 5 | 0.6531 | 20
LPP - LR | 0.9360 | 10 | >10 | 10
LOL - LR | 0.2534 | 50 | 0.2785 | 50
SNMF - LR | 0.2598 | 100 | 0.2888 | 50
SLMVP - GBR | 0.2446 | 20 | 0.2600 | 50
PCA - GBR | 0.2536 | 20 | 0.2788 | 10
LPP - GBR | 0.3017 | 150 | 0.3586 | 5
LOL - GBR | 0.2900 | 100 | 0.2640 | 150
SNMF - GBR | 0.2704 | 50 | 0.3021 | 20

Lisbon
Method | Small-grid MAE | Components | Large-grid MAE | Components
SLMVP - LR | 0.2845 | 50 | 0.3076 | 100
PCA - LR | 0.6048 | 100 | 0.6034 | 10
LPP - LR | 3.3840 | 100 | >10 | 10
LOL - LR | 0.2873 | 20 | 0.3020 | 50
SNMF - LR | 0.2896 | 100 | 0.3090 | 50
SLMVP - GBR | 0.2732 | 50 | 0.2874 | 100
PCA - GBR | 0.2855 | 10 | 0.3082 | 50
LPP - GBR | 0.3214 | 100 | 0.3884 | 150
LOL - GBR | 0.3278 | 100 | 0.3140 | 100
SNMF - GBR | 0.2809 | 20 | 0.3228 | 150

Summarizing the results so far, for both irradiances, GHI and DNI, and both locations (Seville and Lisbon), the best performance is always obtained with the SLMVP method and the non-linear regression method (GBR). In order to quantify this improvement better, Table 4 shows the percentage improvement of SLMVP relative to PCA, LPP, LOL, and SNMF for the best models (SLMVP+GBR, PCA+GBR, LPP+GBR, LOL+LR, and SNMF+GBR/LR). In summary, it can be said that SLMVP offers results 4.92% better than LOL for Seville and around 3.99% for Lisbon; 5.88% better than PCA for Seville and around 3.12% for Lisbon; 25.93% better than LPP for Seville and around 16.29% for Lisbon; and 6.21% better than SNMF for Seville and around 2.82% for Lisbon.

Table 4: Percentage improvement of SLMVP relative to PCA, LPP, LOL, and SNMF

Seville
Method | GHI Small-grid | GHI Large-grid | DNI Small-grid | DNI Large-grid | Avg.
PCA | 8.07% | 9.34% | 3.68% | 7.20% | 6.68%
LPP | 28.50% | 57.57% | 23.36% | 37.90% | 34.72%
LOL | 6.24% | 7.14% | 3.60% | 1.52% | 3.95%
SNMF | 5.82% | 13.98% | 6.23% | 11.07% | 8.65%

Lisbon
Method | GHI Small-grid | GHI Large-grid | DNI Small-grid | DNI Large-grid | Avg.
PCA | 1.73% | 3.98% | 4.50% | 7.23% | 3.87%
LPP | 14.96% | 22.27% | 17.62% | 35.14% | 21.31%
LOL | 2.80% | 7.18% | 5.17% | 5.09% | 4.80%
SNMF | 2.50% | 6.03% | 2.81% | 7.53% | 4.23%

In order to visualize the relation between the number of components and the error, Fig. 2 shows the GHI validation and test MAE for the different numbers of components. This is done for SLMVP, PCA, LPP, LOL, and SNMF using GBR as the regressor, for Seville and Lisbon (top/bottom, respectively) and for the small and large grids (left/right, respectively). The same information is displayed in Fig. 3 for DNI.

Fig. 2: Average MAE for GHI of SLMVP-GBR, PCA-GBR, LPP-GBR, LOL-GBR, and SNMF-GBR along the number of components (x-axis) for the small and large grids in Seville and Lisbon.

Fig. 3: Average MAE for DNI of SLMVP-GBR, PCA-GBR, LPP-GBR, LOL-GBR, and SNMF-GBR along the number of components (x-axis) for the small and large grids in Seville and Lisbon.

It is observed that the best number of PCA components is usually smaller than for the other methods, and that LPP and LOL usually benefit slightly from a larger number of components. SNMF and SLMVP behave similarly with respect to the number of components, with the optimal number being slightly smaller for SLMVP. In contrast to PCA and LPP, the information and components found by SLMVP benefit the performance of the regression method. In Figs. 2 and 3 it is observed that, up to 20 components, the errors decrease in all study cases.
solution for all datasets except Lisbon DNI, which reached Figures 2 and 3 also show that the performance of the best solution with 50 components (see left part in Fig. 2 SLMVP is always better than PCA, LPP, LOL, and SNMF for GHI and for DNI left part of Fig. 3). When a large grid for every number of components (but for a few PCA is used, more than 20 components are generally beneficial, exceptions and one for SNMF ). In order to quantify these with 50 or 100 components being selected as the best improvements, Tables 5, 6, 7 and 8 show the percentage of options (50 components for Seville and 100 components for improvements of SLMVP over PCA, LPP, LOL and SNMF Lisbon, although 50 and 100 components perform similarly using the best results regression model for the different for both locations). In those figures, it is also observed that number of components used, respectively. The superiority although validation and test errors follow a similar trend, of SLMVP is clearly observed, but it is interesting to note it is not always the case that the best error in validation that when 5 components are used (and some cases with 10 corresponds to the best error in test. This should be expected components), either PCA is better or the improvement of because validation error is only an estimation obtained with SLMVP is smaller. This suggests that PCA is able to find Table 5 Improvements in percentage (%) of SLMVP over PCA for the different number of components Small-grid Large-grid Location 5 10 20 50 100 150 5 10 20 50 100 150 GHI Seville -0.47 4.37 6.80 17.49 9.41 4.43 0.58 -0.37 5.87 8.51 2.93 4.65 Lisbon 1.76 4.05 3.83 3.36 3.15 4.29 -1.99 0.76 1.11 4.89 8.11 6.68 DNI Seville -0.79 2.02 2.98 5.14 8.81 4.93 0.99 -2.14 0.63 5.46 6.02 5.39 Lisbon -0.84 1.32 1.08 6.64 8.67 8.10 1.35 0.34 1.28 5.14 10.70 9.89 E. Garc´ıa-Cuesta et al. 
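The MAE values and the percentage improvements reported in Tables 3 to 8 follow directly from per-sample absolute errors. The sketch below shows the computation on synthetic numbers; the arrays and the 100*(ref - new)/ref improvement formula are illustrative assumptions, not the paper's data or code:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error, the metric reported in Tables 3-8."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))

def improvement_pct(mae_ref, mae_new):
    """Relative improvement (%) of a new model over a reference one;
    positive values mean the new model has a lower error."""
    return 100.0 * (mae_ref - mae_new) / mae_ref

# Synthetic normalized irradiance values (illustrative only)
y_true = np.array([0.62, 0.55, 0.71, 0.40])
y_slmvp = np.array([0.60, 0.57, 0.69, 0.43])  # hypothetical SLMVP-GBR forecast
y_pca = np.array([0.58, 0.60, 0.75, 0.35])    # hypothetical PCA-GBR forecast

print(improvement_pct(mae(y_true, y_pca), mae(y_true, y_slmvp)))  # -> 50.0
```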
In any case, it is clear from Figs. 2 and 3 that more than 5 components are required in order to obtain the best results.

Table 6  Improvements in percentage (%) of SLMVP over LPP for the different number of components

          Small-grid                                 Large-grid
Location  5      10     20     50     100    150     5      10     20     50     100    150
GHI
Seville   9.66   23.53  28.06  22.46  22.34  14.28   26.14  28.11  50.14  57.16  50.47  49.52
Lisbon    20.91  27.89  20.13  16.80  13.87  12.51   11.75  15.65  19.28  28.31  22.69  20.68
DNI
Seville   10.28  22.83  28.82  20.51  18.01  13.77   22.01  24.79  34.08  38.95  38.65  39.01
Lisbon    16.98  24.39  17.96  14.05  12.95  8.86    20.45  21.85  23.48  31.28  33.48  29.59

Finally, to also verify that the SLMVP technique is superior to the current use of PCA, LPP, LOL, and SNMF not only for the optimal number of dimensions but independently of the number of dimensions selected, we have used a two-sample t-test for equal means to test the hypothesis that the obtained average error improvement for the different numbers of dimensions for each dataset is not due to chance. The obtained significance is shown in Table 9. We applied this test under the null hypothesis that the means are equal and the observations have different standard deviations.
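A two-sample test of this kind, for equal means without assuming equal variances (Welch's t-test), can be sketched as follows; the MAE series are made-up placeholders, not the values behind Table 9:

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Two-sample t statistic for equal means without assuming equal
    variances (Welch's test), with Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)   # unbiased sample variances
    se2 = va / na + vb / nb             # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Hypothetical test MAEs for 5, 10, 20, 50, 100, and 150 components
mae_slmvp = [0.270, 0.258, 0.247, 0.245, 0.248, 0.252]
mae_lpp = [0.310, 0.345, 0.330, 0.322, 0.318, 0.327]

t, df = welch_t(mae_slmvp, mae_lpp)
# A two-sided p-value then follows from the Student-t distribution with
# df degrees of freedom (e.g. scipy.stats.t.sf(abs(t), df) * 2).
```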
We used as observations the 6 test error results (for 5, 10, 20, 50, 100, and 150 extracted components) obtained for each dataset. We conclude that the improvement obtained for the analysis SLMVP vs. LPP is significant, rejecting the hypothesis of equal means with a p-value always below 0.001. Vs. PCA, this p-value is below 0.05 in 4 out of 8 cases (3 for Lisbon and 1 for Seville); vs. LOL, the p-value is below 0.05 also in 4 out of 8 cases (3 for Lisbon and 1 for Seville); and vs. SNMF, the p-value is below 0.05 also in 4 out of 8 cases (2 for Lisbon and 2 for Seville), rejecting the hypothesis of equal means.

In summary, we observed that for Lisbon the null hypothesis is rejected in 12 out of 16 cases (and the other two cases have p-values close to the 5% threshold, namely 0.08 and 0.1), and in 8 out of 16 cases for Seville. These insights suggest that the source data for the two locations have different properties, and that Lisbon may contain noisier data; therefore our method obtains larger improvements there because of the noise-tolerant characteristics introduced by its use of locality.

5 Conclusions

Using Machine Learning methods to forecast GHI or DNI radiation, based on features that use NWP grids, typically results in a large number of attributes. In this article, a supervised method for feature transformation and reduction (SLMVP) has been proposed to extract the most relevant features, solving the limitations of the PCA technique by representing locality and non-linear patterns and by using labeled data. The PCA method is one of the most widely used methods to extract features and reduce dimensionality in renewable energy. Three other state-of-the-art dimensionality reduction methods that include locality (LPP) and supervision (LOL and SNMF) have also been included in the comparison.

The five methods have been tested and compared on radiation data at two different Iberian locations: Seville and Lisbon.
Table 7  Improvements in percentage (%) of SLMVP over LOL for the different number of components

          Small-grid                                 Large-grid
Location  5      10     20     50     100    150     5      10     20     50     100    150
GHI
Seville   4.32   20.01  19.03  14.00  14.70  7.12    3.98   10.01  6.50   8.86   4.25   1.52
Lisbon    20.04  24.92  22.91  15.91  13.89  13.67   4.45   4.98   4.77   9.83   7.55   2.74
DNI
Seville   6.76   17.72  20.71  18.85  14.59  10.21   3.28   6.67   5.95   8.21   5.43   1.18
Lisbon    13.62  16.69  20.80  21.68  15.12  17.03   12.42  11.74  9.30   9.19   8.30   6.70

Table 8  Improvements in percentage (%) of SLMVP over SNMF for the different number of components

          Small-grid                                 Large-grid
Location  5      10     20     50     100    150     5      10     20     50     100    150
GHI
Seville   2.23   8.44   6.35   1.22   4.13   3.23    7.72   11.30  6.10   9.86   10.74  8.58
Lisbon    7.86   5.36   2.40   -0.10  1.91   0.25    3.37   4.60   2.56   8.53   8.53   1.46
DNI
Seville   0.79   2.02   2.98   5.14   8.81   4.93    5.24   7.50   10.03  14.87  11.99  9.44
Lisbon    0.84   1.32   1.08   6.64   8.67   8.10    60.2   5.07   8.20   12.91  15.67  9.12

Both linear and non-linear (GBR) regression methods have been used on the components extracted from SLMVP, PCA, LPP, LOL, and SNMF. Results show that for both types of radiation (GHI and DNI) and both locations, SLMVP offers smaller MAE errors than the other methods. In order to assess the influence of the size of the NWP grid, two sizes have been tested, small and large. SLMVP results in better radiation estimates, and the small-size grids display slightly better errors.
It has also been shown that PCA tends to underestimate the number of features required to obtain the best results. LPP obtains the worst results, and this is noticeable for large grids. SNMF has also shown a degradation in its performance for the large grid compared with SLMVP. In summary, it can be said that the small grid works better, and the improvement of SLMVP over the other methods is about 6.24% at Seville GHI, 3.60% at Seville DNI, 1.73% at Lisbon GHI, and around 4.50% at Lisbon DNI.

Finally, although SLMVP, PCA, and LPP all benefit from using a non-linear regression method (GBR), this benefit is larger for PCA and LPP because they are not able to perform non-linear transformations. LOL does not benefit from non-linear regression, and in some cases obtained slightly better results using the regularized linear regressor. SNMF benefits slightly from non-linear regression for all but one of the small grids, but not for the large ones. Because SLMVP is able to use non-linear transformations, the difference between using the linear and non-linear regression methods is smaller, as expected (but still present).

We can conclude that SLMVP is a competitive method for dimensionality reduction in the context of solar radiation forecasting using NWP variables, beating PCA, which is currently the most widely used, and LOL and SNMF, which are two recent supervised dimensionality reduction state-of-the-art methods. Overall, SLMVP also obtains better results independently of the number of dimensions used, showing its robustness. We envision that different machine learning methods would benefit from their combination with SLMVP, and thus it will be of interest to verify this using this and other domain datasets.

Table 9  Dimensionality two-sample t-test analysis for equal means over 5, 10, 20, 50, 100, and 150 dimensions

                     GHI                         DNI
                     Small-grid   Large-grid     Small-grid   Large-grid
Seville
PCA     t-value      1.74         2.59           1.07         1.29
        p-value      0.11         <0.03          0.31         0.23
LPP     t-value      8.19         10.20          5.38         14.36
        p-value      <0.001       <0.001         <0.001       <0.001
LOL     t-value      2.25         2.56           1.55         1.7
        p-value      0.07         <0.003         0.18         0.12
SNMF    t-value      1.01         5.8            1.63         5.08
        p-value      0.34         <0.001         0.13         <0.001
Lisbon
PCA     t-value      3.04         1.8            2.83         2.34
        p-value      <0.02        0.1            <0.02        <0.05
LPP     t-value      8.52         10.65          5.17         13.92
        p-value      <0.001       <0.001         <0.001       <0.001
LOL     t-value      2.55         3.30           10.99        2.06
        p-value      <0.03        <0.02          <0.001       0.08
SNMF    t-value      1.57         2.8            0.88         4.74
        p-value      0.15         <0.02          0.4          <0.001

Acknowledgements  This work has been made possible by projects funded by Agencia Estatal de Investigación (PID2019-107455RB-C22 / AEI / 10.13039/501100011033). This work was also supported by the Comunidad de Madrid Excellence Program and the Comunidad de Madrid-Universidad Politécnica de Madrid young investigators initiative.

Funding  Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Yang D, Wang W, Gueymard CA, Hong T, Kleissl J, Huang J, Perez MJ, Perez R, Bright JM, Xia X et al (2022) A review of solar forecasting, its dependence on atmospheric sciences and implications for grid integration: Towards carbon neutrality. Renew Sust Energ Rev 161:112348
2. Haupt SE (2018) Short-range forecasting for energy. Springer, Berlin, pp 97–107. https://doi.org/10.1007/978-3-319-68418-5_7
3. Sobri S, Koohi-Kamali S, Rahim NA (2018) Solar photovoltaic generation forecasting methods: A review. Energy Convers Manag 156:459–497
4. Yang D, Kleissl J, Gueymard CA, Pedro HTC, Coimbra CFM (2018) History and trends in solar irradiance and pv power forecasting: A preliminary assessment and review using text mining. Sol Energy 168:60–101. https://doi.org/10.1016/j.solener.2017.11.023
5. Singla P, Duhan M, Saroha S (2021) A comprehensive review and analysis of solar forecasting techniques. Frontiers in Energy, pp 1–37
6. Litjens GBMA, Worrell E, van Sark WGJHM (2018) Assessment of forecasting methods on performance of photovoltaic-battery systems. Appl Energy 221:358–373. https://doi.org/10.1016/j.apenergy.2018.03.154
7. Agüera-Pérez A, Palomares-Salas JC, González de la Rosa JJ, Florencias-Oliveros O (2018) Weather forecasts for microgrid energy management: Review, discussion and recommendations. Appl Energy 228(C):265–278. https://doi.org/10.1016/j.apenergy.2018.0
8. Dersch J, Schroedter-Homscheidt M, Gairaa K, Hanrieder N, Landelius T, Lindskog M, Müller SC, Ramirez Santigosa L, Sirch T, Wilbert S (2019) Impact of dni nowcasting on annual revenues of csp plants for a time of delivery based feed in tariff. Meteorol Z 28(3):235–253. https://doi.org/10.1127/metz/2019/0925
9. Alonso-Montesinos J, Polo J, Ballestrín J, Batlles FJ, Portillo C (2019) Impact of DNI forecasting on CSP tower plant power production. Renew Energy 138(C):368–377. https://doi.org/10.1016/j.renene.2019.01
10. Antonanzas J, Pozo-Vázquez D, Fernandez-Jimenez LA, Martinez-de-Pison FJ (2017) The value of day-ahead forecasting for photovoltaics in the Spanish electricity market. Sol Energy 158:140–146. https://doi.org/10.1016/j.solener.2017.09.043
11. Blanc P, Remund J, Vallance L (2017) Short-term solar power forecasting based on satellite images, pp 179–198. https://doi.org/10.1016/B978-0-08-100504-0.00006-8
12. Bright JM, Killinger S, Lingfors D, Engerer NA (2018) Improved satellite-derived pv power nowcasting using real-time power data from reference pv systems. Sol Energy 168:118–139
13. Arbizu-Barrena C, Ruiz-Arias JA, Rodríguez-Benítez FJ, Pozo-Vázquez D, Tovar-Pescador J (2017) Short-term solar radiation forecasting by advecting and diffusing msg cloud index. Sol Energy 155:1092–1103. https://doi.org/10.1016/j.solener.2017.
14. Lopes FM, Silva HG, Salgado R, Cavaco A, Canhoto P, Collares-Pereira M (2018) Short-term forecasts of ghi and dni for solar energy systems operation: assessment of the ecmwf integrated forecasting system in southern portugal. Sol Energy 170:14–30
15. Rodríguez-Benítez FJ, Arbizu-Barrena C, Huertas-Tato J, Aler-Mur R, Galván-León I, Pozo-Vázquez D (2020) A short-term solar radiation forecasting system for the iberian peninsula. Part 1: Models description and performance assessment. Sol Energy 195:396–412. https://doi.org/10.1016/j.solener.2019.11.028
16. McCandless TC, Haupt SE, Young GS (2016) A regime-dependent artificial neural network technique for short-range solar irradiance forecasting. Renew Energy 89(C):351–359. https://doi.org/10.1016/j.renene.2015.12
17. Lee JA, Haupt SE, Jiménez PA, Rogers MA, Miller SD, McCandless TC (2017) Solar irradiance nowcasting case studies near Sacramento. J Appl Meteorol Climatol 56(1):85–108. https://doi.org/10.1175/JAMC-D-16-0183.1
18. Ahmed R, Sreeram V, Mishra Y, Arif M (2020) A review and evaluation of the state-of-the-art in pv solar power forecasting: Techniques and optimization. Renew Sust Energ Rev 124
19. Yang D, Wang W, Bright JM, Voyant C, Notton G, Zhang G, Lyu C (2022) Verifying operational intra-day solar forecasts from ecmwf and noaa. Sol Energy 236:743–755
20. Mellit A, Massi Pavan A, Ogliari E, Leva S, Lughi V (2020) Advanced methods for photovoltaic output power forecasting: A review. Appl Sci 10(2):487
21. Markovics D, Mayer MJ (2022) Comparison of machine learning methods for photovoltaic power forecasting based on numerical weather prediction. Renew Sust Energ Rev 161:112364
22. Salcedo-Sanz S, Cornejo-Bueno L, Prieto L, Paredes D, García-Herrera R (2018) Feature selection in machine learning prediction systems for renewable energy applications. Renew Sust Energ Rev 90:728–741
23. Liu H, Chen C (2019) Data processing strategies in wind energy forecasting models and applications: A comprehensive review. Appl Energy 249:392–408
24. Martin R, Aler R, Valls JM, Galván IM (2016) Machine learning techniques for daily solar energy prediction and interpolation using numerical weather models. Concurr Comput Pract Exp 28(4):1261–1274
25. Wang Z, Wang W, Wang B (2017) Regional wind power forecasting model with nwp grid data optimized. Front Energy 11(2):175–183
26. Andrade JR, Bessa RJ (2017) Improving renewable energy forecasting with a grid of numerical weather predictions. IEEE Trans Sustain Energy 8(4):1571–1580
27. Higashiyama K, Fujimoto Y, Hayashi Y (2017) Feature extraction of numerical weather prediction results toward reliable wind power prediction. In: 2017 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), pp 1–6. IEEE
28. García-Hinde O, Terrén-Serrano G, Hombrados-Herrera M, Gómez-Verdejo V, Jiménez-Fernández S, Casanova-Mateo C, Sanz-Justo J, Martínez-Ramón M, Salcedo-Sanz S (2018) Evaluation of dimensionality reduction methods applied to numerical weather models for solar radiation forecasting. Eng Appl Artif Intell 69:157–167
29. Verbois H, Huva R, Rusydi A, Walsh W (2018) Solar irradiance forecasting in the tropics using numerical weather prediction and statistical learning. Sol Energy 162:265–277
30. Khan M, Liu T, Ullah F (2019) A new hybrid approach to forecast wind power for large scale wind turbine data using deep learning with tensorflow framework and principal component analysis. Energies 12(12):2229
31. Verbois H, Saint-Drenan Y-M, Thiery A, Blanc P (2022) Statistical learning for nwp post-processing: a benchmark for solar irradiance forecasting. Sol Energy 238:132–149
32. Hotelling H (1933) Analysis of a complex of statistical variables into principal components. J Educ Psychol 24(6):417–441. https://doi.org/10.1037/h0071325
33. Fisher RA (1925) Theory of statistical estimation. Math Proc Philos Soc 22:700–725
34. García-Cuesta E, Iglesias JA (2012) User modeling: through statistical analysis and subspace learning. Expert Syst Appl 39(5):5243–5250
35. McInnes L, Healy J, Saul N, Großberger L (2018) Umap: Uniform manifold approximation and projection. J Open Source Softw 3(861)
36. He X, Niyogi P (2003) Locality preserving projections. Advances in Neural Information Processing Systems, p 16
37. Vogelstein JT, Bridgeford EW, Tang M et al (2021) Supervised dimensionality reduction for big data. Nat Commun 12(2872). https://doi.org/10.1038/s41467-021-23102-2
38. Chao G, Mao C, Wang F, Zhao Y, Luo Y (2018) Supervised nonnegative matrix factorization to predict icu mortality risk. In: 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp 1189–1194. IEEE
39. Chao G, Luo Y, Ding W (2019) Recent advances in supervised dimension reduction: a survey. Mach Learn Knowl Extract 1(1):341–358
40. Tenenbaum JB, Silva VD, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323. https://doi.org/10.1126/science.290.5500.
41. Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396. https://doi.org/10.1162/089976603321780317
42. Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
43. Weinberger KQ, Sha F, Saul LK (2004) Learning a kernel matrix for nonlinear dimensionality reduction. In: Proceedings of the Twenty-first International Conference on Machine Learning, p 106
44. García-Cuesta E (2022) Supervised Local Maximum Variance Preserving (SLMVP) Dimensionality Reduction Method (1.0). https://doi.org/10.5281/zenodo.6856079. Online; accessed 18 July 2022
45. Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Annals of Statistics, pp 1189–1232
46. Friedman JH (2002) Stochastic gradient boosting. Comput Stat Data Anal 38(4):367–378
47. Aler R, Galván IM, Ruiz-Arias JA, Gueymard CA (2017) Improving the separation of direct and diffuse solar radiation components using machine learning by gradient boosting. Sol Energy 150:558–569
48. Wu J, Zhou T, Li T (2020) Detecting epileptic seizures in eeg signals with complementary ensemble empirical mode decomposition and extreme gradient boosting. Entropy 22(2):140
49. Asante-Okyere S, Shen C, Ziggah YY, Rulegeya MM, Zhu X (2019) A novel hybrid technique of integrating gradient-boosted machine and clustering algorithms for lithology classification. Natural Resources Research, pp 1–17

Publisher's note  Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Esteban García is assistant professor with the Artificial Intelligence department at Universidad Politécnica de Madrid since 2001. He received his PhD in Computer Science from Universidad Carlos III de Madrid (Spain) in 2010 and his MSc in Computer Science from Universidad Carlos III (2005). He has worked in European and Spanish research projects related to machine learning and artificial intelligence in several application domains, including remote sensing, renewable energy forecasting, affective computing, and medicine. His current research interests include machine learning and explainable AI. He has been a visiting scientist at Carnegie Mellon (Pittsburgh, US) and collaborates actively with industry initiatives.

Ricardo Aler is associate professor with the Computer Science Department at Universidad Carlos III de Madrid since 2001. He received his PhD in Computer Science from Universidad Politécnica de Madrid (Spain) in 1999 and his MSc in Decision Support Systems from Sunderland University (UK) (1993). He has worked in European and Spanish projects related to evolutionary computation and machine learning in several application domains, including telecommunications, robosoccer, brain-computer interfaces, and renewable energy forecasting. His current research interests include Machine Learning, Evolutionary Optimization, and the Energy Forecasting field.

David Pozo-Vázquez holds a PhD in Atmospheric Science (2000) from the University of Granada (Spain) and a BS in Applied Physics from the same University (1994). Since 1998 he is on the Faculty of the Department of Physics of the University of Jaén, where he is responsible for courses on meteorology and renewable energy resources and leads the Solar Radiation and Atmosphere Modeling research group. He obtained a permanent appointment (Associate Professor) in 2003 and is full Professor since September 2018. In the last decade, his research focused on solar and wind energy resources assessment, including their spatial and temporal balancing at different scales. In addition, he conducted research aimed at the improvement of solar radiation forecasting techniques at different spatial scales (plant, utility scale) and forecasting lead times (from minutes to hours and days). He has been a visiting scientist at the University of East Anglia, the European Centre for Medium-Range Weather Forecasts (both in the UK) and the National Center for Atmospheric Research (Boulder, CO, USA).

Inés M. Galván is full professor at the Computer Science and Engineering Department at Carlos III University of Madrid since 2020. She received her PhD degree in Computer Science at Universidad Politécnica de Madrid (Spain) in 1998. She has worked in several European and Spanish research projects related to control of chemical reactors, optimization and evolutionary computation, and solar radiation forecasting. Her current research focuses on Machine Learning techniques, Evolutionary Computation and multi-objective algorithms. Her research interests also cover different application fields, such as control of dynamic processes, time series prediction, probabilistic forecasting, and renewable energy.

Affiliations
Esteban García-Cuesta (1) · Ricardo Aler (2) · David del Pozo-Vázquez (3) · Inés M. Galván (2)
Ricardo Aler: aler@inf.uc3m.es
David del Pozo-Vázquez: dpozo@ujaen.es
Inés M. Galván: igalvan@inf.uc3m.es
(1) Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, E.T.S.I.I Campus de Montegancedo s/n, Boadilla del Monte, 28660, Madrid, Spain
(2) Departamento de Informática, Universidad Carlos III de Madrid, Av. de la Universidad, 30, Leganés, 28911, Madrid, Spain
(3) Departamento de Física, Universidad de Jaén, Campus Las Lagunillas s/n, 23071, Jaén, Spain

Applied Intelligence, Springer Journals

A combination of supervised dimensionality reduction and learning methods to forecast solar radiation


Publisher: Springer Journals
Copyright: © The Author(s) 2022
ISSN: 0924-669X
eISSN: 1573-7497
DOI: 10.1007/s10489-022-04175-y

Abstract

Machine learning is routinely used to forecast solar radiation from inputs, which are forecasts of meteorological variables provided by numerical weather prediction (NWP) models, on a spatially distributed grid. However, the number of features resulting from these grids is usually large, especially if several vertical levels are included. Principal Components Analysis (PCA) is one of the simplest and most widely-used methods to extract features and reduce dimensionality in renewable energy forecasting, although this method has some limitations. First, it performs a global linear analysis, and second it is an unsupervised method. Locality Preserving Projection (LPP) overcomes the locality problem, and recently the Linear Optimal Low-Rank (LOL) method has extended Linear Discriminant Analysis (LDA) to be applicable when the number of features is larger than the number of samples. Supervised Nonnegative Matrix Factorization (SNMF) also achieves this goal, extending the Nonnegative Matrix Factorization (NMF) framework to integrate the logistic regression loss function. In this article we try to overcome all these issues together by proposing the Supervised Local Maximum Variance Preserving (SLMVP) method, a supervised non-linear method for feature extraction and dimensionality reduction. PCA, LPP, LOL, SNMF and SLMVP have been compared on Global Horizontal Irradiance (GHI) and Direct Normal Irradiance (DNI) radiation data at two different Iberian locations: Seville and Lisbon. Results show that for both kinds of radiation (GHI and DNI) and the two locations, SLMVP produces smaller MAE errors than PCA, LPP, LOL, and SNMF, around 4.92% better for Seville and 3.12% for Lisbon. It has also been shown that, although SLMVP, PCA, and LPP benefit from using a non-linear regression method (Gradient Boosting in this work), this benefit is larger for PCA and LPP because SLMVP is able to perform non-linear transformations of inputs.
Keywords Dimensionality reduction · Hybrid learning · Solar radiation forecast · Data mining 1 Introduction intermittent. Transient clouds and aerosol intermittency lead to considerable variability in the solar power plants Considerable efforts have been made in the past decades yield on a wide range of temporal scales, particularly to make solar energy a real alternative to the conventional in minutes to hours time scales. This presents serious energy generation system. There are two main technologies, issues regarding solar power plant management and their solar thermal electricity (STE) and solar photovoltaic (PV) yield integration into the electricity grid [1]. Currently, in energy, and many countries have already reached a notable addition to expensive storage-based solutions, the use of solar share in their energy mixes. Moreover, important solar radiation forecasts is the only plausible way to mitigate growth is expected in the near future (International Energy the intermittency. Therefore, the development of accurate Agency, 2018). solar radiation forecasting methods has become an essential Contrary to conventional generation, solar electricity research topic [2]. generation is conditioned by weather, and thus it is highly Solar forecasting methods can be classified depending on the forecasting horizon. Nowcasting methods are mostly related to one-hour ahead forecasts, short-term forecasting Esteban Garc´ ıa-Cuesta with up to 6 hours ahead forecasts and forecasting methods esteban.garcia@fi.upm.es are aimed at producing days ahead forecasts. The techniques associated with these methods are essentially different [3– Extended author information available on the last page of the article. 5]. In recent years, these has been increasing interest, E. Garc´ıa-Cuesta et al. particularly, in short-term forecasting, fostered by the to extract features from a NWP grid to improve renewable expected massive deployment of solar PV energy. Accurate energy forecasting. 
Advanced machine learning methods, short-term solar forecasts are important to ensure the quality such as convolutional neural networks, have also been used of the PV power delivered to the electricity network as a feature extraction scheme for wind power prediction and, thus, to reduce the ancillary costs [6, 7]. Short- using NWPs, showing competitive results compared to term forecasting has also been successfully used for the a PCA baseline [27]. Garcıa-Hinde et al. [28] presents management of STE plants [8, 9] and for the participation a study on feature selection and extraction methods for of PV and STE plants in the energy market [8, 10]. solar radiation forecasting. The study includes classical Short-term forecasts can be derived either from satellite methods, such as PCA or variance and correlation filters, imagery [11, 12] or from Numerical Weather Prediction and novel methods based on the adaptation of the support (NWP) models [13–15]. As solar radiation measured vector machines and deep Boltzmann machines for the task datasets have become progressively available, the use of of feature selection. Results show that one of the novel data-driven methods have become increasingly popular methods (the adaptation of support vector machine) and [16]. In [15, 17, 18] a comparison of the performance of PCA select high relevance features. Verbois et al. [29] different methods is assessed. combine feature extraction (PCA) and stepwise feature The use of NWP models for short-term solar forecasting selection of NWP variables for solar irradiance forecasting, has some important advantages, such as the global and comparing favorably with other benchmark methods. In easy availability of the forecasts. Because of that, this [30] a hybrid approach that combines PCA and deep approach was extensively evaluated during the past decade learning is presented to forecast wind power from hours [14, 15, 19]. 
Nevertheless, the reliability is far from optimal to years, showing a good performance. A recent study on and machine-learning methods play an important role in solar irradiance forecasting has compared many methods on providing enhanced solar forecasts derived from NWPs different datasets, where PCA has been used as the main models [20, 21]. In this context, the inputs for machine method for feature extraction and dimensionality reduction learning techniques are forecasts of several meteorological [31]. In general, it is observed that PCA, even in recent variables provided by numerical weather prediction (NWP) works, is one of the most widely-used methods to extract physical models such as the European Center for Medium features in renewable energy forecasting. Weather Forecasts (ECMWF) and the Global Ensemble PCA is a multivariate statistical analysis that transforms Forecast System (GEFS). Meteorological variables are a number of correlated variables into a smaller group of forecast for the points of a grid over the area of interest. uncorrelated variables called principal components [32]. However, the number of features resulting from these grids PCA has two main limitations. First, it performs a global is usually large, especially if several vertical levels are linear analysis by an axis transformation that best represents included in the grid. This may result in models that do not the mean and variance of the given data, but lacks the ability generalize well, and techniques to reduce the dimensionality to give local information representation. Second, PCA is an of data are required. unsupervised method, that is, the target output is not used Dimensionality reduction techniques can be divided into to extract the new features and this may be a drawback to feature selection and feature extraction. Feature selection finding the best low dimensional representation whenever methods select the most relevant variables in the grid, while labels are available. 
feature extraction summarizes information from the whole In this article we propose Supervised Local Maximum grid into fewer features. Both approaches have been used in Variance Preserving (SLMVP), a kernel method for the context of renewable energy forecasting with machine supervised feature extraction and dimensionality reduction. learning [22, 23]. Feature selection techniques have been The method considers both characteristics: it preserves used in [24] where methods such as Linear Correlation, the maximum local variance and distribution of the data, ReliefF, and Local Information Analysis have been explored but also considers the distribution of the data by the to study the influence on forecast accuracy of the number response variable to find an embedding that best represents of NWP grid nodes used as input for the solar forecasting the given data structure. This method can be applied to model. multiclass and regression problems when the sample size m In [25], feature extraction (PCA) is compared with is small and the dimensionality p is relatively large or very feature selection (a minimal redundancy and maximal- large as opposed to Fisher’s Linear Discriminant Analysis relevance method) to reduce the dimensionality of variables (LDA) [33], one of the foundational and most important in a grid for wind power forecasting in the east of China. approaches to classification. In summary, SLMVP uses the The authors conclude that PCA is a good choice to simplify full or partially labeled dataset to extract new features that the feature set, while obtaining competitive results. PCA maximize the variance of the embedding that best represents has also been used in [26] together with domain knowledge the common local distances [34] and computationally is Supervised dimensionality reduction to forecast solar radiation based on weighted graphs [35]. 
Additionally, this method is able to perform a linear or non-linear transformation of the original space by using different kernels as the similarity metric. To validate the SLMVP method, it has been tested to extract features in order to improve solar radiation forecasting (both Global Horizontal Irradiance (GHI) and Direct Normal Irradiance (DNI)) for a 3-hour forecasting horizon, and compared to PCA (the most popular workhorse in the area), but also to other state-of-the-art methods that have not been previously used in the context of solar radiation forecasting. These methods are: (1) Locality Preserving Projection (LPP), an unsupervised local dimensionality reduction method that finds linear projective maps by solving a variational problem that optimally preserves the neighborhood structure of the dataset [36]; (2) Linear Optimal Low-Rank (LOL), a supervised dimensionality reduction method that learns a lower-dimensional representation in a high-dimensional, low-sample-size setting, extending PCA by incorporating class-conditional moment estimates into the low-dimensional projection [37]; and (3) Supervised Non-negative Matrix Factorization (SNMF), which extends Non-negative Matrix Factorization (NMF) to be supervised [38, 39]. SNMF integrates the logistic regression loss function into the NMF framework and solves it with an alternating optimization procedure. All of these methods are able to solve the "large p, small m" problem, as opposed to many classical statistical approaches that were designed with a "small p, large m" situation in mind (e.g. LDA). Features have been extracted from meteorological forecasts (obtained from the GEFS) at the points of a grid around two locations in the Iberian peninsula: Seville and Lisbon. Two grid sizes have been tested, small and large. The performance of SLMVP has been compared with PCA, LPP, LOL, and SNMF using two different regressors: a linear one (standard Linear Regression (LR)) and a non-linear technique (Gradient Boosting). Thus the main contributions of this work are:

- A new local and supervised dimensionality reduction method capable of solving the "large p, small m" problem.
- The application of SLMVP to reduce the dimensionality of the NWP variables in a grid for the solar radiation forecasting problem.
- The comparison with PCA (one of the most widely-used methods for feature extraction in the context of renewable energy), with LPP, and with two recent state-of-the-art supervised methods, LOL and SNMF, showing the usefulness of the proposed method.

The structure of the article is as follows. Section 2 explains the SLMVP method, which is tested using the data described in Section 3 and the experimental design included in Section 4. The Conclusions section summarizes the main results.

2 Supervised dimensionality reduction method: Kernel-SLMVP

As has been mentioned in Section 1, PCA is an unsupervised method that performs a global analysis of the whole dataset. As opposed to global data projection techniques like PCA, other methods based on local structure preservation, i.e. ISOMAP [40], LPP [36], Laplacian Eigenmaps [41], and Locally Linear Embedding [42], have been proposed in order to overcome this global character. Although these techniques use linear optimization solutions, they are also able to represent non-linear geometric features by a local linear modeling representation that lies in a low-dimensional manifold [43]. Note that these non-linear methods still do not consider labeled data; that is, they are unsupervised methods. Recently, Linear Optimal Low-Rank (LOL) projection has been proposed, incorporating class-conditional means. The key intuition behind LOL is that it can jointly use the means and variances from each class (like LDA), but without requiring more dimensions than samples [37]. Another recent method is Supervised Non-negative Matrix Factorization (SNMF), which extends Non-negative Matrix Factorization (NMF) to be supervised [38]. SNMF integrates the logistic regression loss function into the NMF framework and solves it with an alternating optimization procedure. For both methods, regression can be done by projecting the data onto a lower-dimensional subspace, followed by the application of linear or non-linear regression techniques. This mitigates the curse of high dimensions.

The Supervised Local Maximum Variance Preserving (SLMVP) dimensionality reduction method overcomes both the difficulty of LPP with "large p, small m" problems and the global approach of LOL and SNMF, while being supervised. Therefore, SLMVP preserves the maximum local variance of the data, being able to represent non-linear properties, but it also considers the output information (in a supervised mode) to preserve the local patterns between inputs and outputs. In summary, it uses the full or partially labeled dataset to extract new features that best represent the local maximum joint variance.

SLMVP is based on a graph representation for a given set of inputs x_1, x_2, ..., x_m \in R^p and a set of outputs y_1, y_2, ..., y_m \in R^l, with m being the number of sample data points and p and l the number of input and output features; in our case the dimensionality is l = 1, with p = 342 for the small grid and p = 12274 for the large grid. The application of a similarity function S on the inputs, S_x(X) : X \in R^{m \times p}, and on the outputs, S_y(Y) : Y \in R^{m \times l}, defines an input weighted graph {H, U} and an output weighted graph {I, V}, with H and I being the nodes and U and V the edge weights, respectively. The graphs are not constrained and can be fully connected, or some weights can have a zero value, meaning that the connection between those points disappears. The weight of the links represents the similarity between two data points. These characteristics give the method the capability of being local. Following [41] and [35], a graph embedding viewpoint can be used to reduce the dimensionality, mapping a weighted connected graph G = (V, E) to a line so that the connected points stay as close together as possible.

The unsupervised dimensionality reduction problem aims to choose the mapping y_i = A^T x_i : y_i \in R^k, with k \ll p, which minimizes the distance of each point to its neighbors in the multidimensional data, and can be expressed by the following cost function:

    J_{ns} = \sum_{ij} \| y_i - y_j \|^2 w_{ij}    (1)

where W \in R^{m \times m} is the similarity matrix S_x(X). Following this graph embedding approach, SLMVP solves the supervised version: we wish to choose the mapping y_i = A^T x_i : y_i \in R^k, with k \ll p, which minimizes the distance of each point to its neighbors in the multidimensional data, but preserves only those distances that are shared in the input and output spaces, given the similarity functions S_x and S_y. The cost function is then expressed by:

    J_{s} = \sum_{ij} \| y_i - y_j \|^2 z_{ij}    (2)

where Z \in R^{m \times m} represents the joint similarity matrix between the input S_x(X) and output S_y(Y) similarity matrices, with z_{ij} = \sum_k u_{ik} v_{kj}. Note the difference between (1), which is unsupervised, and (2), which defines a supervised manifold learning problem using the similarity matrix between inputs and outputs.

The minimization of the cost function (2) can be expressed, after some transformations, in its kernelized form (Kernel-SLMVP) as the following maximization problem:

    max tr(Y^T K_x K_y Y)    (3)

where K_x = S_x(X) and K_y = S_y(Y) are the input and output similarity graphs expressed as kernel functions (e.g. polynomial, K(a, b) = (1 + a \cdot b)^d, or Gaussian, K(a, b) = e^{-\|a-b\|^2 / (2\sigma^2)}). Finally, (3) can be solved as an eigenvector problem on B as follows:

    X K_x K_y X^T B = \lambda B    (4)

where B is the learned latent space. The projection of the input-space data X onto this space, P = B^T X, gives the new extracted features to be used by the machine learning model. The Python code of SLMVP has been released publicly at [44].

3 Data description

The dataset used in this study concerns GHI and DNI measurements at two radiometric solar stations in the Iberian Peninsula: Seville and Lisbon. GHI and DNI have been acquired with a Kipp & Zonen CMP6 pyranometer, with a 15-minute resolution.

The set of inputs is a collection of forecasted meteorological variables obtained from GEFS at different levels of the atmosphere and at different latitudes and longitudes. More specifically, 9 meteorological variables at different levels are used (see Table 1), making a total of 38 attributes at each latitude-longitude pair. Latitudes go from 32 to 51 and longitudes go from -18 to 6, with a resolution of 0.5 degrees. In this work, two grids of different sizes have been used: a small grid with 3 x 3 = 9 points around the solar station (Seville and Lisbon), and a larger one with 17 x 19 = 323 points. For Seville, the larger grid covers the Iberian Peninsula (latitudes: 36 to 44, longitudes: 350 to 359.5, both with a resolution of 0.5 degrees). In the case of the Lisbon solar station, the larger grid has been shifted to cover part of the Atlantic Ocean (latitudes also go from 36 to 44, and longitudes go from 346 to 355.5). Figure 1 shows both the wide and narrow grids, centered around Seville and Lisbon. Since each point in the grid contains 38 attributes, the small grid results in 3 x 3 x 38 = 342 input variables, and the larger one in 17 x 19 x 38 = 12274 inputs.

GEFS provides predictions of meteorological variables for a 3-hour forecasting horizon every 6 hours each day (00:00, 06:00, 12:00, and 18:00). The corresponding GHI and DNI measurements are also used. To select the relevant hours of the day for GHI and DNI, samples with a zenithal angle larger than 75 degrees have been removed. Given this restriction, data times range from 9:15am to 6:00pm. The total input-output data covers from March 2015 to March 2017.

In this study, GHI and DNI are normalized by the clear-sky irradiance according to (5):

    I_{kt}(t) = I(t) / I_{cs}(t)    (5)

where I(t) stands for GHI or DNI at time t, and I_{cs}(t) is the clear-sky irradiance at time t.

Table 1  Meteorological variables

Variable   Description                              Levels
CLWMR      Cloud mixing ratio                       300, 350, 400, 450, 500, 550, 600, 650, 700, 750, 800, 850, 900, 925, 950, 975, 1000 mb
HGT        Geopotential height                      500, 850, 925, 1000 mb
RH         Relative humidity                        500, 850, 925 mb
UGRD       U component of wind                      500, 850, 925 mb
VGRD       V component of wind                      500, 850, 925 mb
SOILW      Soil moisture                            0.0-0.1 m
TMP        2-meter temperature                      2 m
CAPE       Convective available potential energy    surface, 255 mb
PRMSL      Pressure reduced to MSL                  surface

4 Experimental validation

In this Section, first the methodology employed is described. Then, the results comparing SLMVP with PCA, LPP, LOL, and SNMF for different GEFS grid sizes are presented.

4.1 Methodology

In order to study the performance of the SLMVP algorithm, it has to be combined with a regression method to predict the normalized GHI and DNI for a 3-hour forecasting horizon, and compared with the other above-mentioned methods: PCA, LPP, LOL, and SNMF.

Cross-validation (CV) has been applied to study the performance of SLMVP, PCA, LPP, LOL, and SNMF. In standard CV, instances are distributed randomly into the CV partitions.
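Equations (3)-(4) above can be sketched numerically. The following is our own NumPy sketch, not the released implementation [44]: it builds Gaussian kernels for inputs and outputs, solves the eigenvector problem of (4), and projects the data as P = B^T X. The sample orientation (rows as samples) and the toy sizes are our assumptions:

```python
import numpy as np

def gaussian_kernel(A, gamma):
    """Gaussian kernel K(a, b) = exp(-gamma * ||a - b||^2) between rows of A."""
    sq = np.sum(A**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return np.exp(-gamma * d2)

def slmvp_fit(X, y, gamma_x, gamma_y, k):
    """X: (m, p) inputs, y: (m,) outputs. Returns a projection basis B of shape (p, k)."""
    Kx = gaussian_kernel(X, gamma_x)                 # (m, m) input similarity graph
    Ky = gaussian_kernel(y.reshape(-1, 1), gamma_y)  # (m, m) output similarity graph
    M = X.T @ Kx @ Ky @ X                            # (p, p) matrix of eq. (4); here X is (m, p)
    # M is not symmetric in general, so we take the real parts of the
    # eigenvectors associated with the largest eigenvalues.
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)[:k]
    return vecs[:, order].real

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 30))
y = X[:, 0] + 0.1 * rng.normal(size=60)
B = slmvp_fit(X, y, gamma_x=0.05, gamma_y=1.0, k=5)
P = X @ B   # extracted features P = B^T X, one 5-dimensional row per sample
print(P.shape)   # (60, 5)
```

The extracted features P would then be fed to a regressor (LR or GBR), as described in Section 4.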
But our study involves time series data, and therefore there are temporal dependencies between consecutive samples (in other words, consecutive samples can be highly correlated). Hence, in this study, group 4-fold CV has been used, as explained next. The data has been split into 4 groups, one for each week of every month. Fold 1 thus contains the first week of each month (January, February, ...), Fold 2 the second week of every month, and so on. This guarantees that the training and testing partitions will never contain instances belonging to the same week, which allows a more realistic analysis of the performance of the methods. Since in this work the optimal number of features must be selected, a validation set strategy has also been used. For this purpose, each training partition (which contains 3 folds) is again divided into training and validation sets. The validation set contains a week of each month out of the three weeks of data available in the training partition. The remaining two weeks (the ones not used for validation) are used for training.

The regression technique uses as inputs the attributes/features from the input-space transformation obtained by the SLMVP, PCA, LPP, LOL, and SNMF methods. As suggested in [37], to learn the projection matrix for the LOL method, we partition the data into K partitions (we select K = 10), equally spaced over the target variable range [0, 1], to obtain a K-class classification problem. In this work, linear and non-linear regression methods have been tested. As a non-linear method, a state-of-the-art machine learning technique has been used: Gradient Boosting Regression (GBR) [45, 46]. This technique has shown considerable success in predictive accuracy in recent years (see for instance [24, 47-49]).

Mean Absolute Error (MAE) has been used as the performance measure (6). Given that a 4-fold CV has been employed, results are the CV-average of the MAE:

    MAE = (1/N) \sum_{i=1}^{N} | y_i - o_i |    (6)

where N is the number of samples, and y_i and o_i are the actual value and the output of the model, respectively. Note that the number of samples for training is 480, which is smaller than the number of dimensions for the large grid (480 << 12274) and within the same scale factor for the small grid (342, close to 480).

Given that 4-fold CV is used for performance evaluation, in each of the 4 CV iterations there is a training, a validation, and a testing partition. For each iteration, the regression models are trained with the training partition, and then the validation and test errors are obtained. The averages over the 4 iterations are obtained for the three errors (train, validation, and test). The validation error is used to select the optimal number of features.

SLMVP, SNMF, and GBR have some hyper-parameters that require tuning in order to improve results. Five hyper-parameters were fitted: the gamma parameter \gamma = 1/(2\sigma^2), which defines the Gaussian kernel function of the SLMVP method; \alpha, \beta, and \theta, which define the weight of each term of the SNMF method; and the number of estimators and the tree depth, which belong to GBR. The following ranges of values were tested for each hyper-parameter:

- \gamma (SLMVP): from 0 to 2 in steps of 0.1
- \alpha, \beta, \theta (SNMF): 0, 0.1, 0.01, 0.001
- Number of estimators (GBR): from 10 to 200 in steps of 10
- Tree depth (GBR): from 1 to 10 in steps of 1

The performance of the methods is evaluated as follows. Recall that the number of selected projected features is very relevant, and that the features obtained by the different methods are ordered by their importance.

Fig. 1  The 17 x 19 (black) and 3 x 3 (red) grids, centered around (a) Seville and (b) Lisbon
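The week-of-month grouping used for the 4-fold CV can be implemented in a few lines. This is our own sketch (the paper does not publish this exact code); folds are 0-indexed here, so fold 0 corresponds to "Fold 1" in the text, and days 22-31 are all assigned to the last fold:

```python
import numpy as np
import pandas as pd

def week_of_month_fold(dates):
    """Assign each timestamp to one of 4 folds by week-of-month.

    Fold 0 holds days 1-7 of any month, fold 1 days 8-14, fold 2 days
    15-21, and fold 3 the remaining days (22-31), so samples from the
    same week always land in the same partition.
    """
    days = pd.DatetimeIndex(dates).day
    return np.minimum((days - 1) // 7, 3)

dates = pd.to_datetime(["2015-03-01", "2015-03-08", "2015-03-21", "2015-03-31"])
print(week_of_month_fold(dates))   # [0 1 2 3]
```

Training on 3 folds and testing on the held-out fold then guarantees that no week of data is split between train and test.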
Then, in order to analyze the optimal number of dimensions, the performance of both the linear and the GBR regression methods is evaluated for 5, 10, 20, 50, 100 and 150 projected features.

In order to tune the hyper-parameters, a systematic procedure known as grid search was used. This method tries all possible combinations of hyper-parameter values. Models for each hyper-parameter combination are trained with the training partition and evaluated with the validation partition. The best combination on the validation set is selected.

Table 2  Average test MAE and number of selected components for the different methods, for GHI at the Seville and Lisbon locations

                         Small-grid              Large-grid
Method                   MAE      Components     MAE      Components
Seville
  SLMVP - LR             0.1673   20             0.1845   50
  PCA - LR               0.8417   20             0.7929   20
  LPP - LR               0.3655   3              >10      20
  LOL - LR               0.1660   50             0.1949   50
  SNMF - LR              0.1699   100            0.1890   50
  SLMVP - GBR            0.1562   20             0.1653   50
  PCA - GBR              0.1688   20             0.1808   50
  LPP - GBR              0.2008   150            0.2605   20
  LOL - GBR              0.1875   150            0.1813   100
  SNMF - GBR             0.1653   50             0.1885   50
Lisbon
  SLMVP - LR             0.2035   50             0.2217   50
  PCA - LR               0.7734   100            0.7706   10
  LPP - LR               >5       100            >10      10
  LOL - LR               0.2029   50             0.2233   100
  SNMF - LR              0.2055   50             0.2209   50
  SLMVP - GBR            0.1974   20             0.2084   100
  PCA - GBR              0.2008   10             0.2167   50
  LPP - GBR              0.2269   150            0.2548   5
  LOL - GBR              0.2272   100            0.2254   150
  SNMF - GBR             0.2023   20             0.2278   100

The bold entries are the best model for each case (Seville GHI and Lisbon GHI), independently of the grid size (small or large)

4.2 Results

Table 2 shows the average GHI MAE for the best number of components, for the different methods and grid sizes. Table 3 displays the same information for DNI. The best number of components has been selected using the MAE on the validation set. In all cases, it is observed that the use of the non-linear regression technique (GBR) improves the errors considerably for PCA and LPP, in some cases for LOL and SNMF, and always only slightly for SLMVP. For instance, in the case of the small grid for GHI in Seville (Table 2, top left), the use of GBR with PCA improves the MAE considerably (from 0.8417 with LR to 0.1688 with GBR). Similar improvements for PCA, LPP, and LOL can be observed for the large grid (Table 2, top right; for PCA, from 0.7929 to 0.1808). Lisbon GHI (Table 2, bottom) behaves in a similar way. For SNMF the differences are almost nonexistent. In the case of SLMVP, although GBR obtains better errors than LR, the difference between linear and non-linear is smaller than for the PCA and LPP cases. For instance, observing the GHI results for Seville (top of Table 2), it can be seen that for the small grid (top left), the difference between GBR and LR (when using SLMVP) is only 0.1562 vs. 0.1673, and for the large grid (top right), 0.1653 vs. 0.1845. Similar differences can be observed for GHI at Lisbon (bottom of Table 2). This is reasonable because SLMVP uses a non-linear kernel, so even when using LR, some of the non-linearity of the problem has already been captured by the SLMVP feature extraction process. Conclusions for DNI (Table 3) follow a similar trend: PCA and LPP benefit more from using a non-linear method (GBR) than SLMVP and SNMF, while LOL benefits more from using the regularized linear regressor. LOL includes linear class prior information, which is beneficial for LR.

Analyzing the results depending on the size of the grid (small vs. large), it is observed that the use of a large grid does not result in better MAE values. The best errors are always obtained with the small grid, in all cases of Tables 2 (GHI) and 3 (DNI). When the large grid is used, more components are used for SLMVP but, as already mentioned, this does not improve the results.
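The relative improvements discussed throughout this section all follow the same formula: the reduction in MAE expressed as a fraction of the reference method's MAE. A minimal sketch, using two MAE values taken from Table 2 (Seville GHI, small grid, SLMVP-GBR vs. PCA-GBR); note that the published Table 4 figures are computed over each method's best model, so this illustrates the formula rather than reproducing that table:

```python
def improvement_pct(mae_ref, mae_new):
    """Relative MAE improvement of mae_new over mae_ref, in percent."""
    return 100.0 * (mae_ref - mae_new) / mae_ref

# Table 2, Seville GHI, small grid: PCA-GBR = 0.1688, SLMVP-GBR = 0.1562
print(round(improvement_pct(0.1688, 0.1562), 2))   # 7.46
```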
Table 3  Average test MAE and number of selected components for the different methods, for DNI at the Seville and Lisbon locations

                         Small-grid              Large-grid
Method                   MAE      Components     MAE      Components
Seville
  SLMVP - LR             0.2580   50             0.2787   50
  PCA - LR               0.6628   5              0.6531   20
  LPP - LR               0.9360   10             >10      10
  LOL - LR               0.2534   50             0.2785   50
  SNMF - LR              0.2598   100            0.2888   50
  SLMVP - GBR            0.2446   20             0.2600   50
  PCA - GBR              0.2536   20             0.2788   10
  LPP - GBR              0.3017   150            0.3586   5
  LOL - GBR              0.2900   100            0.2640   150
  SNMF - GBR             0.2704   50             0.3021   20
Lisbon
  SLMVP - LR             0.2845   50             0.3076   100
  PCA - LR               0.6048   100            0.6034   10
  LPP - LR               3.3840   100            >10      10
  LOL - LR               0.2873   20             0.3020   50
  SNMF - LR              0.2896   100            0.3090   50
  SLMVP - GBR            0.2732   50             0.2874   100
  PCA - GBR              0.2855   10             0.3082   50
  LPP - GBR              0.3214   100            0.3884   150
  LOL - GBR              0.3278   100            0.3140   100
  SNMF - GBR             0.2809   20             0.3228   150

The bold entries are the best model for each case (Seville DNI and Lisbon DNI), independently of the grid size (small or large)

Table 4  Percentage improvement of SLMVP relative to PCA, LPP, LOL, and SNMF

                  GHI                       DNI
Method            Small-grid  Large-grid    Small-grid  Large-grid    Avg.
Seville
  PCA             8.07 %      9.34 %        3.68 %      7.20 %        6.68 %
  LPP             28.50 %     57.57 %       23.36 %     37.90 %       34.72 %
  LOL             6.24 %      7.14 %        3.60 %      1.52 %        3.95 %
  SNMF            5.82 %      13.98 %       6.23 %      11.07 %       8.65 %
Lisbon
  PCA             1.73 %      3.98 %        4.50 %      7.23 %        3.87 %
  LPP             14.96 %     22.27 %       17.62 %     35.14 %       21.31 %
  LOL             2.80 %      7.18 %        5.17 %      5.09 %        4.80 %
  SNMF            2.50 %      6.03 %        2.81 %      7.53 %        4.23 %

Summarizing the results so far, for both irradiances, GHI and DNI, and both locations (Seville and Lisbon), the best performance is always obtained with the SLMVP method and the non-linear regression method (GBR). In order to quantify this improvement better, Table 4 shows the percentage improvement of SLMVP relative to PCA, LPP, LOL, and SNMF for the best models (SLMVP+GBR, PCA+GBR, LPP+GBR, LOL+LR, and SNMF+GBR/LR). In summary, it can be said that SLMVP offers results 4.92% better than LOL for Seville and around 3.99% better for Lisbon; 5.88% better than PCA for Seville and around 3.12% for Lisbon; 25.93% better than LPP for Seville and around 16.29% for Lisbon; and 6.21% better than SNMF for Seville and around 2.82% for Lisbon.

In order to visualize the relation between the number of components and the error, Fig. 2 shows the GHI validation and test MAE for the different numbers of components. This is done for SLMVP, PCA, LPP, LOL and SNMF using GBR as regressor, for Seville and Lisbon (top/bottom, respectively), and for the small and large grids (left/right, respectively). The same information is displayed in Fig. 3 for DNI. It is observed that the best number of PCA components is usually smaller than for the other methods, and that LPP and LOL usually benefit slightly from a larger number of components. SNMF and SLMVP behave similarly with the number of components, with the optimal number being slightly smaller for SLMVP. In contrast to PCA and LPP, the information and components found by SLMVP benefit the performance of the regression method.

Fig. 2  Average MAE for GHI of SLMVP-GBR, PCA-GBR, LPP-GBR, LOL-GBR, and SNMF-GBR along the number of components (x-axis) for the small and large grids in Seville and Lisbon

Fig. 3  Average MAE for DNI of SLMVP-GBR, PCA-GBR, LPP-GBR, LOL-GBR, and SNMF-GBR along the number of components (x-axis) for the small and large grids in Seville and Lisbon

In Figs. 2 and 3 it is observed that up to 20 components, the errors decrease in all study cases. With a small grid, 20 components is the best solution for all datasets except Lisbon DNI, which reached its best solution with 50 components (see the left part of Fig. 2 for GHI and the left part of Fig. 3 for DNI). When a large grid is used, more than 20 components are generally beneficial, with 50 or 100 components being selected as the best options (50 components for Seville and 100 for Lisbon, although 50 and 100 components perform similarly for both locations). In those figures, it is also observed that although validation and test errors follow a similar trend, it is not always the case that the best error in validation corresponds to the best error in test. This should be expected, because the validation error is only an estimate obtained with a finite independent sample. But at least it can be seen that in all cases, using the validation error to determine the best number of components is a reliable way of achieving a reasonable test error.

Figures 2 and 3 also show that the performance of SLMVP is always better than that of PCA, LPP, LOL, and SNMF for every number of components (except for a few PCA exceptions and one for SNMF). In order to quantify these improvements, Tables 5, 6, 7 and 8 show the percentage improvements of SLMVP over PCA, LPP, LOL and SNMF, respectively, using the best regression model for each number of components. The superiority of SLMVP is clearly observed, but it is interesting to note that when 5 components are used (and in some cases with 10 components), either PCA is better or the improvement of SLMVP is smaller. This suggests that PCA is able to find relevant information when only very few components are allowed. In any case, it is clear from Figs. 2 and 3 that more than 5 components are required in order to obtain the best results.

Table 5  Improvements in percentage (%) of SLMVP over PCA for the different numbers of components

                    Small-grid                                Large-grid
      Location      5      10     20     50     100    150    5      10     20     50     100    150
GHI   Seville      -0.47   4.37   6.80  17.49   9.41   4.43   0.58  -0.37   5.87   8.51   2.93   4.65
      Lisbon        1.76   4.05   3.83   3.36   3.15   4.29  -1.99   0.76   1.11   4.89   8.11   6.68
DNI   Seville      -0.79   2.02   2.98   5.14   8.81   4.93   0.99  -2.14   0.63   5.46   6.02   5.39
      Lisbon       -0.84   1.32   1.08   6.64   8.67   8.10   1.35   0.34   1.28   5.14  10.70   9.89

Table 6  Improvements in percentage (%) of SLMVP over LPP for the different numbers of components

                    Small-grid                                Large-grid
      Location      5      10     20     50     100    150    5      10     20     50     100    150
GHI   Seville       9.66  23.53  28.06  22.46  22.34  14.28  26.14  28.11  50.14  57.16  50.47  49.52
      Lisbon       20.91  27.89  20.13  16.80  13.87  12.51  11.75  15.65  19.28  28.31  22.69  20.68
DNI   Seville      10.28  22.83  28.82  20.51  18.01  13.77  22.01  24.79  34.08  38.95  38.65  39.01
      Lisbon       16.98  24.39  17.96  14.05  12.95   8.86  20.45  21.85  23.48  31.28  33.48  29.59

Finally, to also verify that the SLMVP technique is superior to the current use of PCA, LPP, LOL, and SNMF not only for the optimal number of dimensions but independently of the number of dimensions selected, we have used a two-sample t-test for equal means to test the hypothesis that the obtained average error improvement over the different numbers of dimensions for each dataset is not due to chance. The obtained significance is shown in Table 9. We applied this test under the null hypothesis that the means are equal, allowing the observations to have different standard deviations. We used as observations the 6 test error results (5, 10, 20, 50, 100, and 150 extracted components) obtained for each dataset. We conclude that the improvement obtained in the comparison SLMVP vs. LPP is significant, rejecting the hypothesis of equal means with a p-value always below 0.001. Vs. PCA, the p-value is below 0.05 in 4 out of 8 cases (3 for Lisbon and 1 for Seville); vs. LOL, the p-value is below 0.05 also in 4 out of 8 cases (3 for Lisbon and 1 for Seville); and vs. SNMF, the p-value is below 0.05 also in 4 out of 8 cases (2 for Lisbon and 2 for Seville), rejecting the hypothesis of equal means.

In summary, we observed that for Lisbon, the null hypothesis is rejected in 12 out of 16 cases (and the other two cases have p-values close to the 5% threshold, namely 0.08 and 0.1), and in 8 out of 16 cases for Seville. These insights suggest that the source data for the two locations have different properties: Lisbon may contain noisier data, and therefore our method obtains larger improvements there because of the noise tolerance introduced by the use of locality.

5 Conclusions

Using Machine Learning methods to forecast GHI or DNI radiation, based on features that use NWP grids, typically results in a large number of attributes. In this article, a supervised method for feature transformation and reduction (SLMVP) has been proposed to extract the most relevant features, solving the limitations of the PCA technique with respect to representing locality, capturing non-linear patterns, and using labeled data. The PCA method is one of the most widely used methods to extract features and reduce dimensionality in renewable energy. Three other state-of-the-art dimensionality reduction methods that include locality (LPP) and supervision (LOL and SNMF) have also been compared.

The five methods have been tested and compared on radiation data at two different Iberian locations: Seville and Lisbon.
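The significance test just described, a two-sample t-test for equal means that allows unequal standard deviations, corresponds to Welch's t-test. A minimal sketch with SciPy on hypothetical per-dimensionality MAE values (6 observations each, one per component count; the real observations are the test errors behind Figs. 2-3, not these numbers):

```python
import numpy as np
from scipy.stats import ttest_ind

# Hypothetical test MAEs for 5, 10, 20, 50, 100, 150 components.
mae_slmvp = np.array([0.180, 0.170, 0.156, 0.158, 0.160, 0.162])
mae_pca   = np.array([0.179, 0.178, 0.169, 0.189, 0.176, 0.170])

# equal_var=False requests Welch's t-test: equal means under possibly
# unequal variances.
t_stat, p_value = ttest_ind(mae_slmvp, mae_pca, equal_var=False)
print(t_stat < 0, 0.0 < p_value < 1.0)
```

A p-value below the chosen threshold (5% in the paper) rejects the null hypothesis of equal mean errors for the two methods.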
Both linear and non-linear (GBR) regression methods have been used on the components extracted by SLMVP, PCA, LPP, LOL, and SNMF.

The results show that for both types of radiation (GHI and DNI) and both locations, SLMVP offers smaller MAE errors than the other methods. In order to assess the influence of the size of the NWP grid, two sizes have been tested, small and large. SLMVP results in better radiation estimates, and the small grids display slightly better errors. It has also been shown that PCA tends to underestimate the number of features required to obtain the best results. LPP obtains the worst results, and this is most noticeable for the large grids. SNMF has also shown a degradation of its performance for the large grid compared with SLMVP. In summary, it can be said that the small grid works better, and the improvement of SLMVP over the other methods is about 6.24% at Seville GHI, 3.60% at Seville DNI, 1.73% at Lisbon GHI, and around 4.50% at Lisbon DNI.

Finally, although SLMVP, PCA, and LPP all benefit from using a non-linear regression method (GBR), this benefit is larger for PCA and LPP because they are not able to perform non-linear transformations. LOL does not benefit from non-linear regression, and in some cases it obtained better results using the regularized linear regressor. SNMF benefits slightly from non-linear regression for all but one of the small grids, but not for the large ones. Because SLMVP is able to use non-linear transformations, the difference between using the linear and the non-linear regression method is, as expected, smaller (but still present).

We can conclude that SLMVP is a competitive method for dimensionality reduction in the context of solar radiation forecasting using NWP variables, beating PCA, which is currently the most widely used method, as well as LOL and SNMF, two recent state-of-the-art supervised dimensionality reduction methods.

Table 7  Improvements in percentage (%) of SLMVP over LOL for the different numbers of components

                    Small-grid                                Large-grid
      Location      5      10     20     50     100    150    5      10     20     50     100    150
GHI   Seville       4.32  20.01  19.03  14.00  14.70   7.12   3.98  10.01   6.50   8.86   4.25   1.52
      Lisbon       20.04  24.92  22.91  15.91  13.89  13.67   4.45   4.98   4.77   9.83   7.55   2.74
DNI   Seville       6.76  17.72  20.71  18.85  14.59  10.21   3.28   6.67   5.95   8.21   5.43   1.18
      Lisbon       13.62  16.69  20.80  21.68  15.12  17.03  12.42  11.74   9.30   9.19   8.30   6.70

Table 8  Improvements in percentage (%) of SLMVP over SNMF for the different numbers of components

                    Small-grid                                Large-grid
      Location      5      10     20     50     100    150    5      10     20     50     100    150
GHI   Seville       2.23   8.44   6.35   1.22   4.13   3.23   7.72  11.30   6.10   9.86  10.74   8.58
      Lisbon        7.86   5.36   2.40  -0.10   1.91   0.25   3.37   4.60   2.56   8.53   8.53   1.46
DNI   Seville       0.79   2.02   2.98   5.14   8.81   4.93   5.24   7.50  10.03  14.87  11.99   9.44
      Lisbon        0.84   1.32   1.08   6.64   8.67   8.10  60.2    5.07   8.20  12.91  15.67   9.12

Table 9  Two-sample t-test analysis for equal means over the 5, 10, 20, 50, 100, and 150 dimensions

                    GHI                         DNI
                    Small-grid   Large-grid     Small-grid   Large-grid
Seville
  PCA   t-value     1.74         2.59           1.07         1.29
        p-value     0.11         <0.03          0.31         0.23
  LPP   t-value     8.19         10.20          5.38         14.36
        p-value     <0.001       <0.001         <0.001       <0.001
  LOL   t-value     2.25         2.56           1.55         1.7
        p-value     0.07         <0.003         0.18         0.12
  SNMF  t-value     1.01         5.8            1.63         5.08
        p-value     0.34         <0.001         0.13         <0.001
Lisbon
  PCA   t-value     3.04         1.8            2.83         2.34
        p-value     <0.02        0.1            <0.02        <0.05
  LPP   t-value     8.52         10.65          5.17         13.92
        p-value     <0.001       <0.001         <0.001       <0.001
  LOL   t-value     2.55         3.30           10.99        2.06
        p-value     <0.03        <0.02          <0.001       0.08
  SNMF  t-value     1.57         2.8            0.88         4.74
        p-value     0.15         <0.02          0.4          <0.001
Overall SLMVP also obtains better results Landelius T, Lindskog M, Muller ¨ SC, Ramirez Santigosa L, Sirch independently of the number of dimensions used, showing T, Wilbert S (2019) Impact of dni nowcasting on annual revenues its robustness. of csp plants for a time of delivery based feed in tariff. Meteorol Z 28(3):235–253. https://doi.org/10.1127/metz/2019/0925 We envision that different machine learning methods 9. Alonso-Montesinos J, Polo J, Ballestr´ ın J, Batlles FJ, Portillo C would benefit by their combination with SLMVP, and thus (2019) Impact of DNI forecasting on CSP tower plant power pro- it will be of interest to verify it, using this and other domain duction. Renew Energy 138(C):368–377. https://doi.org/10.1016/ datasets. j.renene.2019.01 10. Antonanzas J, Pozo-Vazquez ´ D, Fernand ez-Jimenez LA, Martinez-de-Pison FJ (2017) The value of day-ahead forecasting Acknowledgements This work has been made possible by projects for photovoltaics in the Spanish electricity market. Sol Energy funded by Agencia Estatal de Investigacion ´ (PID2019-107455RB- 158:140–146. https://doi.org/10.1016/j.solener.2017.09.043 C22 / AEI / 10.13039/501100011033). This work was also supported 11. Blanc P, Remund J, Vallance L (2017) Short-term solar power by the Comunidad de Madrid Excellence Program and Comunidad forecasting based on satellite images, pp 179–198. https://doi.org/ de Madrid-Universidad Politecnica ´ de Madrid young investigators 10.1016/B978-0-08-100504-0.00006-8 initiative. 12. Bright JM, Killinger S, Lingfors D, Engerer NA (2018) Improved satellite-derived pv power nowcasting using real-time power data Funding Open Access funding provided thanks to the CRUE-CSIC from reference pv systems. Sol Energy 168:118–139 agreement with Springer Nature. 13. 
Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's note  Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Esteban García-Cuesta is assistant professor with the Artificial Intelligence department at Universidad Politécnica de Madrid since 2001. He received his PhD in Computer Science from Universidad Carlos III de Madrid (Spain) in 2010 and his MSc in Computer Science from Universidad Carlos III (2005). He has worked on European and Spanish research projects related to machine learning and artificial intelligence in several application domains, including remote sensing, renewable energy forecasting, affective computing, and medicine. His current research interests include machine learning and explainable AI. He has been a visiting scientist at Carnegie Mellon (Pittsburgh, US) and collaborates actively with industry initiatives.

Ricardo Aler is associate professor with the Computer Science Department at Universidad Carlos III de Madrid since 2001. He received his PhD in Computer Science from Universidad Politécnica de Madrid (Spain) in 1999 and his MSc in Decision Support Systems from Sunderland University (UK) (1993). He has worked on European and Spanish projects related to evolutionary computation and machine learning in several application domains, including telecommunications, robosoccer, brain-computer interfaces, and renewable energy forecasting. His current research interests include machine learning, evolutionary optimization, and the energy forecasting field.

David Pozo-Vázquez holds a PhD in Atmospheric Science (2000) from the University of Granada (Spain) and a BSc in Applied Physics from the same university (1994). Since 1998 he has been on the faculty of the Department of Physics of the University of Jaén, where he is responsible for courses on meteorology and renewable energy resources and leads the Solar Radiation and Atmosphere Modeling research group. He obtained a permanent appointment (Associate Professor) in 2003 and has been a full Professor since September 2018. In the last decade, his research has focused on solar and wind energy resource assessment, including their balancing at different spatial and temporal scales. In addition, he has conducted research aimed at the improvement of solar radiation forecasting techniques at different spatial scales (plant, utility scale) and forecasting lead times (from minutes to hours and days). He has been a visiting scientist at the University of East Anglia, the European Centre for Medium-Range Weather Forecasts (both in the UK) and the National Center for Atmospheric Research (Boulder, CO, USA).

Inés M. Galván is full professor at the Computer Science and Engineering Department at Carlos III University of Madrid since 2020. She received her PhD degree in Computer Science at Universidad Politécnica de Madrid (Spain) in 1998. She has worked on several European and Spanish research projects related to the control of chemical reactors, optimization and evolutionary computation, and solar radiation forecasting. Her current research focuses on machine learning techniques, evolutionary computation, and multi-objective algorithms. Her research interests also cover different application fields, such as control of dynamic processes, time series prediction, probabilistic forecasting, and renewable energy.

Affiliations

Esteban García-Cuesta (1) · Ricardo Aler (2) · David del Pozo-Vázquez (3) · Inés M. Galván (2)

Ricardo Aler: aler@inf.uc3m.es
David del Pozo-Vázquez: dpozo@ujaen.es
Inés M. Galván: igalvan@inf.uc3m.es

1. Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, E.T.S.I.I. Campus de Montegancedo s/n, Boadilla del Monte, 28660, Madrid, Spain
2. Departamento de Informática, Universidad Carlos III de Madrid, Av. de la Universidad, 30, Leganés, 28911, Madrid, Spain
3. Departamento de Física, Universidad de Jaén, Campus Las Lagunillas, s/n, Jaén, 23071, Jaén, Spain

Journal: Applied Intelligence (Springer Journals)

Published: Jun 1, 2023

Keywords: Dimensionality reduction; Hybrid learning; Solar radiation forecast; Data mining
