Classification of Very High Resolution Aerial Photos Using Spectral-Spatial Convolutional Neural Networks

Hindawi Journal of Sensors, Volume 2018, Article ID 7195432, 12 pages. https://doi.org/10.1155/2018/7195432

Research Article

Maher Ibrahim Sameen, Biswajeet Pradhan, and Omar Saud Aziz

School of Systems, Management and Leadership, Faculty of Engineering and Information Technology, University of Technology Sydney, Building 11, Level 06, 81 Broadway, P.O. Box 123, Ultimo, NSW 2007, Australia

Correspondence should be addressed to Biswajeet Pradhan; biswajeet24@gmail.com

Received 3 March 2018; Revised 17 April 2018; Accepted 6 May 2018; Published 26 June 2018

Academic Editor: Paolo Bruschi

Copyright © 2018 Maher Ibrahim Sameen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract. Classification of aerial photographs relying purely on spectral content is a challenging topic in remote sensing. A convolutional neural network (CNN) was developed to classify aerial photographs into seven land cover classes: building, grassland, dense vegetation, waterbody, barren land, road, and shadow. The classifier utilized the spectral and spatial content of the data to maximize classification accuracy. The CNN was trained from scratch on manually created ground truth samples. The architecture of the network comprised a single convolution layer of 32 filters with a kernel size of 3 × 3, a pooling size of 2 × 2, batch normalization, dropout, and a dense layer with Softmax activation. The architecture and its hyperparameters were selected via sensitivity analysis and validation accuracy. The results showed that the proposed model can be effective for classifying aerial photographs.
The overall accuracy and Kappa coefficient of the best model were 0.973 and 0.967, respectively. In addition, the sensitivity analysis suggested that the use of dropout and batch normalization in CNNs is essential to improve the generalization performance of the model. The CNN model without these techniques achieved the worst performance, with an overall accuracy and Kappa of 0.932 and 0.922, respectively. This research shows that CNN-based models are robust for land cover classification using aerial photographs; however, the architecture and hyperparameters of these models should be carefully selected and optimized.

1. Introduction

Classifying remote sensing data (especially orthophotos of three bands: red, green, and blue (RGB)) with traditional methods is a challenge, even though some methods in the literature have produced excellent results [1, 2]. The main reason is that remote sensing datasets have high intra- and interclass variability, and the amount of labeled data is much smaller compared with the total size of the dataset [3]. On the other hand, recent advances in deep learning methods such as convolutional neural networks (CNNs) have shown promising results in remote sensing image classification, especially hyperspectral image classification [4-6]. The advantages of deep learning methods include learning high-order features from the data that are often more useful than the raw pixels for classifying the image into predefined labels. Another advantage of these methods is the spatial learning of contextual information from the data via feature pooling from a local spatial neighborhood [3].

Several methods and algorithms have been adopted by many researchers to efficiently classify very high-resolution aerial photos and produce accurate land cover maps. Methods such as object-based image analysis (OBIA) have been investigated extensively because of their advantage in very high-resolution image processing via spectral and spatial features. In a recent paper, Hsieh et al. [7] performed aerial photo classification by combining OBIA with a decision tree using texture, shape, and spectral features. Their results achieved an accuracy of 78.20% and a Kappa coefficient of 0.7597.
Vogels et al. [8] combined OBIA with random forest classification using texture, slope, shape, neighbor, and spectral information to produce classification maps for agricultural areas. They tested their algorithm on two datasets, and the results showed the employed methodology to be effective, with accuracies of 90% and 96% for the two study areas, respectively. A novel model was presented by Meng et al. [9], who applied OBIA to improve vegetation classification based on aerial photos and global positioning systems. Their results illustrated a significant improvement in classification accuracy, which increased from 83.98% to 96.12% in overall accuracy and from 0.7806 to 0.947 in the Kappa value. Furthermore, Juel et al. [10] showed that random forest with the use of a digital elevation model could achieve relatively high performance for vegetation mapping. In a more recent paper, Wu et al. [2] developed a model based on a comparison between a pixel-based decision tree and an object-based support vector machine (SVM) to classify aerial photos; the object-based SVM had higher accuracy than the pixel-based decision tree. Albert et al. [11] developed classifiers based on conditional random fields and pixel-based analysis to classify aerial photos. Their results showed that such techniques are beneficial for land cover classes covering large, homogeneous areas.

2. Related Works

The success of CNNs in fields like computer vision, language modeling, and speech recognition has motivated remote sensing scientists to apply them to image classification. Several works have applied CNNs to remote sensing image classification [12-15]. This section briefly reviews some of these works, highlighting their findings and limitations.

Sun et al. [16] proposed an automated model for feature extraction and classification, with classification refinement, by combining random forest and CNN. Their combined model performed well (86.9%) and obtained higher accuracy than the single models.
Akar [1] developed a model based on rotation forest and OBIA to classify aerial photos. Results were compared with gentle AdaBoost, and the experiments suggested that the proposed method performed better than the other method, with accuracies of 92.52% and 91.29%, respectively. Bergado et al. [17] developed deep learning algorithms based on CNNs for aerial photo classification in high-resolution urban areas, using data from optical bands, digital surface models, and ground truth maps. The results showed that CNNs are very effective in learning discriminative contextual features, leading to accurately classified maps and outperforming traditional classification methods based on the extraction of textural features. Scott et al. [13] applied CNNs to produce land cover maps from high-resolution images. Other researchers such as Cheng et al. [12] used a CNN as a classification algorithm for scene understanding from aerial imagery. Furthermore, Sherrah [14] and Yao et al. [15] used CNNs for semantic classification of aerial images.

This research investigates the development of a CNN model with regularization techniques such as dropout and batch normalization for classifying aerial orthophotos into general land cover classes (e.g., road, building, waterbody, grassland, barren land, shadow, and dense vegetation). The main objective of the research is to run several experiments exploring the impacts of CNN architectures and hyperparameters on the accuracy of land cover classification using aerial photos. The aim is to understand the behaviour of the CNN model with respect to its architecture design and hyperparameters in order to produce models with high generalization capacity.

3. Methodology

This section presents the dataset, preprocessing, and the methodology of the proposed CNN model, including the network architecture and training procedure.

3.1. Dataset and Preprocessing

3.1.1. Dataset. To implement the current research, a pilot area was identified based on the diversity of its land cover. The study area is located in Selangor, Malaysia (Figure 1).

3.1.2. Preprocessing

(1) Geometric Calibration. Since the orthophoto was captured by an airborne laser scanning (LiDAR) system, it was essential to calibrate it geometrically to correct the geometric errors. In this step, the data was corrected based on ground control points (GCPs) collected from the field (Figure 2). There were 34 GCPs identified from clearly identifiable points (i.e., road intersections, corners, and power lines). The geometric correction was done in ArcGIS 10.5 software. The steps of geometric correction included identification of transformation points in the orthophoto, application of the least squares transformation, and calculation of the accuracy of the process. The selected points were uniformly distributed over the area. After that, the least squares method (Kardoulas et al., 1996) was applied to estimate the coefficients that are essential for the geometric transformation process. After the least squares solution, the polynomial equations were used to solve for the X, Y coordinates of the GCPs and to determine the residuals and RMS errors between the source X, Y coordinates and the retransformed X, Y coordinates.

(2) Normalization. Since the aerial orthophotos have integer digital values and the initial weights of the CNN model are randomly selected within 0-1, a z-score normalization was applied to the pixel values of the orthophotos to avoid abnormal gradients. This step is essential as it improves the progress of the activation and the gradient descent optimization (LeCun et al., 2012):

X′ = (X/max − μ) / σ,     (1)

where max is the maximum pixel value in the image, μ and σ are the mean and standard deviation of X/max, respectively, and X′ is the normalized data.
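In NumPy, the normalization of (1) can be sketched as follows (an illustrative snippet rather than the authors' code; the statistics here are computed over the whole image):

```python
import numpy as np

def zscore_normalize(image):
    """Normalize integer pixel values as in Eq. (1): scale to 0-1 by
    the maximum value, then subtract the mean and divide by the
    standard deviation of the scaled data."""
    scaled = image.astype(np.float64) / image.max()
    return (scaled - scaled.mean()) / scaled.std()

# Example: a small 8-bit RGB orthophoto patch
rng = np.random.default_rng(0)
patch = rng.integers(0, 256, size=(7, 7, 3))
normalized = zscore_normalize(patch)
print(normalized.mean(), normalized.std())  # ~0 and ~1
```

After this transformation the data is centered and unit-scaled, which keeps the early gradients of the network in a well-behaved range.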
Figure 1: The study area location map.

Figure 2: The ground truth samples over the study area, manually selected for seven land cover classes: road (7417 pixels), water body (7363), grassland (7038), building (5109), dense vegetation (5538), shadow (475), and barren land (3681). The number in brackets indicates the number of pixels in each class.

3.2. The Proposed Approach

3.2.1. Overview. An orthophoto is composed of m × n × d digital values, where m, n, and d are the image width, length, and depth, respectively. The goal of a classification model is to assign a label to each pixel in the image given a set of training examples with their ground truth labels. In general, common classification methods utilize the spectral information (image pixels across different bands) to achieve that goal.
In addition, some other techniques, such as object-based image analysis (OBIA), segment the input image into several homogeneous contiguous groups before classification. This method uses additional features like spatial, shape, and texture information to boost the classification performance. However, both methods, pixel-based and OBIA, have several challenges, such as speckle noise in the former and segmentation optimization in the latter. Furthermore, both methods require careful feature engineering and band selection to obtain high classification accuracy. More recently, classification methods using image patches and deep learning algorithms have been proposed to overcome the above challenges; among the common methods is the CNN. As a result, this study proposes a classification method based on CNN and spectral-spatial feature learning for classifying very high-resolution aerial orthophotos. The following sections describe the proposed model and its components, including the basics of CNN, the network architecture, and the training methodology.

The pseudocode of the proposed classification model is presented in Algorithm 1. We developed the CNN model in the current study by running several experiments with different configurations. Then, we designed the final model with the best hyperparameters and architecture based on statistical accuracy metrics such as overall accuracy, Kappa index, and per-class accuracies.

3.2.2. Basics of CNN. Convolutional neural networks (CNNs), or ConvNets, are a type of artificial neural network that simulates the human visual cortical system by using local receptive fields and shared weights. They were introduced by LeCun and his colleagues [18]. Figure 3 shows a typical CNN with convolution and max pooling operations.
Algorithm 1: CNN for orthophoto classification

Input: RGB image (I) captured by the aerial remote sensing system; training/testing samples (D)
Output: Land cover classification map with seven classes (O)

Preprocessing (Section 3.1.2):
  calibrate I using the available 34 GCPs
  normalize pixel values using Eq. (1)
Classification (CNN) (Sections 3.2.2 and 3.2.3):
  for Patch_x_axis:
    initialize sum = 0
    for Patch_y_axis:
      calculate dot product(Patch, Filter)
      result_convolution(x, y) = dot product
  for Patch_x_axis:
    for Patch_y_axis:
      calculate Max(Patch)
      result_maxpool(x, y) = Max(Patch)
  apply activation F = max(0, x)
  result_cnn_model = trained model
Prediction:
  apply the trained model to the whole image and get O
Mapping:
  get the results of prediction
  reshape the predicted values to the original image shape
  convert the array to an image and write it to the hard disk

Algorithm 1: The pseudocode of the proposed CNN developed for land cover mapping using aerial images.

Figure 3: Illustration of typical layers of a CNN (convolution, ReLU, pooling).

CNNs are suitable for analyzing images, videos, or data in the form of n-dimensional arrays that have a spatial component. This unique property makes them suitable for remote sensing image classification as well. A typical CNN architecture consists of a series of layers such as convolution, pooling, fully connected (i.e., dense), and logistic regression/Softmax layers. Additional layers like dropout and batch normalization can also be added to avoid overfitting and improve the generalization of these models. The last layer depends on the type of the problem: for binary classification problems, a logistic regression (sigmoid) layer is often used, whereas for multiclass classification problems, a Softmax layer is used. Each layer has its own operation and purpose in these models. For example, the convolutional layers construct feature maps via convolutional filters that learn high-level features, taking advantage of the image properties. The output of these layers then passes through a nonlinearity such as a ReLU (rectified linear unit). Local groups of values in array data are often highly correlated, and the local statistics of images are invariant to location [19].
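The convolution, ReLU, and max pooling steps sketched in Algorithm 1 can be written out in plain NumPy; the toy example below uses a single hand-set 3 × 3 averaging filter on one band, whereas the real network learns 32 such filters:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' convolution: slide the kernel and take dot products."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

def relu(x):
    """Rectified linear unit: F = max(0, x)."""
    return np.maximum(0, x)

def max_pool(image, size=2):
    """Non-overlapping max pooling over size x size patches."""
    oh, ow = image.shape[0] // size, image.shape[1] // size
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = image[y * size:(y + 1) * size,
                              x * size:(x + 1) * size].max()
    return out

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.ones((3, 3)) / 9.0          # a 3x3 averaging filter
features = max_pool(relu(conv2d(image, kernel)))
print(features.shape)  # (2, 2)
```

A 6 × 6 input convolved with a 3 × 3 kernel gives a 4 × 4 feature map, which the 2 × 2 pooling reduces to 2 × 2, mirroring the shape reductions reported later in Table 1.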
In addition, pooling (or subsampling) layers are used to merge semantically similar features into one. The most common subsampling method computes the maximum of a local patch of units in the feature maps; other pooling operations are average pooling and stochastic pooling. In general, several convolutional and subsampling layers are stacked, followed by dense layers and a Softmax or logistic regression layer to predict the label of each pixel in the image.

3.2.3. Network Architecture. The architecture of the CNN model was built with a single convolutional layer followed by a max pooling operation, batch normalization, and two dense layers (Figure 4). This architecture yielded 3527 parameters in total, of which 96 are not trainable. The convolutional kernels were kept at 3 × 3, and the pooling size in the max pooling layer was kept at 2 × 2. Dropout was applied after the convolutional layer and the first dense layer with a drop probability of 0.5 to avoid overfitting. The minibatch size of stochastic gradient descent (SGD) was set to 32 images. Under the framework of Keras with a TensorFlow backend, the whole process was run on a CPU (Core i7, 2.6 GHz) with 16 GB of RAM. In the experiments, 60% of the total samples were randomly chosen for training and the rest for testing, and overall accuracy (OA), average accuracy (AA), Kappa coefficient (κ), and per-class accuracy (PA) were used to evaluate the performance of the CNN classification method (Congalton and Green, 2008). The summary of the model's layers is shown in Table 1.

Figure 4: The architecture of the proposed CNN for aerial orthophoto classification (input image, convolution, ReLU, pooling, batch normalization, dropout, flatten, dense, Softmax).
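As a quick check, the parameter budget stated above (3527 parameters, 96 of them non-trainable) can be reproduced from the per-layer counts reported in Table 1; in a batch normalization layer, the moving mean and variance (half of its parameters) are not trained:

```python
# Per-layer parameter counts as reported in Table 1
layer_params = {
    "conv2d (32 filters, 3x3)": 2048,
    "batch_norm_1": 64,
    "dense_1 (32 units)": 1056,
    "batch_norm_2": 128,
    "dense_softmax (7 classes)": 231,
}
total = sum(layer_params.values())
# Batch normalization keeps 4 parameters per feature: gamma and beta
# are trainable, while the moving mean and variance are not, i.e.,
# half of each batch-norm layer's parameters.
non_trainable = (64 + 128) // 2
print(total, non_trainable)  # 3527 96
```

The last dense layer alone illustrates the arithmetic: 32 inputs × 7 classes + 7 biases = 231 parameters.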
Table 1: The summary of the CNN model layers.

Layer (type)            Output shape        Number of parameters
Input                   (None, 3, 7, 7)     0
2D convolution          (None, 1, 5, 32)    2048
Max pooling             (None, 1, 2, 16)    0
Batch normalization     (None, 1, 2, 16)    64
Dropout                 (None, 1, 2, 16)    0
Flatten                 (None, 32)          0
Dense                   (None, 32)          1056
Batch normalization     (None, 32)          128
Dropout                 (None, 32)          0
Dense (Softmax)         (None, 7)           231

3.2.4. Training the Model. The CNN model was trained with the backpropagation algorithm and stochastic gradient descent (SGD). SGD uses the minibatch's backpropagation error to approximate the error over all the training samples, which accelerates the cycle of weight updates and speeds up the convergence of the whole model. The optimization was run to reduce the loss function J (i.e., categorical cross entropy) of the CNN, expressed as follows:

J(X, W, b, θ) = −(1/N) Σ_{i=1}^{N} Σ_{t=1}^{k} 1{y_i = t} · log ŷ_{i,t},     (2)

where X is the normalized features, W and b are the parameters of the CNN, θ is the parameters of the Softmax layer, N is the number of samples, k is the number of land cover classes, ŷ_i = (ŷ_{i,1}, ŷ_{i,2}, …, ŷ_{i,k}) is the prediction vector given by the Softmax classifier, and ŷ_{i,t}, the probability of the ith sample's label being t, is computed by (3):

ŷ_{i,t} = exp(θ_t · c_i) / Σ_{j=1}^{k} exp(θ_j · c_i),     (3)

where c_i is the CNN feature vector of sample i. During backpropagation, (4) is used to update W and b in every layer:

W_{t+1} = W_t − λV_t − α∇W,
b_{t+1} = b_t − λU_t − α∇b,     (4)

where λ is the momentum, which helps accelerate SGD by adding a fraction of the update value of the past time step to the current update value, V_t and U_t are the accumulated past updates of W and b, α is the learning rate, ∇W and ∇b are the gradients of J(·) with respect to W and b, respectively, and t stands for the epoch number during SGD.
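Equations (2)-(4) can be sketched in NumPy as follows (a minimal illustration rather than the Keras internals used in the paper; `feats` stands for the flattened CNN feature vectors c, and the momentum update is written in the standard velocity form, equivalent in spirit to (4)):

```python
import numpy as np

def softmax(logits):
    """Eq. (3): class probabilities from the Softmax layer."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    """Eq. (2): categorical cross entropy averaged over N samples."""
    n = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels]))

def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9):
    """Momentum SGD: accumulate a velocity, then step against it."""
    v = momentum * v + lr * grad
    return w - v, v

rng = np.random.default_rng(1)
feats = rng.normal(size=(4, 32))         # 4 samples, 32 CNN features
theta = rng.normal(size=(32, 7)) * 0.1   # Softmax parameters, 7 classes
labels = np.array([0, 3, 6, 2])
probs = softmax(feats @ theta)
loss = cross_entropy(probs, labels)

# One illustrative parameter update
w, v = np.zeros(3), np.zeros(3)
w, v = sgd_momentum_step(w, v, np.ones(3))
```

Each row of `probs` sums to one, and the loss penalizes low probability assigned to the true label, exactly the quantity the training loop minimizes.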
3.2.5. Evaluation. This study uses several statistical accuracy measures to evaluate the different models and compare them under various experimental configurations. These metrics are overall accuracy (OA), average accuracy (AA), per-class accuracy (PA), and the Kappa index (κ). They are calculated using the following equations [20]:

OA = (Σ_i D_ii) / N,     (5)
AA = (Σ_i PA_i) / m,     (6)
PA_i = D_ii / R_i,     (7)
κ = (N Σ_i D_ii − Σ_i R_i·C_i) / (N² − Σ_i R_i·C_i),     (8)

where D_ii is the number of correctly classified pixels of class i (the diagonal cells of the error matrix), Σ_i D_ii is the total number of correctly classified pixels, N is the total number of pixels in the error matrix, m is the number of classes, R_i is the total number of pixels in row i, and C_i is the total number of pixels in column i.

4. Experimental Results

4.1. Performance of the Proposed Model

4.1.1. CNN with Dropout and Batch Normalization. Figure 5 shows the accuracy of the CNN model with dropout and batch normalization over 93 epochs on both the training and validation datasets. The increase in model accuracy and the reduction in model loss over time indicate that the model has learned useful features to classify the image pixels into the different class labels. The fluctuations in the accuracy from one epoch to another are due to the use of dropout, which yields a slightly different model at each epoch. The OA, AA, and κ of this model on the validation dataset were 0.973, 0.965, and 0.967, respectively. In addition, Table 2 shows the per-class accuracy (PA) achieved by the model. The results suggest that the CNN model could classify almost all the classes with relatively high accuracy. The minimum accuracy was 0.894 for the shadow class. Examining the confusion matrix (Table 3), several (~11) samples of this class were misclassified as dense vegetation, affecting its PA. The confusion matrix also shows that several samples of the water body class were misclassified as grassland.

Figure 5: Performance of the CNN model with the optimum parameter set: (a) model accuracy and (b) model loss over 93 epochs (early stopping).

Table 2: PA of the CNN model.

Class               PA
Road                0.971
Waterbody           0.944
Grassland           0.972
Building            0.995
Dense vegetation    0.999
Shadow              0.894
Barren land         0.980
4.1.2. CNN Model with Other Configurations. The CNN model was also trained without dropout and batch normalization to see their impacts on the accuracy of the classification map. Table 4 summarizes the results of comparing CNN models with different configurations (i.e., CNN + dropout + batch normalization, CNN + dropout, CNN + batch normalization, and CNN). The results suggest that the use of dropout and batch normalization together improved the accuracy (OA, AA, and κ) of the classification by almost 4%. The use of batch normalization alone performed slightly better (OA = 0.964, AA = 0.956, κ = 0.961) than dropout alone (OA = 0.958, AA = 0.956, κ = 0.954). Nevertheless, using either dropout or batch normalization improved the accuracy of the classification compared with using neither technique. The CNN model without these techniques achieved OA = 0.932, AA = 0.922, and κ = 0.922, indicating the importance of such regularization methods for aerial orthophoto classification. The classified maps produced by these methods are shown in Figure 6. Furthermore, the performance plot (Figure 7) of the CNN model without dropout and batch normalization shows that this model overfits the training data and performs worse when applied to new data.
Overall, the experimental results on both the training and validation datasets indicate that the proposed CNN architecture is a robust and efficient model, and that the use of dropout and batch normalization as regularization techniques is essential to obtain high classification accuracy over the entire area rather than just predicting the labels of the training samples.

Table 3: The confusion matrix calculated for the CNN model.

                    Road   Waterbody   Grassland   Building   Dense vegetation   Shadow   Barren land
Road                1474   0           0           23         0                  0        21
Water body          0      1463        85          0          0                  1        0
Grassland           0      10          1323        0          27                 0        0
Building            4      0           0           991        0                  0        0
Dense vegetation    0      0           0           0          1070               1        0
Shadow              0      0           0           0          11                 93       0
Barren land         6      0           0           8          0                  0        716

Table 4: Performance of the CNN model with different configurations.

Model                                   OA      AA      κ
CNN + dropout + batch normalization     0.973   0.965   0.967
CNN + dropout                           0.958   0.956   0.954
CNN + batch normalization               0.964   0.956   0.961
CNN                                     0.932   0.922   0.922
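As a consistency check, the measures defined in Section 3.2.5 can be recomputed directly from the confusion matrix of Table 3 (a sketch; small rounding differences against the reported values are expected):

```python
import numpy as np

# Confusion matrix from Table 3 (rows: reference, columns: predicted)
cm = np.array([
    [1474,    0,    0,   23,    0,  0,  21],  # road
    [   0, 1463,   85,    0,    0,  1,   0],  # water body
    [   0,   10, 1323,    0,   27,  0,   0],  # grassland
    [   4,    0,    0,  991,    0,  0,   0],  # building
    [   0,    0,    0,    0, 1070,  1,   0],  # dense vegetation
    [   0,    0,    0,    0,   11, 93,   0],  # shadow
    [   6,    0,    0,    8,    0,  0, 716],  # barren land
])
n = cm.sum()
oa = np.trace(cm) / n                       # overall accuracy, Eq. (5)
pa = np.diag(cm) / cm.sum(axis=1)           # per-class accuracy, Eq. (7)
aa = pa.mean()                              # average accuracy, Eq. (6)
pe = (cm.sum(axis=1) * cm.sum(axis=0)).sum() / n**2
kappa = (oa - pe) / (1 - pe)                # Kappa coefficient, Eq. (8)
print(round(oa, 3), round(kappa, 3))
```

Running this reproduces the reported OA of 0.973 and a κ within rounding of the reported 0.967, and the per-class values match Table 2 (e.g., 0.894 for shadow).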
4.2. Sensitivity Analysis

The performance of the CNN in classifying orthophotos is highly dependent on its architecture and hyperparameters. Thus, sensitivity analysis serves as an essential step in finding a good set of parameters and architecture configurations, in addition to providing an understanding of the model's behavior. Figure 8 shows the impact of different parameters (the number of convolutional filters, activation function, drop probability, optimizer, batch size, and patch size) on the validation accuracy of the CNN.

For the convolutional filters, the sensitivity analysis shows that a larger number of filters can lead to an increase in performance. However, a larger number of filters also increases training time and can overfit the training data if the model is not regularized properly. Thus, this parameter was set to 32 as an optimal setting, and larger numbers of filters were not explored. With this configuration, the model achieved OA = 0.956, AA = 0.945, and κ = 0.947. In addition, the analysis shows that the "ReLU" activation function outperformed the other two functions ("Sigmoid" and "ELU"). Using this activation, the CNN model achieved an OA of 0.956, higher than the second-best activation ("Sigmoid") by ~4.4%. ReLU also facilitates faster training and reduces the likelihood of vanishing gradients. The experiments on drop probability showed that different values optimize different accuracy metrics: a drop probability of 0.2 optimized the model for OA and κ (0.975 and 0.970, respectively), whereas a drop probability of 0.3 performed better than 0.2 in terms of AA. Furthermore, the performance of the CNN with different optimizers was investigated, and the results indicated that "Adam" was the most effective in training. The highest OA (0.975) and κ (0.970) were achieved by the CNN model trained with "Adam," although the model trained with "Nadam" achieved the highest AA (0.974). The worst performance (OA = 0.945, AA = 0.949, and κ = 0.934) was found when the model was trained with plain SGD. Moreover, the efficiency of the CNN was compared across batch sizes of 4, 8, 16, 32, and 64. A batch size of 32 was found to be the best considering OA (0.975) and κ (0.970), while a batch size of 64 achieved the highest AA (0.975).

Another important parameter in the proposed CNN is the patch size, that is, the local neighborhood area of size n × n. The advantage of patch-based learning for orthophoto classification comes from exploiting both the spectral and spatial information of the data, which can improve the accuracy compared with using individual pixels alone (spectral information only). To understand this parameter and find a suboptimal value, several experiments were conducted with different patch sizes (n = 3, 5, 7, 9, 11, 13). The statistical analysis in terms of model accuracy indicates that a larger n yields higher accuracy (Figure 8). However, visual analysis of the classification map shows that a larger n reduces the spatial quality of the features in the map (Figure 9). As a result, we considered n = 7 an effective value for this parameter, as it achieved relatively high accuracy measured by OA, AA, and κ as well as high spatial quality of the features.
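The patch-based input described above can be sketched as follows: each pixel is represented by its n × n spectral-spatial neighborhood (here n = 7, the value adopted in the paper; reflect-padding at the image border is an assumption of this sketch):

```python
import numpy as np

def extract_patches(image, n=7):
    """Return one n x n x bands patch per pixel, reflect-padding the
    border so every pixel has a full neighborhood."""
    r = n // 2
    padded = np.pad(image, ((r, r), (r, r), (0, 0)), mode="reflect")
    h, w, d = image.shape
    patches = np.empty((h * w, n, n, d), dtype=image.dtype)
    for y in range(h):
        for x in range(w):
            patches[y * w + x] = padded[y:y + n, x:x + n, :]
    return patches

image = np.random.rand(20, 30, 3)       # a small 3-band orthophoto
patches = extract_patches(image, n=7)
print(patches.shape)  # (600, 7, 7, 3)
```

The center cell of each patch is the pixel being labeled, so the classifier sees both that pixel's spectra and its spatial context.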
4.3. Training Time Analysis. The computing performance of the CNN model depended on the use of dropout and batch normalization layers in the network architecture, in addition to other hyperparameters such as the number of convolutional filters and the image patch size. Table 5 shows the training time of the CNN model with different configurations. When early stopping was applied, the training of the CNN with dropout and batch normalization took about 124 seconds on a CPU. Removing the batch normalization from the architecture (CNN + dropout) yielded a training time of 150 seconds, whereas the CNN with batch normalization took 75 seconds to train. The CNN model without dropout and batch normalization took the shortest time (58.4 seconds) to train.

Figure 6: Classification maps produced by the CNN models: (a) CNN + dropout + batch normalization, (b) CNN + dropout, (c) CNN + batch normalization, and (d) CNN.
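The early stopping used in these timing experiments can be sketched framework-free (a minimal illustration over a hypothetical validation-loss sequence; in Keras this is handled by the `EarlyStopping` callback):

```python
def train_with_early_stopping(losses, patience=10):
    """Stop when the validation loss has not improved for `patience`
    consecutive epochs; return the number of epochs actually run."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best - 1e-6:          # an improvement resets the counter
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return len(losses)

# Hypothetical validation losses: improve for 50 epochs, then plateau
losses = [1.0 / (e + 1) for e in range(50)] + [0.02] * 60
print(train_with_early_stopping(losses))  # stops well before 110 epochs
```

Stopping once the validation loss plateaus is what cuts the 200-epoch budget down to the shorter times in the left column of Table 5.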
On the other hand, when the model was trained for 200 epochs without early stopping, the CNN + dropout + batch normalization model took about 230 seconds, 106 seconds longer than with early stopping. The other models (CNN + dropout, CNN + batch normalization, and CNN) also required longer times to train, as expected, because more epochs were run. Overall, the computing performance of the proposed model is efficient for the investigated data. However, for larger datasets, training such models will require a longer time, and as a result, graphical processing units will be essential.

Figure 7: The loss of the CNN model without dropout and batch normalization (training and validation loss over 50 epochs).

5. Conclusion

In this paper, a classification model based on CNN and spectral-spatial feature learning has been proposed for aerial photographs. With the utilization of advanced regularization techniques such as dropout and batch normalization, the proposed model balances generalization ability and training efficiency.
Figure 8: The influence of hyperparameters on validation accuracy: the number of convolutional filters, activation function, drop probability, optimizer, batch size, and patch size (each panel reports OA, AA, and κ).

The use of such methods to improve the CNN model, along with other techniques like preprocessing (geometric calibration and feature normalization) and sensitivity analysis, makes these models robust for classifying the given dataset. The CNN model acts as a feature extractor, and the classifier can be trained end-to-end given training samples. The network architecture can effectively handle the inter- and intraclass complexity inside the scene.
The best remote sensing datasets and other data sources such as satel- model achieved OA = 0.973, AA = 0.965, and κ = 0.967 lite images and laser scanning point clouds. Validation accuracy Validation accuracy Validation accuracy Validation Validation Validation accuracy accuracy accuracy Journal of Sensors 11 3 × 3 5 × 5 7 × 7 9 × 9 11 × 11 13 × 13 Figure 9: Effects of patch size on the quality of classified maps. Table 5: The training time in seconds of CNN with different [3] S. Basu, S. Ganguly, S. Mukhopadhyay, R. DiBiano, M. Karki, configurations for 200 epochs. and R. Nemani, “DeepSat: a learning framework for satellite imagery,” in Proceedings of the 23rd SIGSPATIAL Interna- Time tional Conference on Advances in Geographic Information Time (seconds)—with Model (seconds)—full Systems, p. 37, New York, NY, USA, November 2015. early stopping training [4] Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, “Deep learning- CNN + dropout based classification of hyperspectral data,” IEEE Journal of 124 230 + batch normalization Selected Topics in Applied Earth Observations and Remote Sensing, vol. 7, no. 6, pp. 2094–2107, 2014. CNN + dropout 150 168 [5] W. Li, G. Wu, F. Zhang, and Q. Du, “Hyperspectral image CNN + batch 75 219 classification using deep pixel-pair features,” IEEE Transac- normalization tions on Geoscience and Remote Sensing, vol. 55, no. 2, CNN 58.4 158 pp. 844–853, 2017. [6] W. Zhao and S. Du, “Spectral–spatial feature extraction for hyperspectral image classification: a dimension reduction Data Availability and deep learning approach,” IEEE Transactions on Geosci- ence and Remote Sensing, vol. 54, no. 8, pp. 4544–4554, These data were used from a research project lead by Profes- sor Biswajeet Pradhan. Very high resolution aerial photos [7] Y. T. Hsieh, C. T. Chen, and J. C. Chen, “Applying object- were used in this research. The data can be made available based image analysis and knowledge-based classification to upon request. 
ADS-40 digital aerial photographs to facilitate complex forest land cover classification,” Journal of Applied Remote Sensing, vol. 11, no. 1, article 015001, 2017. Conflicts of Interest [8] M. F. A. Vogels, S. M. De Jong, G. Sterk, and E. A. Addink, “Agricultural cropland mapping using black-and-white aerial The authors declare that they have no conflicts of interest. photography, object-based image analysis, and random for- ests,” International Journal of Applied Earth Observation and Geoinformation, vol. 54, pp. 114–123, 2017. References [9] X. Meng, N. Shang, X. Zhang et al., “Photogrammetric UAV [1] Ö. Akar, “The Rotation Forest algorithm and object-based mapping of terrain under dense coastal vegetation: an object- classification method for land use mapping through UAV oriented classification ensemble algorithm for classification images,” Geocarto International, vol. 33, no. 5, pp. 538–553, and terrain correction,” Remote Sensing, vol. 9, no. 11, p. 1187, 2017. 2017. [2] Q. Wu, R. Zhong, W. Zhao, H. Fu, and K. Song, “A [10] A. Juel, G. B. Groom, J. C. Svenning, and R. Ejrnaes, “Spatial comparison of pixel-based decision tree and object-based application of random forest models for fine-scale coastal veg- Support Vector Machine methods for land-cover etation classification using object based analysis of aerial classification based on aerial images and airborne lidar data,” orthophoto and DEM data,” International Journal of Applied International Journal of Remote Sensing, vol. 38, no. 23, Earth Observation and Geoinformation, vol. 42, pp. 106–114, pp. 7176–7195, 2017. 2015. 12 Journal of Sensors [11] L. Albert, F. Rottensteiner, and C. Heipke, “A higher order conditional random field model for simultaneous classification of land cover and land use,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 130, pp. 63–80, 2017. [12] G. Cheng, C. Ma, P. Zhou, X. Yao, and J. 
Han, “Scene clas- sification of high resolution remote sensing images using convolutional neural networks,” in 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 767–770, Beijing, China, July 2016. [13] G. J. Scott, M. R. England, W. A. Starms, R. A. Marcum, and C. H. Davis, “Training deep convolutional neural networks for land-cover classification of high-resolution imagery,” IEEE Geoscience and Remote Sensing Letters, vol. 14, no. 4, pp. 549– 553, 2017. [14] J. Sherrah, “Fully convolutional networks for dense semantic labelling of high-resolution aerial imagery,” 2016, http://arxi- v.org/abs/1606.02585. [15] W. Yao, P. Poleswki, and P. Krzystek, “Classification of urban aerial data based on pixel labelling with deep convolutional neural networks and logistic regression,” ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. XLI-B7, pp. 405–410, 2016. [16] X. Sun, X. Lin, S. Shen, and Z. Hu, “High-resolution remote sensing data classification over urban areas using random for- est ensemble and fully connected conditional random field,” ISPRS International Journal of Geo-Information, vol. 6, no. 8, p. 245, 2017. [17] J. R. Bergado, C. Persello, and C. Gevaert, “A deep learning approach to the classification of sub-decimetre resolution aerial images,” in 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 1516–1519, Beijing, China, July 2016. [18] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient- based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998. [19] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015. [20] A. Bogoliubova and P. Tymków, “Accuracy assessment of automatic image processing for land cover classification of St. Petersburg protected area,” Acta Scientiarum Polonorum. Geodesia et Descriptio Terrarum, vol. 13, no. 1-2, 2014. 


 


Publisher: Hindawi Publishing Corporation
ISSN: 1687-725X
eISSN: 1687-7268
DOI: 10.1155/2018/7195432

Abstract

Classification of aerial photographs relying purely on spectral content is a challenging topic in remote sensing. A convolutional neural network (CNN) was developed to classify aerial photographs into seven land cover classes: building, grassland, dense vegetation, waterbody, barren land, road, and shadow. The classifier utilized the spectral and spatial content of the data to maximize the accuracy of the classification process. The CNN was trained from scratch with manually created ground truth samples. The architecture of the network comprised a single convolution layer of 32 filters with a kernel size of 3 × 3, a pooling size of 2 × 2, batch normalization, dropout, and a dense layer with Softmax activation. The design of the architecture and its hyperparameters were selected via sensitivity analysis and validation accuracy. The results showed that the proposed model could be effective for classifying the aerial photographs. The overall accuracy and Kappa coefficient of the best model were 0.973 and 0.967, respectively.
In addition, the sensitivity analysis suggested that the use of dropout and batch normalization in CNN is essential to improve the generalization performance of the model. The CNN model without these techniques achieved the worst performance, with an overall accuracy and Kappa of 0.932 and 0.922, respectively. This research shows that CNN-based models are robust for land cover classification using aerial photographs. However, the architecture and hyperparameters of these models should be carefully selected and optimized.

1. Introduction

Classifying remote sensing data (especially orthophotos of three bands: red, green, and blue (RGB)) with traditional methods is a challenge even though some methods in the literature have produced excellent results [1, 2]. The main reason is that remote sensing datasets have high intra- and interclass variability, and the amount of labeled data is much smaller compared to the total size of the dataset [3]. On the other hand, recent advances in deep learning methods like convolutional neural networks (CNNs) have shown promising results in remote sensing image classification, especially hyperspectral image classification [4–6]. The advantages of deep learning methods include learning high-order features from the data that are often more useful than the raw pixels for classifying the image into predefined labels. Another advantage of these methods is the spatial learning of contextual information from the data via feature pooling from a local spatial neighborhood [3].

There are several methods and algorithms that have been adopted by many researchers to efficiently classify a very high-resolution aerial photo and produce accurate land cover maps. Methods such as object-based image analysis (OBIA) were mostly investigated because of their advantage in very high-resolution image processing via spectral and spatial features. In a recent paper, Hsieh et al. [7] applied aerial photo classification by combining OBIA with a decision tree using texture, shape, and spectral features. Their results achieved an accuracy of 78.20% and a Kappa coefficient of 0.7597. Vogels et al. [8] combined OBIA with random forest classification using texture, slope, shape, neighbor, and spectral information to produce classification maps for agricultural areas. They tested their algorithm on two datasets, and the results showed the employed methodology to be effective, with accuracies of 90% and 96% for the two study areas, respectively. On the other hand, a novel model was presented by Meng et al. [9], who applied OBIA to improve vegetation classification based on aerial photos and global positioning systems. Results illustrated a significant improvement in classification accuracy, which increased from 83.98% to 96.12% in overall accuracy and from 0.7806 to 0.947 in the Kappa value. Furthermore, Juel et al. [10] showed that random forest with the use of a digital elevation model could achieve relatively high performance for vegetation mapping. In a most recent paper, Wu et al. [2] developed a model based on a comparison between a pixel-based decision tree and an object-based SVM to classify aerial photos. The object-based support vector machine (SVM) had higher accuracy than the pixel-based decision tree. Albert et al. [11] developed classifiers based on conditional random fields and pixel-based analysis to classify aerial photos. Their results showed that such techniques are beneficial for land cover classes covering large, homogeneous areas.

2. Related Works

The success of CNN in fields like computer vision, language modeling, and speech recognition has motivated remote sensing scientists to apply it to image classification. Several works have been done on CNN for remote sensing image classification [12–15]. This section briefly explains some of these works, highlighting their findings and limitations.

Sun et al. [16] proposed an automated model for feature extraction and classification with classification refinement by combining random forest and CNN. Their combined model could perform well (86.9%) and obtained higher accuracy than the single models. Akar [1] developed a model based on rotation forest and OBIA to classify aerial photos. Results were compared to gentle AdaBoost, and their experiments suggested that their method performed better than the other method, with 92.52% and 91.29% accuracies, respectively. Bergado et al. [17] developed deep learning algorithms based on CNN for aerial photo classification in high-resolution urban areas. They used data from optical bands, digital surface models, and ground truth maps. The results showed that CNN is very effective in learning discriminative contextual features, leading to accurate classified maps and outperforming traditional classification methods based on the extraction of textural features. Scott et al. [13] applied CNN to produce land cover maps from high-resolution images. Other researchers such as Cheng et al. [12] used CNN as a classification algorithm for scene understanding from aerial imagery. Furthermore, Sherrah [14] and Yao et al. [15] used CNN for semantic classification of aerial images.

This research investigates the development of a CNN model with regularization techniques such as dropout and batch normalization for classifying aerial orthophotos into general land cover classes (e.g., road, building, waterbody, grassland, barren land, shadow, and dense vegetation). The main objective of the research is to run several experiments exploring the impacts of CNN architectures and hyperparameters on the accuracy of land cover classification using aerial photos. The aim is to understand the behaviours of the CNN model concerning its architecture design and hyperparameters to produce models with high generalization capacity.

3. Methodology

This section presents the dataset, the preprocessing, and the methodology of the proposed CNN model, including the network architecture and the training procedure.

3.1. Dataset and Preprocessing

3.1.1. Dataset. To implement the current research, a pilot area was identified based on the diversity of the land cover of the area. The study area is located in Selangor, Malaysia (Figure 1).

3.1.2. Preprocessing

(1) Geometric Calibration. Since the orthophoto was captured by an airborne laser scanning (LiDAR) system, it was essential to calibrate it geometrically to correct the geometric errors. In this step, the data was corrected based on ground control points (GCPs) collected from the field (Figure 2). There were 34 GCPs identified from clearly identifiable points (i.e., road intersections, corners, and power lines). The geometric correction was done in ArcGIS 10.5 software. The steps of geometric correction included identification of transformation points in the orthophoto, application of the least square transformation, and calculation of the accuracy of the process. The selected points were uniformly distributed in the area. After that, the least square method (Kardoulas et al., 1996) was applied to estimate the coefficients, which are essential for the geometric transformation process. After the least square solution, the polynomial equations were used to solve for the X, Y coordinates of the GCPs and to determine the residuals and RMS errors between the source X, Y coordinates and the retransformed X, Y coordinates.

(2) Normalization. Since the aerial orthophotos have integer digital values and the initial weights of the CNN model are randomly selected within 0-1, a z-score normalization was applied to the pixel values of the orthophotos to avoid abnormal gradients. This step is essential as it improves the progress of the activation and the gradient descent optimization (LeCun et al., 2012):

X′ = (X/max − μ) / σ,   (1)

where max is the maximum pixel value in the image, μ and σ are the mean and standard deviation of X/max, respectively, and X′ is the normalized data.
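As a quick illustration of the z-score normalization in (1), here is a small NumPy sketch (our own; the paper does not provide code, and the array shapes are arbitrary):

```python
import numpy as np

def normalize_orthophoto(image: np.ndarray) -> np.ndarray:
    """Z-score normalization of integer pixel values, following Eq. (1):
    scale by the image maximum, then center and divide by the standard
    deviation of the scaled values."""
    scaled = image.astype(np.float64) / image.max()
    mu, sigma = scaled.mean(), scaled.std()
    return (scaled - mu) / sigma

# Example: a random 8-bit RGB patch
np.random.seed(0)
patch = np.random.randint(0, 256, size=(16, 16, 3))
normalized = normalize_orthophoto(patch)
# The result has approximately zero mean and unit variance.
```

The global mean and standard deviation are computed over all bands here; per-band statistics would be an equally reasonable variant.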
Figure 1: The study area location map.

Figure 2: The ground truth samples over the study area, which were manually selected for seven land cover classes: road (7417), water body (7363), grassland (7038), building (5109), dense vegetation (5538), shadow (475), and barren land (3681). The numbers in brackets indicate the number of pixels in each class.

3.2. The Proposed Approach

3.2.1. Overview. An orthophoto is composed of m × n × d digital values, where m, n, and d are the image width, length, and depth, respectively. The goal of a classification model is to assign a label to each pixel in the image given a set of training examples with their ground truth labels. In general, common classification methods utilize the spectral information (image pixels across different bands) to achieve that goal. In addition, some other techniques, such as object-based image analysis (OBIA), segment the input image into several homogeneous contiguous groups before classification. This method uses additional features like spatial, shape, and texture features to boost the classification performance of the classifier. However, both methods, pixel-based and OBIA, have several challenges, such as speckle noise in the former and segmentation optimization in the latter. Furthermore, both methods require careful feature engineering and band selection to obtain high classification accuracy. More recently, classification methods using image patches and deep learning algorithms have been proposed to overcome the above challenges; among the common methods is CNN. As a result, this study has proposed a classification method that is based on CNN and spectral-spatial feature learning for classifying very high-resolution aerial orthophotos. The following sections describe the proposed model and its components, including the basics of CNN, the network architecture, and the training methodology. The pseudocode of the proposed classification model is presented in Algorithm 1. We developed the CNN model in the current study by running several experiments with different configurations. Then, we designed the ultimate model with the best hyperparameters and architecture based on statistical accuracy metrics such as overall accuracy, Kappa index, and per-class accuracies.

Algorithm 1: CNN for orthophoto classification
Input: RGB image (I) captured by the aerial remote sensing system; training/testing samples (D)
Output: Land cover classification map with seven classes (O)
Preprocessing (Section 3.1.2):
    calibrate I using the available 34 GCPs
    normalize pixel values using Eq. (1)
Classification (CNN) (Sections 3.2.2 and 3.2.3):
    for patch_x_axis:
        for patch_y_axis:
            result_convolution(x, y) = dot product(Patch, Filter)
    for patch_x_axis:
        for patch_y_axis:
            result_maxpool(x, y) = max(Patch)
    apply the ReLU activation F = max(0, x)
    result_cnn_model = trained model
Prediction:
    apply the trained model to the whole image and get O
Mapping:
    reshape the predicted values to the original image shape
    convert the array to an image and write it to disk
Algorithm 1: The pseudocode of the proposed CNN developed for land cover mapping using aerial images.

Figure 3: Illustration of typical layers of a CNN (convolution, ReLU, and pooling).

3.2.2. Basics of CNN. Convolutional neural networks (CNNs), or ConvNets, are a type of artificial neural network that simulates the human vision cortical system by using local receptive fields and shared weights. They were introduced by LeCun and his colleagues [18]. Figure 3 shows a typical CNN with convolution and max pooling operations. CNN is suitable for analyzing images, videos, or data in the form of n-dimensional arrays that have a spatial component. This unique property makes them suitable for remote sensing image classification as well. A typical architecture of a CNN consists of a series of layers such as convolution, pooling, fully connected (i.e., dense), and logistic regression/Softmax. However, additional layers like dropout and batch normalization can also be added to avoid overfitting and improve the generalization of these models. The last layer depends on the type of the problem: for binary classification problems, a logistic regression (sigmoid) layer is often used; for multiclass classification problems, a Softmax layer is used instead. Each layer has its own operation and aim in these models. For example, the convolutional layers are aimed at constructing feature maps via convolutional filters that can learn high-level features, which allows taking advantage of the image properties. The output of these layers then passes through a nonlinearity such as a ReLU (rectified linear unit). Local groups of values in array data are often highly correlated, and the local statistics of images are invariant to location [19]. In addition, pooling layers (or subsampling) are used to merge semantically similar features into one. The most common method of subsampling computes the maximum of a local patch of units in the feature maps. Other pooling operations are average pooling and stochastic pooling. In general, several convolutional and subsampling layers are stacked, followed by dense layers and a Softmax or a logistic regression layer to predict the label of each pixel in the image.
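The convolution and max pooling operations described above can be made concrete with a minimal single-channel NumPy sketch (an illustration of the generic operations, not the authors' implementation; shapes and names are ours):

```python
import numpy as np

def conv2d_valid(x: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """'Valid' convolution as used in CNNs (i.e., cross-correlation):
    slide the kernel over the input and take dot products."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x: np.ndarray, size: int = 2) -> np.ndarray:
    """Non-overlapping max pooling: keep the maximum of each size x size patch."""
    oh, ow = x.shape[0] // size, x.shape[1] // size
    return x[:oh * size, :ow * size].reshape(oh, size, ow, size).max(axis=(1, 3))

x = np.arange(36, dtype=float).reshape(6, 6)
feature_map = np.maximum(conv2d_valid(x, np.ones((3, 3)) / 9.0), 0)  # conv + ReLU
pooled = max_pool2d(feature_map)  # 4x4 feature map pooled down to 2x2
```

A 3 × 3 kernel on a 6 × 6 input yields a 4 × 4 feature map, and 2 × 2 pooling halves each spatial dimension, mirroring the layer stack in Figure 3.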
3.2.3. Network Architecture. The architecture of the CNN model was built with a single convolutional layer followed by a max pooling operation, batch normalization, and two dense layers as the classifier (Figure 4). This architecture yielded 3527 total parameters, of which 96 are not trainable. The convolutional kernels were kept at 3 × 3, and the pooling size in the max pooling layer was kept at 2 × 2. Dropout was performed in the convolutional layer and the first dense layer with a drop probability of 0.5 to avoid overfitting. The minibatch of stochastic gradient descent (SGD) was set to 32 images. Under the framework of Keras with a TensorFlow backend, the whole process was run on a CPU (Core i7, 2.6 GHz) with 16 GB of memory (RAM). In the experiments, 60% of the total samples were randomly chosen for training, and the rest were chosen for testing; overall accuracy (OA), average accuracy (AA), Kappa coefficient (κ), and per-class accuracy (PA) were used to evaluate the performance of the CNN classification method (Congalton and Green, 2008). The summary of the model's layers is shown in Table 1.

Figure 4: The architecture of the proposed CNN for aerial orthophoto classification (input image, convolution, ReLU, pooling, batch normalization, dropout, flatten, dense, and Softmax layers).

Table 1: The summary of the CNN model layers.

Layer (type)           Output shape       Number of parameters
Input                  (None, 3, 7, 7)    0
2D convolution         (None, 1, 5, 32)   2048
Max pooling            (None, 1, 2, 16)   0
Batch normalization    (None, 1, 2, 16)   64
Dropout                (None, 1, 2, 16)   0
Flatten                (None, 32)         0
Dense                  (None, 32)         1056
Batch normalization    (None, 32)         128
Dropout                (None, 32)         0
Dense (Softmax)        (None, 7)          231

3.2.4. Training the Model. The CNN model was trained with the backpropagation algorithm and stochastic gradient descent (SGD). It uses the minibatch's backpropagation error to approximate the error of all the training samples, which accelerates the cycle of the weight update with a smaller backpropagation error to speed up the convergence of the whole model. The optimization was run to reduce the loss function (J) (i.e., categorical cross entropy) of the CNN, expressed as follows:

J(X, W, b, θ) = − Σ_{i=1}^{N} Σ_{t=1}^{k} 1{y^i = t} · log(ŷ^i_t),   (2)

where X is the normalized features, W and b are the parameters of the CNN, θ is the parameters of the Softmax layer, N is the number of samples, k is the number of land cover classes, ŷ^i = (ŷ^i_1, ŷ^i_2, …, ŷ^i_k) is the prediction vector given by the Softmax classifier, and ŷ^i_t represents the possibility of the ith sample label being t, computed by (3):

ŷ_t = exp(θ_t c) / Σ_{j=1}^{k} exp(θ_j c).   (3)

During backpropagation, (4) is adapted to update W and b in every layer:

W_{t+1} = W_t − λV_t − α∇W,
b_{t+1} = b_t − λU_t − α∇b,   (4)

where λ is the momentum, which helps accelerate SGD by adding a fraction of the update value of the past time step to the current update value, α is the learning rate, ∇W and ∇b are the gradients of J(·) with respect to W and b, respectively, and t stands for the number of the epoch during SGD.
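The momentum update in (4) can be illustrated in a few lines of NumPy (our own sketch; V and U follow the equation's notation, and the learning rate and momentum values are arbitrary):

```python
import numpy as np

def sgd_momentum_step(w, b, grad_w, grad_b, v, u, lr=0.01, momentum=0.9):
    """One parameter update following Eq. (4): subtract the gradient step
    plus a fraction (the momentum, lambda) of the previous update.
    v and u hold the updates applied to W and b at the previous step."""
    w_new = w - momentum * v - lr * grad_w
    b_new = b - momentum * u - lr * grad_b
    # The update just applied becomes the momentum term of the next step.
    return w_new, b_new, w - w_new, b - b_new

w, b = np.array([1.0]), np.array([0.5])
v, u = np.zeros(1), np.zeros(1)  # no update history at the first step
w, b, v, u = sgd_momentum_step(w, b, np.array([0.2]), np.array([0.1]), v, u)
# The first step is a pure gradient step: w = 1.0 - 0.01 * 0.2 = 0.998
```

On subsequent calls, the accumulated v and u pull the parameters further along the previous direction, which is what accelerates convergence.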
3.2.5. Evaluation. This study uses several statistical accuracy measures to evaluate the different models and compare them under various experimental configurations. These metrics are overall accuracy (OA), average accuracy (AA), per-class accuracy (PA), and Kappa index (κ). They are calculated using the following equations [20]:

OA = Σ_i D_ii / N,
AA = Σ_i PA_i / m,
PA_i = D_ii / R_i,
κ = (N Σ_{i=1}^{m} D_ii − Σ_{i=1}^{m} R_i · C_i) / (N² − Σ_{i=1}^{m} R_i · C_i),

where Σ_i D_ii is the total number of correctly classified pixels, N is the total number of pixels in the error matrix, m is the number of classes, D_ii is the number of correctly classified pixels in row i (the diagonal cell), R_i is the total number of pixels in row i, and C_i is the total number of pixels in column i.

4. Experimental Results

4.1. Performance of the Proposed Model

4.1.1. CNN with Dropout and Batch Normalization. Figure 5 shows the accuracy performance of the CNN model with dropout and batch normalization for 93 epochs on both the training and validation datasets. The increase in model accuracy and the reduction in model loss over time indicate that the model has learned useful features to classify the image pixels into the different class labels. The fluctuations in the accuracy from one epoch to another are because of using dropout, which yielded a slightly different model at each epoch. The OA, AA, and κ of this model on the validation dataset were 0.973, 0.965, and 0.967, respectively. In addition, Table 2 shows the per-class accuracy (PA) achieved by the model. The results suggest that the CNN model could classify almost all the classes with relatively high accuracy. The minimum accuracy was 0.894 for the shadow class. While examining the confusion matrix (Table 3), the results indicate that several (~11) samples of this class were misclassified as dense vegetation, affecting its PA. The confusion matrix also shows that several samples of the water body class were misclassified as grassland.

Figure 5: Performance of the CNN model with the optimum parameter set: (a) model accuracy and (b) model loss for 93 epochs (early stopping).

Table 2: PA of the CNN model.

Class               PA
Road                0.971
Waterbody           0.944
Grassland           0.972
Building            0.995
Dense vegetation    0.999
Shadow              0.894
Barren land         0.980

4.1.2. CNN Model with Other Configurations. The CNN model was also trained without dropout and batch normalization to see their impacts on the accuracy of the classification map. Table 4 summarizes the results of comparing CNN models with different configurations (i.e., CNN + dropout + batch normalization, CNN + dropout, CNN + batch normalization, and CNN). The results suggest that the use of dropout and batch normalization could improve the accuracy (OA, AA, and κ) of the classification by almost 4%. The use of batch normalization alone performed slightly better (OA = 0.964, AA = 0.956, κ = 0.961) than just using dropout (OA = 0.958, AA = 0.956, κ = 0.954). Nevertheless, the use of either dropout or batch normalization could improve the accuracy of the classification compared to not using any of these techniques with the CNN model. The CNN model without these techniques achieved the following accuracies: OA = 0.932, AA = 0.922, and κ = 0.922, indicating the importance of such regularization methods for aerial orthophoto classification. The classified maps produced by these methods are shown in Figure 6. Furthermore, the performance plot (Figure 7) of the CNN model without dropout and batch normalization shows that this model overfits the training data and performs worse when applied to new data. Overall, the experimental results on both the training and validation datasets infer that the proposed CNN architecture is a robust and efficient model, while the use of dropout and batch normalization techniques as regularization methods is essential to obtain high classification accuracy for the entire area rather than just predicting the labels of the training samples.
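The accuracy measures defined in Section 3.2.5 can be computed directly from a confusion matrix such as the one discussed above; the following NumPy sketch (ours, not the authors' code) follows those equations, with rows as reference classes and columns as predictions:

```python
import numpy as np

def accuracy_metrics(cm: np.ndarray):
    """Compute OA, AA, per-class accuracy, and Kappa from a confusion
    matrix (rows: reference classes, columns: predicted classes)."""
    n = cm.sum()                # N: total number of pixels
    diag = np.diag(cm)          # D_ii: correctly classified pixels per class
    rows = cm.sum(axis=1)       # R_i: row totals
    cols = cm.sum(axis=0)       # C_i: column totals
    oa = diag.sum() / n
    pa = diag / rows            # per-class accuracy
    aa = pa.mean()
    chance = (rows * cols).sum()
    kappa = (n * diag.sum() - chance) / (n**2 - chance)
    return oa, aa, pa, kappa

# Tiny two-class example
cm = np.array([[8, 2],
               [1, 9]])
oa, aa, pa, kappa = accuracy_metrics(cm)
# oa = 0.85, aa = 0.85, kappa = 0.7
```

The same function applied to the seven-class confusion matrix in Table 3 reproduces the OA, AA, and κ values reported for the best model (up to rounding).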
techniques as regularization methods is essential to obtain high accuracy of classification for the entire area rather than just predicting the labels of the training samples.

Confusion matrix of the classification results (rows: reference class; columns: predicted class):

Class              Road   Water body   Grassland   Building   Dense vegetation   Shadow   Barren land
Road               1474        0            0          23             0              0          21
Water body            0     1463           85           0             0              1           0
Grassland             0       10         1323           0            27              0           0
Building              4        0            0         991             0              0           0
Dense vegetation      0        0            0           0          1070              1           0
Shadow                0        0            0           0            11             93           0
Barren land           6        0            0           8             0              0         716

Table 4: Performance of the CNN model with different configurations.

Model                                   OA      AA      κ
CNN + dropout + batch normalization   0.973   0.965   0.967
CNN + dropout                         0.958   0.956   0.954
CNN + batch normalization             0.964   0.956   0.961
CNN                                   0.932   0.922   0.922

4.2. Sensitivity Analysis. The performance of CNN in classifying orthophotos is highly dependent on its architecture and hyperparameters. Sensitivity analysis is therefore an essential step in finding a good set of parameters and architecture configurations, in addition to building an understanding of the model's behavior. Figure 8 shows the impact of different parameters (e.g., number of convolutional filters, activation function, drop probability, optimizer, batch size, and patch size) on the validation accuracy of CNN.

For the convolutional filters, the sensitivity analysis shows that a larger number of filters can increase performance. However, a larger number of filters also increases training time and can overfit the training data if the model is not regularized properly. Thus, this parameter was set to 32, and larger numbers of filters were not explored. With this configuration, the model achieved OA = 0.956, AA = 0.945, and κ = 0.947. In addition, the analysis shows that the activation function "ReLU" outperformed the other two functions ("Sigmoid" and "ELU"): with ReLU, the CNN model achieved an OA of 0.956, higher than the second-best activation ("Sigmoid") by ~4.4%. ReLU also facilitates faster training and reduces the likelihood of vanishing gradients. The experiments on drop probability showed that different values can improve the performance of CNN depending on the accuracy metric. For example, a drop probability of 0.2 optimized the model for OA and κ (0.975 and 0.970, respectively), whereas a drop probability of 0.3 performed better regarding AA. Furthermore, the performance of CNN with different optimizers was investigated, and the results indicated that "Adam" was the most effective for training. The highest OA (0.975) and κ (0.970) were achieved by the CNN model trained with "Adam," whereas the optimizer "Nadam" achieved the highest AA (0.974). The worst performance of CNN (OA = 0.945, AA = 0.949, and κ = 0.934) was obtained when the model was trained with SGD. Moreover, the efficiency of CNN was compared across batch sizes of 4, 8, 16, 32, and 64. A batch size of 32 was found to be the best considering OA (0.975) and κ (0.970), while a batch size of 64 achieved the highest AA (0.975).

Another important parameter in the proposed CNN is the patch size, i.e., the n × n local neighborhood around each pixel. The advantage of patch-based learning for orthophoto classification is that it draws on both the spectral and the spatial information in the data, which can improve accuracy compared with using individual pixels (spectral information only). To understand this parameter and find a suitable value, several experiments were conducted with different patch sizes (n = 3, 5, 7, 9, 11, 13). The statistical analysis of model accuracy indicates that larger n yields higher accuracy (Figure 8). However, visual inspection shows that larger n reduces the spatial quality of the features in the classification map (Figure 9). As a result, n = 7 was considered an effective value for this parameter, as it achieved relatively high accuracy in terms of OA, AA, and κ as well as high spatial quality of the features.

Figure 6: Classification maps produced by the CNN models: (a) CNN + dropout + batch normalization, (b) CNN + dropout, (c) CNN + batch normalization, and (d) CNN.

4.3. Training Time Analysis. The computing performance of the CNN model depended on the use of dropout and batch normalization layers in the network architecture, in addition to other hyperparameters such as the number of convolutional filters and the image patch size. Table 5 shows the training time of the CNN model with different configurations. When early stopping was applied, training CNN with dropout and batch normalization took about 124 seconds on a CPU. Removing batch normalization from the architecture yielded a training time of 150 seconds, whereas CNN with batch normalization alone took 75 seconds to train. The CNN model without dropout and batch normalization took the shortest time (58.4 seconds) to train.
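As a consistency check, OA, AA, and Cohen's kappa can be recomputed directly from the confusion matrix above. The short pure-Python sketch below assumes the class order given in the table, with rows as reference labels and columns as predictions, and takes AA as the mean per-class recall.

```python
# Confusion matrix from the paper (rows: reference, columns: predicted);
# class order: road, water body, grassland, building, dense vegetation,
# shadow, barren land.
M = [
    [1474,    0,    0,  23,    0,  0,  21],
    [   0, 1463,   85,   0,    0,  1,   0],
    [   0,   10, 1323,   0,   27,  0,   0],
    [   4,    0,    0, 991,    0,  0,   0],
    [   0,    0,    0,   0, 1070,  1,   0],
    [   0,    0,    0,   0,   11, 93,   0],
    [   6,    0,    0,   8,    0,  0, 716],
]

n = len(M)
total = sum(sum(row) for row in M)
diag = sum(M[i][i] for i in range(n))
row_sums = [sum(row) for row in M]
col_sums = [sum(M[i][j] for i in range(n)) for j in range(n)]

oa = diag / total                                        # overall accuracy
aa = sum(M[i][i] / row_sums[i] for i in range(n)) / n    # mean per-class recall
pe = sum(row_sums[i] * col_sums[i] for i in range(n)) / total**2
kappa = (oa - pe) / (1 - pe)                             # Cohen's kappa
```

This yields OA ≈ 0.9731, AA ≈ 0.9655, and κ ≈ 0.9676, matching the best row of Table 4 (CNN + dropout + batch normalization) up to rounding.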
On the other hand, when the model was trained for the full 200 epochs without early stopping, the CNN + dropout + batch normalization model took about 230 seconds, that is, 106 seconds longer than with early stopping. The other models (CNN + dropout, CNN + batch normalization, and CNN) also required longer training times, as expected given the larger number of epochs run. Overall, the computing performance of the proposed model is efficient for the investigated data. For larger datasets, however, training such models will take longer, and graphical processing units will become essential.

Figure 7: The loss of the CNN model without dropout and batch normalization (training and validation loss over 50 epochs).

5. Conclusion

In this paper, a classification model based on CNN and spectral-spatial feature learning has been proposed for aerial photographs. With the utilization of advanced regularization techniques such as dropout and batch normalization, the proposed model could balance generalization ability and training efficiency.
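As an aside on the early-stopping comparison in Section 4.3, the usual criterion — halt once the validation loss has not improved for a fixed number of epochs — can be sketched as follows. The patience value here is illustrative; the paper does not report its setting.

```python
def early_stop_epoch(val_losses, patience=10):
    """Index of the epoch at which training halts: stop once the
    validation loss has not improved for `patience` consecutive epochs.
    Returns the last epoch if the criterion never fires."""
    best = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch          # halt here instead of running all epochs
    return len(val_losses) - 1

# Toy validation-loss curve: improves for three epochs, then plateaus.
curve = [1.0, 0.8, 0.6] + [0.7] * 40
stop = early_stop_epoch(curve, patience=10)
```

On the toy curve, training halts at epoch 12 instead of running all 43 epochs, mirroring how early stopping cut the 230-second full run down to 124 seconds in Table 5.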
Use of such methods to improve the CNN model, along with other techniques such as preprocessing (geometric calibration and feature normalization) and sensitivity analysis, could make these models robust for classifying the given dataset. The CNN model acts as a feature extractor, and a classifier could be trained end-to-end given training samples. The network architecture can effectively handle the inter- and intraclass complexity inside the scene.

Figure 8: The influence of hyperparameters on the validation accuracy of CNN: number of convolutional filters, activation function, drop probability, optimizer, batch size, and patch size (bar charts reporting OA, AA, and κ for each setting).
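The Figure 8 analysis amounts to a one-factor-at-a-time sweep: each hyperparameter is varied over a grid while the others are held at a baseline. A minimal sketch follows; the baseline values mirror the best configuration reported above, and the grids shown are a subset of those tested.

```python
# One-at-a-time hyperparameter sweep, as in the Figure 8 sensitivity
# analysis: vary a single setting while holding the rest at a baseline.
BASELINE = {"filters": 32, "activation": "relu", "dropout": 0.2,
            "optimizer": "adam", "batch_size": 32, "patch_size": 7}

GRIDS = {"dropout": [0.2, 0.3, 0.4, 0.5, 0.6],
         "batch_size": [4, 8, 16, 32, 64],
         "patch_size": [3, 5, 7, 9, 11, 13]}

def one_at_a_time(baseline, grids):
    """Yield (parameter, value, config) triples for a sensitivity sweep."""
    for param, values in grids.items():
        for value in values:
            config = dict(baseline)   # copy the baseline, perturb one setting
            config[param] = value
            yield param, value, config

configs = list(one_at_a_time(BASELINE, GRIDS))
```

For each generated configuration, the model would be retrained and scored on the validation set; the setting with the best OA/AA/κ trade-off is kept (32 filters, ReLU, drop probability 0.2, Adam, batch size 32, and a 7 × 7 patch in the paper).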
The best model achieved OA = 0.973, AA = 0.965, and κ = 0.967, outperforming the traditional CNN model by ~4% in all the accuracy indicators. The short training time (124 seconds) confirmed the robustness of the proposed model for small and medium scale remote sensing datasets. Future work should focus on scaling this architecture to large remote sensing datasets and to other data sources such as satellite images and laser scanning point clouds.

Figure 9: Effects of patch size (3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, and 13 × 13) on the quality of the classified maps.

Table 5: The training time in seconds of CNN with different configurations for 200 epochs.

Model                                 Time (s), early stopping   Time (s), full training
CNN + dropout + batch normalization             124                      230
CNN + dropout                                   150                      168
CNN + batch normalization                        75                      219
CNN                                             58.4                     158

Data Availability

These data come from a research project led by Professor Biswajeet Pradhan. Very high resolution aerial photos were used in this research. The data can be made available upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.
Journal of Sensors, Hindawi Publishing Corporation. Published: 26 June 2018.