L. Albert, F. Rottensteiner, C. Heipke (2017)
A higher order conditional random field model for simultaneous classification of land cover and land use. ISPRS Journal of Photogrammetry and Remote Sensing, 130
Y. Hsieh, Chaur‐Tzuhn Chen, J. Chen (2017)
Applying object-based image analysis and knowledge-based classification to ADS-40 digital aerial photographs to facilitate complex forest land cover classification. Journal of Applied Remote Sensing, 11
Yann LeCun, Yoshua Bengio, Geoffrey Hinton (2015)
Deep Learning. Nature, 521
W. Yao, P. Polewski, P. Krzystek (2016)
Classification of urban aerial data based on pixel labelling with deep convolutional neural networks and logistic regression. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Gong Cheng, Chengcheng Ma, Peicheng Zhou, Xiwen Yao, Junwei Han (2016)
Scene classification of high resolution remote sensing images using convolutional neural networks. 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)
Yann LeCun, L. Bottou, Yoshua Bengio, P. Haffner (1998)
Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86
G. Scott, Matthew England, William Starms, Richard Marcum, C. Davis (2017)
Training Deep Convolutional Neural Networks for Land–Cover Classification of High-Resolution Imagery. IEEE Geoscience and Remote Sensing Letters, 14
Xiaofeng Sun, Xiangguo Lin, Shuhan Shen, Zhanyi Hu (2017)
High-Resolution Remote Sensing Data Classification over Urban Areas Using Random Forest Ensemble and Fully Connected Conditional Random Field. ISPRS International Journal of Geo-Information, 6
Qiong Wu, Ruofei Zhong, Wenji Zhao, Han Fu, Kai Song (2017)
A comparison of pixel-based decision tree and object-based Support Vector Machine methods for land-cover classification based on aerial images and airborne lidar data. International Journal of Remote Sensing, 38
J. Bergado, C. Persello, C. Gevaert (2016)
A deep learning approach to the classification of sub-decimetre resolution aerial images. 2016 IEEE International Geoscience and Remote Sensing Symposium (IGARSS)
Marjolein Vogels, S. Jong, G. Sterk, E. Addink (2017)
Agricultural cropland mapping using black-and-white aerial photography, Object-Based Image Analysis and Random Forests. International Journal of Applied Earth Observation and Geoinformation, 54
Yushi Chen, Zhouhan Lin, Xing Zhao, G. Wang, Yanfeng Gu (2014)
Deep Learning-Based Classification of Hyperspectral Data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 7
Wei Li, Guodong Wu, Fan Zhang, Q. Du (2017)
Hyperspectral Image Classification Using Deep Pixel-Pair Features. IEEE Transactions on Geoscience and Remote Sensing, 55
Anders Juel, G. Groom, Jens‐Christian Svenning, R. Ejrnæs (2015)
Spatial application of Random Forest models for fine-scale coastal vegetation classification using object based analysis of aerial orthophoto and DEM data. International Journal of Applied Earth Observation and Geoinformation, 42
Xuelian Meng, Nan Shang, Xukai Zhang, Chunyan Li, K. Zhao, Xiaomin Qiu, E. Weeks (2017)
Photogrammetric UAV Mapping of Terrain under Dense Coastal Vegetation: An Object-Oriented Classification Ensemble Algorithm for Classification and Terrain Correction. Remote Sensing, 9
A. Bogoliubova, P. Tymków (2014)
Accuracy assessment of automatic image processing for land cover classification of St. Petersburg protected area, 13
Wenzhi Zhao, S. Du (2016)
Spectral–Spatial Feature Extraction for Hyperspectral Image Classification: A Dimension Reduction and Deep Learning Approach. IEEE Transactions on Geoscience and Remote Sensing, 54
Saikat Basu, S. Ganguly, S. Mukhopadhyay, Robert DiBiano, Manohar Karki, R. Nemani (2015)
DeepSat: a learning framework for satellite imagery. Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems
J. Sherrah (2016)
Fully Convolutional Networks for Dense Semantic Labelling of High-Resolution Aerial Imagery. arXiv, abs/1606.02585
Ö. Akar (2018)
The Rotation Forest algorithm and object-based classification method for land use mapping through UAV images. Geocarto International, 33
Hindawi Journal of Sensors, Volume 2018, Article ID 7195432, 12 pages. https://doi.org/10.1155/2018/7195432

Research Article

Classification of Very High Resolution Aerial Photos Using Spectral-Spatial Convolutional Neural Networks

Maher Ibrahim Sameen, Biswajeet Pradhan, and Omar Saud Aziz

School of Systems, Management and Leadership, Faculty of Engineering and Information Technology, University of Technology Sydney, Building 11, Level 06, 81 Broadway, P.O. Box 123, Ultimo, NSW 2007, Australia

Correspondence should be addressed to Biswajeet Pradhan; biswajeet24@gmail.com

Received 3 March 2018; Revised 17 April 2018; Accepted 6 May 2018; Published 26 June 2018

Academic Editor: Paolo Bruschi

Copyright © 2018 Maher Ibrahim Sameen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Classification of aerial photographs relying purely on spectral content is a challenging topic in remote sensing. In this study, a convolutional neural network (CNN) was developed to classify aerial photographs into seven land cover classes: building, grassland, dense vegetation, waterbody, barren land, road, and shadow. The classifier used both the spectral and the spatial content of the data to maximize classification accuracy. The CNN was trained from scratch on manually created ground truth samples. The network architecture comprised a single convolutional layer of 32 filters with a 3 × 3 kernel, 2 × 2 max pooling, batch normalization, dropout, and a dense layer with Softmax activation. The architecture and its hyperparameters were selected via sensitivity analysis and validation accuracy. The results showed that the proposed model is effective for classifying aerial photographs: the overall accuracy and Kappa coefficient of the best model were 0.973 and 0.967, respectively.
In addition, the sensitivity analysis suggested that dropout and batch normalization are essential for improving the generalization performance of the model. The CNN model without these techniques achieved the worst performance, with an overall accuracy and Kappa of 0.932 and 0.922, respectively. This research shows that CNN-based models are robust for land cover classification using aerial photographs, provided that their architecture and hyperparameters are carefully selected and optimized.

1. Introduction

Classifying remote sensing data (especially orthophotos with three bands: red, green, and blue (RGB)) with traditional methods is a challenge, even though some methods in the literature have produced excellent results [1, 2]. The main reason is that remote sensing datasets have high intra- and interclass variability, and the amount of labeled data is small compared with the total size of the dataset [3]. On the other hand, recent advances in deep learning methods such as convolutional neural networks (CNNs) have shown promising results in remote sensing image classification, especially hyperspectral image classification [4-6]. The advantages of deep learning methods include learning high-order features from the data that are often more useful than raw pixels for classifying an image into predefined labels. Another advantage is the spatial learning of contextual information via feature pooling from a local spatial neighborhood [3].

Several methods and algorithms have been adopted to efficiently classify very high-resolution aerial photos and produce accurate land cover maps. Object-based image analysis (OBIA) has been investigated most, because of its advantages in very high-resolution image processing via spectral and spatial features. In a recent paper, Hsieh et al. [7] performed aerial photo classification by combining OBIA with a decision tree using texture, shape, and spectral features; their results achieved an accuracy of 78.20% and a Kappa coefficient of 0.7597. Vogels et al. [8] combined OBIA with random forest classification using texture, slope, shape, neighbor, and spectral information to produce classification maps of agricultural areas; tested on two datasets, the methodology proved effective, with accuracies of 90% and 96% for the two study areas, respectively. Meng et al. [9] applied OBIA to improve vegetation classification based on aerial photos and global positioning systems; their results showed a significant improvement in classification accuracy, from 83.98% to 96.12% in overall accuracy and from 0.7806 to 0.947 in the Kappa value. Furthermore, Juel et al. [10] showed that random forest combined with a digital elevation model can achieve relatively high performance for vegetation mapping. In a recent paper, Wu et al. [2] compared a pixel-based decision tree with an object-based support vector machine (SVM) for classifying aerial photos; the object-based SVM had the higher accuracy. Albert et al. [11] developed classifiers based on conditional random fields and pixel-based analysis to classify aerial photos; their results showed that such techniques are beneficial for land cover classes covering large, homogeneous areas.

2. Related Works

The success of CNNs in fields like computer vision, language modeling, and speech recognition has motivated remote sensing scientists to apply them to image classification, and several works have used CNNs for remote sensing image classification [12-15]. This section briefly reviews some of these works, highlighting their findings and limitations.

Sun et al. [16] proposed an automated model for feature extraction and classification with classification refinement by combining random forest and CNN; the combined model performed well (86.9%) and obtained higher accuracy than the single models. Akar [1] developed a model based on rotation forest and OBIA to classify aerial photos; compared with gentle AdaBoost, their method performed better, with accuracies of 92.52% and 91.29%, respectively. Bergado et al. [17] developed CNN-based deep learning algorithms for aerial photo classification in high-resolution urban areas using optical bands, digital surface models, and ground truth maps; the results showed that CNNs are very effective in learning discriminative contextual features, producing accurate classified maps and outperforming traditional classification methods based on extracted textural features. Scott et al. [13] applied CNNs to produce land cover maps from high-resolution images, and Cheng et al. [12] used a CNN as a classification algorithm for scene understanding from aerial imagery. Furthermore, Sherrah [14] and Yao et al. [15] used CNNs for semantic classification of aerial images.

This research investigates the development of a CNN model with regularization techniques such as dropout and batch normalization for classifying aerial orthophotos into general land cover classes (road, building, waterbody, grassland, barren land, shadow, and dense vegetation). The main objective is to run several experiments exploring the impacts of CNN architectures and hyperparameters on the accuracy of land cover classification using aerial photos, in order to understand the behaviour of the CNN model with respect to its architecture design and hyperparameters and to produce models with high generalization capacity.

3. Methodology

This section presents the dataset, the preprocessing, and the proposed CNN model, including the network architecture and the training procedure.

3.1. Dataset and Preprocessing

3.1.1. Dataset. A pilot area was identified based on the diversity of its land cover. The study area is located in Selangor, Malaysia (Figure 1).

3.1.2. Preprocessing

(1) Geometric Calibration. Since the orthophoto was captured by an airborne laser scanning (LiDAR) system, it was essential to calibrate it geometrically to correct geometric errors. The data was corrected based on ground control points (GCPs) collected in the field (Figure 2). There were 34 GCPs identified from clearly identifiable points (i.e., road intersections, corners, and power lines), uniformly distributed over the area. The geometric correction was done in ArcGIS 10.5 and included identification of transformation points in the orthophoto, application of the least squares transformation, and calculation of the accuracy of the process. The least squares method (Kardoulas et al., 1996) was applied to estimate the coefficients essential for the geometric transformation. After the least squares solution, the polynomial equations were used to solve for the X, Y coordinates of the GCPs and to determine the residuals and RMS errors between the source X, Y coordinates and the retransformed X, Y coordinates.

(2) Normalization. Since the aerial orthophotos have integer digital values and the initial weights of the CNN model are randomly selected within 0-1, a z-score normalization was applied to the pixel values of the orthophotos to avoid abnormal gradients. This step is essential because it improves the progress of the activations and of the gradient descent optimization (LeCun et al., 2012):

X' = (X/max - μ) / σ,   (1)

where max is the maximum pixel value in the image, μ and σ are the mean and standard deviation of X/max, respectively, and X' is the normalized data.
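The scaling in Eq. (1) can be sketched in pure Python as follows (a minimal sketch; the function name and the sample pixel values are illustrative, not from the paper):

```python
def zscore_normalize(pixels):
    """Eq. (1): divide by the image maximum, then subtract the mean and
    divide by the standard deviation of the rescaled values."""
    m = max(pixels)
    scaled = [p / m for p in pixels]
    mu = sum(scaled) / len(scaled)
    sigma = (sum((s - mu) ** 2 for s in scaled) / len(scaled)) ** 0.5
    return [(s - mu) / sigma for s in scaled]

# Example: one 8-bit band; the result has zero mean and unit variance.
band = [0, 64, 128, 192, 255]
normalized = zscore_normalize(band)
```

By construction, the normalized values have zero mean and unit variance regardless of the bit depth of the input band.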
3.2. The Proposed Approach

3.2.1. Overview. An orthophoto is composed of m × n × d digital values, where m, n, and d are the image width, length, and depth, respectively. The goal of a classification model is to assign a label to each pixel in the image, given a set of training examples with their ground truth labels. Common classification methods use the spectral information (image pixels across different bands) to achieve that goal. Other techniques, such as object-based image analysis (OBIA), segment the input image into homogeneous contiguous groups before classification and use additional features such as spatial, shape, and texture features to boost classification performance. However, both approaches, pixel-based and OBIA, face several challenges, such as speckle noise in the former and segmentation optimization in the latter, and both require careful feature engineering and band selection to reach high classification accuracy. More recently, classification methods using image patches and deep learning algorithms have been proposed to overcome these challenges, CNNs being among the most common. This study therefore proposes a classification method based on CNNs and spectral-spatial feature learning for classifying very high-resolution aerial orthophotos. The following sections describe the proposed model and its components, including the basics of CNNs, the network architecture, and the training methodology.

Figure 1: The study area location map.

Figure 2: The ground truth samples over the study area, manually selected for seven land cover classes: road (7417 pixels), water body (7363), grassland (7038), building (5109), dense vegetation (5538), shadow (475), and barren land (3681).

The pseudocode of the proposed classification model is presented in Algorithm 1. We developed the CNN model by running several experiments with different configurations and then designed the final model, with the best hyperparameters and architecture, based on statistical accuracy metrics such as overall accuracy, Kappa index, and per-class accuracies.

Algorithm 1: CNN for orthophoto classification.
  Input: RGB image (I) captured by the aerial remote sensing system; training/testing samples (D)
  Output: land cover classification map with seven classes (O)
  Preprocessing (Section 3.1.2):
    calibrate I using the available 34 GCPs
    normalize pixel values using Eq. (1)
  Classification (CNN) (Sections 3.2.2 and 3.2.3):
    for each patch position (x, y):
      result_convolution(x, y) = dot_product(Patch, Filter)
    for each patch position (x, y):
      result_maxpool(x, y) = max(Patch)
    apply the activation F = max(0, x)  (ReLU)
    result_cnn_model = trained model
  Prediction:
    apply the trained model to the whole image to obtain O
  Mapping:
    reshape the predicted values to the original image shape
    convert the array to an image and write it to disk

3.2.2. Basics of CNN. Convolutional neural networks (CNNs), or ConvNets, are a type of artificial neural network that simulates the human visual cortical system through local receptive fields and shared weights; they were introduced by LeCun and colleagues [18]. Figure 3 shows a typical CNN with convolution and max pooling operations. CNNs are suitable for analyzing images, videos, or any data in the form of n-dimensional arrays with a spatial component, which makes them well suited to remote sensing image classification. A typical CNN architecture consists of a series of layers such as convolution, pooling, fully connected (i.e., dense), and logistic regression/Softmax layers. Additional layers such as dropout and batch normalization can be added to avoid overfitting and improve generalization. The last layer depends on the type of problem: for binary classification a logistic regression (sigmoid) layer is often used, whereas for multiclass classification a Softmax layer is used. Pooling (subsampling) layers merge semantically similar features into one; the most common form computes the maximum of a local patch of units in a feature map, and other pooling operations include average and stochastic pooling. In general, several convolutional and subsampling layers are stacked, followed by dense layers and a Softmax or logistic regression layer that predicts the label of each pixel in the image.
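The convolution, ReLU, and max-pooling operations described above (and looped over in Algorithm 1) can be sketched for a single band in pure Python; the input and kernel values below are illustrative, not from the paper:

```python
def convolve2d(image, kernel):
    """Valid convolution (cross-correlation, as in most CNN frameworks):
    slide the kernel over the image and take dot products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)] for i in range(out_h)]

def relu(fmap):
    """Element-wise rectified linear unit: max(0, x)."""
    return [[max(0.0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling over size x size patches."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

# A 4x4 single-band patch convolved with a 2x2 kernel gives a 3x3
# feature map; ReLU then 2x2 max pooling reduces it to 1x1.
feature_map = max_pool(relu(convolve2d(
    [[1, 2, 0, 1], [0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1]],
    [[1, 0], [0, 1]])))
```

In the proposed model, this sequence runs once per filter (32 filters of size 3 × 3) rather than once, and over all bands of the input patch.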
Each layer has its own operation and purpose in these models. The convolutional layers construct feature maps via convolutional filters that learn high-level features exploiting the properties of images: local groups of values in array data are often highly correlated, and local image statistics are invariant to location [19]. The output of these layers then passes through a nonlinearity such as a ReLU (rectified linear unit).

Figure 3: Illustration of typical layers of a CNN (convolution, ReLU, pooling).

3.2.3. Network Architecture. The CNN model was built with a single convolutional layer followed by a max pooling operation, batch normalization, and two dense layers (Figure 4). This architecture yielded 3527 parameters in total, of which 96 are not trainable. The convolutional kernels were kept at 3 × 3, and the pooling size of the max pooling layer at 2 × 2. Dropout was applied after the convolutional layer and the first dense layer with a drop probability of 0.5 to avoid overfitting. The minibatch size for stochastic gradient descent (SGD) was set to 32 images. Under the framework of Keras with a TensorFlow backend, the whole process was run on a Core i7 2.6 GHz CPU with 16 GB of RAM. In the experiments, 60% of the total samples were randomly chosen for training and the rest for testing; overall accuracy (OA), average accuracy (AA), the Kappa coefficient (κ), and per-class accuracy (PA) were used to evaluate the performance of the CNN classification method (Congalton and Green, 2008). The summary of the model's layers is shown in Table 1.

Figure 4: The architecture of the proposed CNN for aerial orthophoto classification (input image → convolution → ReLU → pooling → batch normalization → dropout → flatten → dense → Softmax).

Table 1: The summary of the CNN model layers.

Layer (type)        | Output shape     | Number of parameters
Input               | (None, 3, 7, 7)  | 0
2D convolution      | (None, 1, 5, 32) | 2048
Max pooling         | (None, 1, 2, 16) | 0
Batch normalization | (None, 1, 2, 16) | 64
Dropout             | (None, 1, 2, 16) | 0
Flatten             | (None, 32)       | 0
Dense               | (None, 32)       | 1056
Batch normalization | (None, 32)       | 128
Dropout             | (None, 32)       | 0
Dense (Softmax)     | (None, 7)        | 231

3.2.4. Training the Model. The CNN model was trained with the backpropagation algorithm and stochastic gradient descent (SGD). SGD uses the minibatch's backpropagation error to approximate the error over all training samples, which shortens the weight-update cycle and speeds up the convergence of the whole model. The optimization was run to reduce the loss function J (categorical cross entropy) of the CNN, expressed as follows:

J(X, W, b, θ) = -(1/N) Σ_{i=1..N} Σ_{t=1..k} 1{y^i = t} · log ŷ_t^i,   (2)

where X is the normalized features, W and b are the parameters of the CNN, θ is the parameters of the Softmax layer, N is the number of samples, k is the number of land cover classes, ŷ^i = (ŷ_1^i, ŷ_2^i, ..., ŷ_k^i) is the prediction vector given by the Softmax classifier, and ŷ_t^i, the probability of the ith sample's label being t, is computed by (3):

ŷ_t = exp(θ_t c) / Σ_{j=1..k} exp(θ_j c).   (3)

During backpropagation, the updates in (4) are applied to W and b in every layer, where λ is the momentum, which helps accelerate SGD by adding a fraction of the past update value to the current one, α is the learning rate, ∇W and ∇b are the gradients of J(·) with respect to W and b, respectively, and t is the epoch number during SGD:

W_{t+1} = W_t - λV_t - α∇W,
b_{t+1} = b_t - λU_t - α∇b.   (4)
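The Softmax prediction (Eq. (3)), the cross-entropy loss (Eq. (2)), and the momentum update (Eq. (4)) can be sketched as follows (a pure-Python sketch; the scores, labels, and hyperparameter values are illustrative, and the momentum step folds Eq. (4) into a velocity update, which is algebraically equivalent):

```python
import math

def softmax(scores):
    """Eq. (3): exponentiate class scores and normalize so they sum to 1.
    Subtracting the maximum score is a standard numerical-stability trick."""
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(batch_probs, batch_labels):
    """Eq. (2): mean negative log-probability of the true class."""
    return -sum(math.log(p[t])
                for p, t in zip(batch_probs, batch_labels)) / len(batch_labels)

def momentum_step(w, v, grad, lr=0.01, momentum=0.9):
    """Eq. (4): w_{t+1} = w_t - momentum*v_t - lr*grad, tracked as a
    velocity v so the fraction of the past update carries over."""
    v_new = [momentum * vi + lr * gi for vi, gi in zip(v, grad)]
    w_new = [wi - vi for wi, vi in zip(w, v_new)]
    return w_new, v_new
```

The same three pieces underlie the Keras training loop used in the paper; here they are spelled out only to make the equations concrete.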
3.2.5. Evaluation. This study uses several statistical accuracy measures to evaluate the different models and compare them under various experimental configurations: overall accuracy (OA), average accuracy (AA), per-class accuracy (PA), and the Kappa index (κ). They are calculated using the following equations [20]:

OA = Σ_i D_ii / N,
AA = Σ_i PA_i / m,
PA_i = D_ii / R_i,
κ = (N · Σ_i D_ii - Σ_i R_i · C_i) / (N² - Σ_i R_i · C_i),

where Σ D_ii is the total number of correctly classified pixels, N is the total number of pixels in the error matrix, m is the number of classes, D_ii is the number of correctly classified pixels in row i (the diagonal cell), R_i is the total number of pixels in row i, and C_j is the total number of pixels in column j.

4. Experimental Results

4.1. Performance of the Proposed Model

4.1.1. CNN with Dropout and Batch Normalization. Figure 5 shows the accuracy of the CNN model with dropout and batch normalization over 93 epochs on both the training and validation datasets. The increase in model accuracy and the reduction in model loss over time indicate that the model learned useful features for classifying the image pixels into the different class labels. The fluctuations in accuracy from one epoch to another are due to dropout, which yields a slightly different model at each epoch. The OA, AA, and κ of this model on the validation dataset were 0.973, 0.965, and 0.967, respectively.

Figure 5: Performance of the CNN model with the optimum parameter set: (a) model accuracy and (b) model loss for 93 epochs (early stopping).

Table 2 shows the per-class accuracy (PA) achieved by the model. The results suggest that the CNN model classifies almost all the classes with relatively high accuracy; the minimum was 0.894 for the shadow class. The confusion matrix (Table 3) indicates that several (~11) samples of this class were misclassified as dense vegetation, affecting its PA, and that several water body samples were misclassified as grassland.

Table 2: PA of the CNN model.

Class            | PA
Road             | 0.971
Waterbody        | 0.944
Grassland        | 0.972
Building         | 0.995
Dense vegetation | 0.999
Shadow           | 0.894
Barren land      | 0.980

Table 3: The confusion matrix calculated for the CNN model.

                 | Road | Waterbody | Grassland | Building | Dense vegetation | Shadow | Barren land
Road             | 1474 | 0    | 0    | 23  | 0    | 0  | 21
Water body       | 0    | 1463 | 85   | 0   | 0    | 1  | 0
Grassland        | 0    | 10   | 1323 | 0   | 27   | 0  | 0
Building         | 4    | 0    | 0    | 991 | 0    | 0  | 0
Dense vegetation | 0    | 0    | 0    | 0   | 1070 | 1  | 0
Shadow           | 0    | 0    | 0    | 0   | 11   | 93 | 0
Barren land      | 6    | 0    | 0    | 8   | 0    | 0  | 716

4.1.2. CNN Model with Other Configurations. The CNN model was also trained without dropout and batch normalization to assess their impact on the accuracy of the classification map. Table 4 summarizes the results of comparing CNN models with different configurations (CNN + dropout + batch normalization, CNN + dropout, CNN + batch normalization, and plain CNN). The results suggest that using dropout and batch normalization together improves the accuracy (OA, AA, and κ) of the classification by almost 4%. Batch normalization alone performed slightly better (OA = 0.964, AA = 0.956, κ = 0.961) than dropout alone (OA = 0.958, AA = 0.956, κ = 0.954). Nevertheless, either technique improved accuracy compared with the plain CNN model, which achieved OA = 0.932, AA = 0.922, and κ = 0.922, indicating the importance of such regularization methods for aerial orthophoto classification. The classified maps produced by these methods are shown in Figure 6. Furthermore, the performance plot (Figure 7) of the CNN model without dropout and batch normalization shows that this model overfits the training data and performs worse on new data. Overall, the experimental results on both the training and validation datasets indicate that the proposed CNN architecture is robust and efficient, and that dropout and batch normalization as regularization techniques are essential to obtain high classification accuracy over the entire area rather than merely predicting the labels of the training samples.

Table 4: Performance of the CNN model with different configurations.

Model                               | OA    | AA    | κ
CNN + dropout + batch normalization | 0.973 | 0.965 | 0.967
CNN + dropout                       | 0.958 | 0.956 | 0.954
CNN + batch normalization           | 0.964 | 0.956 | 0.961
CNN                                 | 0.932 | 0.922 | 0.922

Figure 6: Classification maps produced by the CNN models: (a) CNN + dropout + batch normalization, (b) CNN + dropout, (c) CNN + batch normalization, and (d) CNN.

4.2. Sensitivity Analysis. The performance of a CNN classifying orthophotos is highly dependent on its architecture and hyperparameters. Sensitivity analysis is therefore an essential step in finding a good set of parameters and architecture configurations, in addition to understanding the model's behavior. Figure 8 shows the impact of different parameters (number of convolutional filters, activation function, drop probability, optimizer, batch size, and patch size) on the validation accuracy of the CNN.

For the convolutional filters, the sensitivity analysis shows that a larger number of filters can increase performance; however, it also increases training time and can overfit the training data if the model is not properly regularized. This parameter was therefore set to 32, without exploring larger numbers of filters; with this configuration the model achieved OA = 0.956, AA = 0.945, and κ = 0.947. The analysis also shows that the "ReLU" activation function outperformed the other two functions ("Sigmoid" and "ELU"): with ReLU, the CNN achieved an OA of 0.956, higher than the second-best activation ("Sigmoid") by ~4.4%. ReLU also facilitates faster training and reduces the likelihood of vanishing gradients. The experiments on drop probability showed that different values optimize different accuracy metrics: a drop probability of 0.2 optimized OA and κ (0.975 and 0.970, respectively), whereas 0.3 performed better than 0.2 regarding AA. Performances of the CNN with different optimizers were also investigated, and the results indicate that "Adam" is effective in training compared with the other optimizers: the highest OA (0.975) and κ (0.970) were achieved by the CNN trained with "Adam", while "Nadam" yielded the highest AA (0.974). The worst performance (OA = 0.945, AA = 0.949, κ = 0.934) occurred when the model was trained with SGD. Moreover, the efficiency of the CNN was compared across batch sizes of 4, 8, 16, 32, and 64. A batch size of 32 was found best considering OA (0.975) and κ (0.970), while a batch size of 64 achieved the highest AA (0.975).

Another important parameter in the proposed CNN is the patch size, the local neighborhood area of size n × n. The advantage of patch-based learning for orthophoto classification comes from using both the spectral and the spatial information of the data, which can improve accuracy compared with using individual pixels (spectral information only). To understand this parameter and find a suboptimal value, several experiments were conducted with different patch sizes (n = 3, 5, 7, 9, 11, 13). The statistical analysis indicates that larger n yields higher accuracy (Figure 8); however, visual inspection shows that larger n reduces the spatial quality of the features in the classification map (Figure 9). As a result, we considered n = 7 an effective value for this parameter, as it achieved relatively high OA, AA, and κ as well as high spatial quality.

4.3. Training Time Analysis. The computing performance of the CNN model depended on the use of dropout and batch normalization layers in the network architecture, in addition to other hyperparameters such as the number of convolutional filters and the image patch size. Table 5 shows the training time of the CNN model with different configurations. With early stopping, training the CNN with dropout and batch normalization took about 124 seconds on a CPU. Removing the batch normalization from the architecture yielded a training time of 150 seconds, whereas the CNN with dropout took 75 seconds to train. The CNN model without dropout and batch normalization took the shortest time (58.4 seconds) to train.
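As a check, applying the evaluation formulas of Section 3.2.5 to the confusion matrix in Table 3 comes out close to the reported OA and κ (a pure-Python sketch; the function names are ours, and small differences may stem from digits lost in the scanned table):

```python
def overall_accuracy(cm):
    """OA: correctly classified pixels (diagonal) over all pixels."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

def kappa(cm):
    """Kappa: agreement corrected for chance, using row/column totals."""
    n = sum(sum(row) for row in cm)
    diag = sum(cm[i][i] for i in range(len(cm)))
    rows = [sum(row) for row in cm]
    cols = [sum(cm[i][j] for i in range(len(cm))) for j in range(len(cm))]
    chance = sum(r * c for r, c in zip(rows, cols))
    return (n * diag - chance) / (n * n - chance)

# Rows/columns: road, water body, grassland, building,
# dense vegetation, shadow, barren land (Table 3).
cm = [
    [1474, 0, 0, 23, 0, 0, 21],
    [0, 1463, 85, 0, 0, 1, 0],
    [0, 10, 1323, 0, 27, 0, 0],
    [4, 0, 0, 991, 0, 0, 0],
    [0, 0, 0, 0, 1070, 1, 0],
    [0, 0, 0, 0, 11, 93, 0],
    [6, 0, 0, 8, 0, 0, 716],
]
oa = overall_accuracy(cm)   # close to the reported 0.973
k = kappa(cm)               # close to the reported 0.967
```

The per-class accuracies follow the same way, as each diagonal entry divided by its row total (e.g., shadow: 93/104 ≈ 0.894, matching Table 2).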
On the other hand, when the model was trained for 200 epochs without early stopping, the model (CNN + dropout + batch normalization) took about 230 seconds, longer than with early stopping by 106 seconds. In addition, the other models (CNN + dropout, CNN + batch normalization, and CNN) also required a longer time to train, as expected, owing to the larger number of epochs run. Overall, the computing performance of the proposed model is efficient for the investigated data. However, for larger datasets, training such models will require more time, and as a result, graphical processing units will be essential.

Figure 7: The loss of the CNN model without dropout and batch normalization (training and validation loss over 50 epochs).

5. Conclusion

In this paper, a classification model based on a CNN and spectral-spatial feature learning has been proposed for aerial photographs. With the utilization of advanced regularization techniques such as dropout and batch normalization, the proposed model could balance generalization ability and training efficiency.
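The training-time savings above come from halting once the validation loss stops improving. A framework-free sketch of patience-based early stopping (the paper does not state its exact stopping criterion, so the `patience` value and function name here are illustrative assumptions):

```python
def train_with_early_stopping(epoch_losses, patience=5, max_epochs=200):
    """Stop once the validation loss has failed to improve for `patience`
    consecutive epochs, instead of always running max_epochs.
    Returns (best_epoch, best_val_loss)."""
    best, best_epoch, since_improved = float("inf"), 0, 0
    for epoch, val_loss in enumerate(epoch_losses[:max_epochs], start=1):
        if val_loss < best:
            best, best_epoch, since_improved = val_loss, epoch, 0
        else:
            since_improved += 1
            if since_improved >= patience:
                break  # early stop: no improvement for `patience` epochs
    return best_epoch, best

# Simulated validation losses that bottom out early and then rise.
losses = [1.0, 0.6, 0.4, 0.35, 0.36, 0.37, 0.38, 0.39, 0.40, 0.41]
print(train_with_early_stopping(losses, patience=5))  # (4, 0.35)
```

In a real training loop, `epoch_losses` would be produced one epoch at a time and the loop would also restore the weights from the best epoch.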
Figure 8: The influence of hyperparameters: the number of convolutional filters, activation function, drop probability, optimizer, batch size, and patch size (validation OA, AA, and κ for each setting).

Use of such methods to improve the CNN model, along with other techniques such as preprocessing (geometric calibration and feature normalization) and sensitivity analysis, could make these models robust for classifying the given dataset. The CNN model acts as a feature extractor, and a classifier could be trained end-to-end given training samples. The network architecture can effectively handle the inter- and intraclass complexity inside the scene. The best model achieved OA = 0.973, AA = 0.965, and κ = 0.967, outperforming the traditional CNN model by ~4% in all the accuracy indicators. The short training time (124 seconds) confirmed the efficiency of the proposed model for small- and medium-scale remote sensing datasets. Future work should focus on scaling this architecture to large remote sensing datasets and to other data sources such as satellite images and laser scanning point clouds.

Figure 9: Effects of patch size (3 × 3, 5 × 5, 7 × 7, 9 × 9, 11 × 11, and 13 × 13) on the quality of classified maps.

Table 5: The training time in seconds of CNN with different configurations for 200 epochs.

Model | Time (seconds), with early stopping | Time (seconds), full training
CNN + dropout + batch normalization | 124 | 230
CNN + dropout | 150 | 168
CNN + batch normalization | 75 | 219
CNN | 58.4 | 158

Data Availability

The data come from a research project led by Professor Biswajeet Pradhan. Very high resolution aerial photographs were used in this research. The data can be made available upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.
Journal of Sensors – Hindawi Publishing Corporation
Published: Jun 26, 2018