Multiview Machine Vision Research of Fruits Boxes Handling Robot Based on the Improved 2D Kernel Principal Component Analysis Network
Li, Xinning;Wu, Hu;Yang, Xianhai;Xue, Peng;Tan, Shuai
2021-07-08 00:00:00
Hindawi
Journal of Robotics
Volume 2021, Article ID 3584422, 13 pages
https://doi.org/10.1155/2021/3584422

Research Article

Xinning Li (1), Hu Wu (1), Xianhai Yang (1), Peng Xue (2), and Shuai Tan (2)

(1) School of Mechanical Engineering, Shandong University of Technology, Zibo 255000, China
(2) National Engineering Research Center for Production Equipment, Dongying 257091, China

Correspondence should be addressed to Xianhai Yang; yxh@sdut.edu.cn

Received 1 May 2021; Accepted 21 June 2021; Published 8 July 2021

Academic Editor: L. Fortuna

Copyright © 2021 Xinning Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

To advance intelligent orchard mechanization and reduce the labour intensity of workers, an intelligent fruit box handling robot is needed. The first condition for intelligence is fruit box recognition, which is the subject of this paper. A multiview two-dimensional (2D) recognition method was adopted, and a multiview dataset of fruit boxes was built. To preserve the structure of the original image, a binary multiview 2D kernel principal component analysis network (BM2DKPCANet) model was established to reduce data redundancy and increase the correlation between views. A multiview recognition method for fruit boxes was proposed by combining BM2DKPCANet with a support vector machine (SVM) classifier.
The performance was verified by comparison with the principal component analysis network (PCANet), 2D principal component analysis network (2DPCANet), kernel principal component analysis network (KPCANet), and binary multiview kernel principal component analysis network (BMKPCANet) in terms of recognition rate and time consumption. The experimental results show that the recognition rate of the method is 11.84% higher than the mean value of PCANet, though it needs more time. Compared with the mean value of KPCANet, the recognition rate was 2.485% higher and the time consumption was 24.5% lower. The model can meet the requirements of a fruit box handling robot.

1. Introduction

As the primary industry of the national economy, agriculture is the primary condition for all production, and the proposal of precision agriculture has put forward higher requirements. The fruit industry, as a labour-intensive industry, has a large demand for labour and low work efficiency. Its automation and mechanization industry chain urgently needs upgrading [1]. With the rapid development of artificial intelligence, fruit recognition and fruit picking have been studied continuously [2], but there are relatively few studies on fruit handling [3]. On farms and in wholesale fruit markets, the handling of fruit boxes is still dominated by manual labour, which is time-consuming and labour-intensive. The cost of manual labour is increasing year by year, so manual handling cannot meet the demand of precision agriculture. In order to realize intelligent handling, this paper studied fruit box recognition based on machine vision.

According to the modelling method of target appearance, research on target recognition in recent years has been divided into three categories [4]: methods based on feature invariants, representation learning, and deep learning. The view models based on feature invariants extract the features of multiple images from different perspectives and then train the classifier; they are used when the number of training samples is small. This research mainly focuses on the construction of artificial features and classification algorithms, and many outstanding works have emerged. Because the characteristic invariance of the target must be studied in advance, the candidate features suffer from weak adaptability, weak generalization ability, and limited applicability. The feature description vectors also have a large dimension, and the training cost of the classifier is high. To solve these problems, researchers proposed methods based on subspace learning, which transform high-dimensional feature vectors into low-dimensional ones and train the classifiers in the subspace. Typical representatives are principal component analysis (PCA), based on unsupervised learning, and linear discriminant analysis (LDA), based on supervised learning. Building on these, methods with low data dimension, strong noise handling, and high efficiency were put forward, such as robust PCA (RPCA), inductive RPCA (IRPCA), kernel PCA (KPCA), two-dimensional PCA (2DPCA), and discriminative low-rank and sparse principal feature coding (D-LSPFC).

With the emergence of a large number of public image datasets, target recognition methods based on deep learning have been studied more and more. Models based on the convolutional neural network (CNN) in particular have promoted the development of computer vision by virtue of their strong nonlinear feature expression ability and good generalization performance. Region CNN (R-CNN) [5] applied deep learning to target recognition for the first time. The deep convolutional neural networks Fast R-CNN [6] and Faster R-CNN [7] were then proposed by combining the training and testing processes, which greatly improved identification accuracy and efficiency. As the product of integrating fuzzy logic reasoning with the self-learning ability of neural networks, the neurofuzzy network has also been widely used [8]. The CNN-based single shot detector (SSD) [9] and the YOLO [10] deep learning object detection method raised real-time performance to a new level. On this basis, YOLOv2 [11] and YOLO V3 [12] gradually improved the running speed and robustness, and the detection performance improved significantly. YOLO V4 [13], which takes CSPDarkNet53 + SPP + PANet (path-aggregation-neck) + yolov3-head as the model, improved both speed and accuracy. It is undeniable that target recognition algorithms based on deep learning are remarkably effective, and their recognition accuracy is much higher than that of the traditional manual methods and the representation learning methods. However, target recognition still faces great challenges in some situations, such as target overlap, partial occlusion, high similarity, complex environments, and strong interference. Complex models, long training times, and high requirements for hardware computing power have also limited the application of these methods in mobile robots.

A fruit box is a three-dimensional (3D) object, and directly extracting and recognizing its 3D features leads to complex calculation and high storage requirements. A view-based method is adopted in this paper; that is, 3D objects are represented through multiple views. As a common approach to 3D object recognition, multiview learning models and recognition methods have received increasing attention. The multiview-based convolutional neural network (MVCNN) was built in [14], in which maximum pooling layers blended the features of multiple views. The MVCNN model had low convergence speed and low feature extraction efficiency because it was not an end-to-end network. The end-to-end group-pair convolutional neural network (GPCNN) was established in [15], which could solve the small-scale problem. The novel pairwise multiview convolutional neural network (PMV-CNN) proposed in [16] focused on complementary information between views. Feature extraction and target recognition were unified into the CNN, which clearly improved the robustness of feature extraction when the number of training samples was small. To make up for the disadvantages caused by random image selection in multiview recognition, a multiview discrimination and pairwise convolutional neural network (MDPCNN) was obtained in [17] by adding the Slice layer and the Concat layer. The model was verified to have good intraclass compactness and interclass separability. A multiview-based Siamese convolutional neural network was exploited in [18]. An end-to-end multiview 3D fingerprint learning model, which included a fully convolutional network and three Siamese networks, was proposed in [19]. A multiview generator module was used in [20] to project the 3D point cloud onto a plane at a specific angle. On the premise of retaining the underlying features, a spatial refusion operation was adopted to realize the interaction between different projections, and the features were reconstructed for target recognition. Based on semisupervised learning and expectation maximization, a multiview fusion strategy classification method with the ability of label propagation was proposed in [21]. An end-to-end cloud convolutional neural network was built in [22] based on the projection network mechanism. The point cloud was projected into a two-dimensional view with rich discriminant features, and the robustness and accuracy were improved significantly. Multiview feature projections were coded as binary in [23], and the recognition descriptors were assembled from block statistical features.

Although the above methods have achieved good results, models based on the convolutional neural network (CNN) also have problems, such as complicated structure and long training cycles. They do not seem to be the best choice for fruit box handling robots. From the above research, it can be concluded that considering the relationship between multiple views can improve recognition accuracy and robustness, and that binary coding can improve the operating efficiency of the model. Both became design factors of the new model developed in this paper.

No system is perfect. The hidden state of the inevitable uncertainty in a system can be stimulated, and the connection between these uncertainties and the object system can be established to improve system performance [24]. Although fruit packing boxes are generally regular cubes, traditional rule-based feature extraction and recognition methods cannot achieve good results because of the variety of fruits and the influence of surface patterns, colours, and the surrounding environment. Therefore, deep learning algorithms are more advantageous. Current deep learning target recognition algorithms are end-to-end solutions; that is, the path from the input image to the output task result is completed in one step, although internally it proceeds in stages as image feature extraction network, classification, and regression. To address the long training time of classic CNN parameters, the simple principal component analysis network (PCANet) was built in [25]. The convolution layer of the CNN was introduced into the classical feature extraction framework of “Feature Map-Pattern Map-Histogram,” unsupervised hierarchical features were obtained, and the high computational complexity caused by iteration and optimization was avoided. PCANet has been widely used because of its simple model and rapid calculation. Since PCA cannot extract the nonlinear relationship between images, the kernel principal component analysis network (KPCANet) model was established in [26], which achieved better classification results than PCANet. To remove the redundancy of multiperspective views, our team proposed a binary multiview kernel principal component analysis network (BMKPCANet) model [27] for multiview object recognition. However, that model converted the two-dimensional image matrix into a vector when extracting features, which destroyed the original image structure and required a large amount of computation, so we improved the model. Inspired by the two-dimensional principal component analysis network (2DPCANet) [28] and two-dimensional kernel principal component analysis (2DKPCA) [29], the images of fruit boxes were processed by 2DKPCA, and a new multiview feature extraction model was established. The main contributions of this work are summarized as follows:

(1) A binary multiview two-dimensional kernel principal component analysis network (BM2DKPCANet) model was built to extract clustering features, which can reduce data redundancy and realize multiview clustering.

(2) A multiview recognition method for fruit boxes was proposed by combining the BM2DKPCANet model with the support vector machine (SVM).

(3) The proposed method was compared with the PCANet, 2DPCANet, KPCANet, and BMKPCANet models on the fruit boxes dataset and on the ETH-80 and COIL-100 public datasets. With recognition accuracy and time consumption as the evaluation indexes, the experiments showed that the recognition performance of the proposed method was superior to that of the other methods.

The rest of this work is organized as follows. The methods based on 2DPCA and KPCA are introduced in Section 2. The method of obtaining multiview images of fruit boxes is introduced in Section 3. The feature extraction process of the proposed BM2DKPCANet algorithm and the identification process of fruit boxes are also discussed in detail in Section 3. The experimental process, results, and discussion are presented in Section 4. Finally, the research and the future work are summarized in Section 5.

2. Related Works

In order to reduce the sample dimension and capture the nonlinear correlation between multiple pixels, some scholars have proposed a series of algorithms that combine the advantages of 2DPCA [30] and KPCA [31]. Nhat and Lee [32] proposed the kernel-based 2DPCA, which directly extracted nonlinear features from two-dimensional images and realized nonlinear correlation analysis of the matrix. However, the storage requirement of the kernel matrix was high when the number of training samples was large. Zhou et al. [33] calculated a low-rank approximate decomposition of the kernel matrix using the Cholesky decomposition method to achieve nonlinear feature extraction, but the computational efficiency was low in the test stage. Xu et al. [34] applied a Laplace-based dimension reduction after 2DKPCA. Choi et al. [35] proposed the incremental 2DKPCA (I2DKPCA), which reduced the computation time and improved the performance of feature extraction. Zhang et al. [29] built the 2-dimensional kernel PCA (2DKPCA) framework and compared the performance of unilateral 2DKPCA (row and column) with that of bilateral 2DKPCA in face and object recognition. Mohammad et al. [36] matched historical parameters by bilateral 2DKPCA. Xiang et al. [37] realized dimensionality reduction for hyperspectral images using the segmented row-column K2DPCA method. To reduce the storage requirement and computational complexity of the kernel matrix, blockwise methods were proposed [38, 39], which transformed the large kernel matrix into several small kernel matrices and then combined the eigenvectors of the small kernel matrices. Wang and Zhou [40] mixed the image block and vector methods; the scale of the kernel matrix was decreased by taking several adjacent rows or columns of the image as a computing unit for nonlinear mapping. Chen et al. [41] proposed bidirectional two-dimensional kernel quaternion principal component analysis (BD2DKQPCA) for colour image recognition. The kernel matrix was used to replace the covariance matrix between samples, which avoided high-complexity calculation in the high-dimensional space. They then improved 2DKQPCA by adding a blockwise process [42]. According to the characteristics of the quaternion Hermitian matrix, the blocks of the main diagonal, next to the main diagonal, and the backdiagonal direction were analyzed.

Based on the above algorithms, and considering both recognition and computing performance, this paper sampled images in blocks when extracting the image features. Experiments with the proposed B2DKPCA [38], the bidirectional two-dimensional KQPCA (BD2DKQPCA) [41], and the block-based 2DKQPCA [42] demonstrated that the recognition performance of the column-oriented algorithm was superior to that of the row-oriented algorithm. This work therefore adopted the column-oriented algorithm to conduct 2DKPCA; that is, the column vector of the image sample is mapped to a high-dimensional space through the nonlinear mapping function, and the kernel matrix replaces the covariance matrix.

3. Materials and Methods

3.1. Experimental Materials

3.1.1. Establishment of Multiview Dataset of Fruits Boxes. This work adopted the multiview feature method to collect images. Several two-dimensional projections from different viewpoints are used to describe the features of the boxes, under the principle that the set of projected views should be as small as possible while still representing the many common attitudes of the boxes. To describe and establish the visual model more easily, the relative position relation between a fixed view and boxes in different positions was transformed into the relation between a moving view and fixed boxes. The various postures of the boxes observed in normal operation were collected under the moving view. Since opposite sides of a box generally have the same pattern, a multiple semiarc viewpoint projection model was set up, as shown in Figure 1. The camera moved on the green cambered surface, and multiple different postures of the boxes were obtained.

Figure 1: Image acquisition method (labelled elements: camera, projection).

The semiarc viewpoint surface must be divided into small areas to obtain the projections of the 3D target in different attitudes. The view areas were divided and the viewpoints distributed so that the projection view set is as small as possible while still representing the common attitudes of the boxes. The distribution of viewpoints was described using latitude and longitude as in geography, based on the idea of uniform division and the morphology diagram method [43], to simulate the box postures in real situations, as shown in Figure 2.

Figure 2: Distribution of viewpoints.
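The uniform latitude and longitude division of viewpoints described above can be illustrated with a minimal sketch. It is not the authors' acquisition code: the function name `semiarc_viewpoints`, the radius, the number of latitude and longitude divisions, and the choice of a half dome (longitude within 0 to 180 degrees, latitude within 0 to 90 degrees) are all assumptions made for illustration only.

```python
import math


def semiarc_viewpoints(radius=1.0, n_lat=3, n_lon=6):
    """Return hypothetical camera positions (x, y, z) around a box at the origin.

    The positions lie on half of an upper dome and are spaced uniformly in
    latitude (elevation) and longitude (azimuth). All parameters are
    illustrative assumptions, not values taken from the paper.
    """
    points = []
    for i in range(1, n_lat + 1):
        # Latitude strictly between 0 and 90 degrees.
        lat = math.radians(90.0 * i / (n_lat + 1))
        for j in range(n_lon):
            # Longitude at the centre of each uniform cell in (0, 180) degrees,
            # covering only half of the circle because opposite box sides
            # are assumed to look alike.
            lon = math.radians(180.0 * (j + 0.5) / n_lon)
            x = radius * math.cos(lat) * math.cos(lon)
            y = radius * math.cos(lat) * math.sin(lon)
            z = radius * math.sin(lat)
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    for x, y, z in semiarc_viewpoints(radius=2.0):
        print(f"{x:7.3f} {y:7.3f} {z:7.3f}")
```

Each returned position would correspond to one two-dimensional projection of the box; a finer division gives more views at the cost of a larger view set.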
The projection of the box at each viewpoint corresponds to a two-dimensional image, and in this way the multiview projection model of the box was constructed.

The experimental objects were from the fruit wholesale market of Zhangdian District, Zibo City, Shandong Province, China. A total of 15 types of boxes for 10 kinds of fruit were selected in the experiment: apple1, apple2, apple3, watermelon1, watermelon2, orange1, orange2, cantaloupe1, cantaloupe2, pomegranate, pear, durian, coconut, banana, and pineapple, as shown in Figure 3. Multiview collection was carried out for the boxes of each category, as shown in Figure 4, and 200 samples of each category were retained. The images were normalized to 32 × 32 and converted to grayscale.

Figure 3: Categories of fruits boxes dataset.

3.1.2. Multiview Public Datasets. In order to fully verify the performance of the proposed multiview recognition algorithm, recognition performance tests were also carried out on the public datasets ETH-80 [44] and COIL-100 [45]. The ETH-80 dataset contains 8 classes. Each class is an image set of 10 different objects, with 41 images of each object taken from different angles. In this paper, 4 objects of each class were randomly selected to form the training set, and the rest were used as the test set. The COIL-100 dataset contains images of 100 objects, each taken at 72 different angles around a 360° circumference. The training set was composed of 720 images of 20 objects selected randomly. Partial images of ETH-80 and COIL-100 are shown in Figure 5.

3.2. BM2DKPCANet Model Based on 2DKPCA

3.2.1. Construction of Feature Extraction Model. Since the image database is composed of many two-dimensional multiview images chosen to represent the common postures of the boxes as fully as possible, it contains a lot of data redundancy. To reduce unnecessary data storage, this paper added a clustering step to the feature extraction model of fruit boxes, as shown in Figure 6. Following related research on principal component analysis networks, a two-layer 2DKPCA network was constructed. The extracted feature vectors were binary coded and clustered at the same time. The decimal clustering feature representation was then obtained by block histogram transformation, which completed the clustering feature extraction.

3.2.2. BM2DKPCANet Model

(1) First 2DKPCA. The images of the database were resized to $m \times n$. At the input layer $I$, patches were sampled by sliding a $k \times k$ window over each image. All sample patches were gathered and cascaded. The $j$th patch of the $i$th image was defined as $x_{i,j}$. The $i$th image could then be expressed as the cascade of its patches, $X_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,mn}]$.

After the same process was applied to the other images, feature analysis based on 2DKPCA was performed on the local feature matrices. Because the explicit form of the mapping is not needed, and to avoid complex calculation in the high-dimensional space, the covariance matrix of the mapped samples was replaced by the kernel matrix [41]. After sliding patch sampling, each training sample matrix $I_i \in \mathbb{R}^{m \times n}$ ($i = 1, 2, \ldots, S$) was converted to a local feature matrix $X_i \in \mathbb{R}^{k^{2} \times mn}$ ($i = 1, 2, \ldots, S$). The column-direction kernel matrix for the $S$ training samples has dimension $Smn \times Smn$, which requires a large amount of computation. This work therefore replaced the original $mn$ column vectors with their average column vector [29]; then the sample of nonlinear mappings for training became X ; that is, Ψ: R ⟶ F, (4) X ⟶ ϕ