Multiview Machine Vision Research of Fruits Boxes Handling Robot Based on the Improved 2D Kernel Principal Component Analysis Network

Hindawi
Journal of Robotics
Volume 2021, Article ID 3584422, 13 pages
https://doi.org/10.1155/2021/3584422

Research Article

Xinning Li (1), Hu Wu (1), Xianhai Yang (1), Peng Xue (2), and Shuai Tan (2)

(1) School of Mechanical Engineering, Shandong University of Technology, Zibo 255000, China
(2) National Engineering Research Center for Production Equipment, Dongying 257091, China

Correspondence should be addressed to Xianhai Yang; yxh@sdut.edu.cn

Received 1 May 2021; Accepted 21 June 2021; Published 8 July 2021

Academic Editor: L. Fortuna

Copyright © 2021 Xinning Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

To better realize intelligent orchard mechanization and reduce the labour intensity of workers, an intelligent fruits boxes handling robot needs to be studied. The first condition for intelligence is fruits boxes recognition, which is the subject of this paper. A multiview two-dimensional (2D) recognition method was adopted, and a multiview dataset of fruits boxes was built. To preserve the structure of the original image, a binary multiview 2D kernel principal component analysis network (BM2DKPCANet) was established to reduce data redundancy and increase the correlation between views. A multiview recognition method for fruits boxes was proposed that combines BM2DKPCANet with a support vector machine (SVM) classifier.
The performance was verified by comparison with the principal component analysis network (PCANet), the 2D principal component analysis network (2DPCANet), the kernel principal component analysis network (KPCANet), and the binary multiview kernel principal component analysis network (BMKPCANet) in terms of recognition rate and time consumption. The experimental results show that the recognition rate of the method is 11.84% higher than the mean value of PCANet, although it needs more time. Compared with the mean value of KPCANet, the recognition rate was 2.485% higher and the time consumption was 24.5% lower. The model can meet the requirements of a fruits boxes handling robot.

1. Introduction

As the primary industry of the national economy, agriculture is the primary condition for all production, and the proposal of precision agriculture has put forward higher requirements. The fruit industry, as a labour-intensive industry, has a large demand for labour and low work efficiency. Its automation and mechanization chain urgently needs upgrading [1]. With the rapid development of artificial intelligence, fruit recognition and fruit picking have been studied continuously [2], but there are relatively few studies on fruit handling [3]. On farms and in wholesale fruit markets, the handling of fruits boxes is still dominated by manual labour, which is time-consuming and labour-intensive. The cost of manual labour is increasing year by year, so manual handling cannot meet the demands of precision agriculture. In order to realize intelligent handling, this paper studied fruits boxes recognition based on machine vision.

According to how the target appearance is modelled, recent work on target recognition falls into three categories [4]: methods based on feature invariants, on representation learning, and on deep learning. View models based on feature invariants extract the features of multiple images from different perspectives and then train a classifier; they are used when the number of training samples is small. Research in this area focuses mainly on the construction of hand-crafted features and classification algorithms, and many outstanding works have emerged. Because the invariance of the target's characteristics must be studied in advance, the candidate features adapt poorly, generalize weakly, and have limited applicability. The feature description vectors are also high-dimensional, and the classifier is costly to train. To solve these problems, researchers proposed methods based on subspace learning, which transform high-dimensional feature vectors into low-dimensional ones and train the classifiers in the subspace. Typical representatives are principal component analysis (PCA), based on unsupervised learning, and linear discriminant analysis (LDA), based on supervised learning. Building on these, methods with low data dimension, strong noise handling, and high efficiency were put forward, such as robust PCA (RPCA), inductive RPCA (IRPCA), kernel PCA (KPCA), two-dimensional PCA (2DPCA), and discriminative low-rank and sparse principal feature coding (D-LSPFC). With the emergence of a large number of public image datasets, target recognition methods based on deep learning have been studied more and more. Models based on the convolutional neural network (CNN) in particular promoted the development of computer vision by virtue of their strong nonlinear feature expression and good generalization performance. Region CNN (R-CNN) [5] applied deep learning to target recognition for the first time. The deep convolutional neural networks Fast R-CNN [6] and Faster R-CNN [7] were then proposed by combining the training and testing processes, which greatly improved identification accuracy and efficiency. As the product of integrating fuzzy logic reasoning with the self-learning ability of neural networks, the neurofuzzy network has also been widely used [8]. The CNN-based single shot detector (SSD) [9] and the YOLO [10] deep learning object detection method pushed real-time performance to a new level. On this basis, YOLOv2 [11] and YOLO V3 [12] gradually improved running speed and robustness, and detection performance improved significantly. YOLO V4 [13], which takes CSPDarkNet53 + SPP + PANet (path-aggregation-neck) + yolov3-head as the model, improved both speed and accuracy. It is undeniable that target recognition algorithms based on deep learning are remarkably effective and that their recognition accuracy is much higher than that of traditional manual methods and representation learning methods. However, target recognition still faces great challenges in some situations, such as target overlap, partial occlusion, high similarity, complex environments, and strong interference. Complex models, long training times, and high demands on hardware computing power have limited the application of these methods in mobile robots.

A fruit box is a three-dimensional (3D) object, and directly extracting and recognizing its 3D features leads to complex calculation and high storage cost. A view-based method is adopted in this paper; that is, 3D objects are represented through multiple views. As a common approach to 3D object recognition, multiview learning models and recognition methods have also received increasing attention. The multiview-based convolutional neural network (MVCNN) was built in [14], in which maximum pooling layers blended the features of multiple views. The MVCNN model converged slowly and extracted features inefficiently because it was not an end-to-end network. The end-to-end group-pair convolutional neural network (GPCNN) was established in [15], which could solve the small-scale problem. The pairwise multiview convolutional neural network (PMV-CNN) proposed in [16] focused on complementary information between views. Feature extraction and target recognition were unified in the CNN, which clearly improved the robustness of feature extraction when the number of training samples was small. To make up for the disadvantages caused by random image selection in multiview recognition, a multiview discrimination and pairwise convolutional neural network (MDPCNN) was obtained in [17] by adding the Slice layer and the Concat layer. The model was verified to have good intraclass compactness and interclass separability. A multiview-based Siamese convolutional neural network was exploited in [18]. An end-to-end multiview 3D fingerprint learning model was proposed in [19], which included a fully convolutional network and three Siamese networks. A multiview generator module was used in [20] to project the 3D point cloud onto a plane at a specific angle. While retaining the underlying features, a spatial refusion operation was adopted to realize the interaction between different projections, and the features were reconstructed for target recognition. Based on semi-supervised learning and expectation maximization, a multiview fusion classification method with label propagation ability was proposed in [21]. An end-to-end cloud convolutional neural network was built in [22] based on the projection network mechanism. The point cloud was projected into a two-dimensional view with rich discriminant features, and robustness and accuracy improved significantly. Multiview feature projections were coded as binary in [23], and the recognition descriptors were assembled from block statistical features. Although the above methods have achieved good results, models based on the convolutional neural network (CNN) also have problems, such as a complicated structure and a long training cycle. They do not seem to be the best choice for fruits boxes handling robots. From the above research, it can be concluded that considering the relationship between multiple views can improve recognition accuracy and robustness, and that binary coding can improve the operating efficiency of the model. Both became design factors of the new model developed in this paper.

No system is perfect. The hidden state of the inevitable uncertainty in a system can be stimulated, and the connection between these uncertainties and the object system can be established to improve system performance [24]. Although fruit packing boxes are generally regular cubes, traditional rule-based feature extraction and recognition methods cannot achieve good results because of the variety of fruits and the influence of surface patterns, colours, and the surrounding environment. Therefore, deep learning algorithms are more advantageous. Current deep learning target recognition algorithms are end-to-end solutions; that is, they go from the input image to the output task result in one step, although internally they proceed in stages: image feature extraction network, classification, and regression. To address the long training time of classic CNN parameters, the simple principal component analysis network (PCANet) was built in [25]. The convolution layer of the CNN was introduced into the classical feature extraction framework of "Feature Map-Pattern Map-Histogram," unsupervised hierarchical features were obtained, and the high computational complexity caused by iteration and optimization was avoided. PCANet has been widely used because the model is simple and fast to compute. Since PCA cannot extract the nonlinear relationship between images, the kernel principal component analysis network (KPCANet) model was established in [26], which achieved better classification results than PCANet. To remove the redundancy of multiperspective views, our team proposed a binary multiview kernel principal component analysis network (BMKPCANet) model [27] for multiview object recognition. However, that model converted the two-dimensional image matrix into a vector when extracting features, which destroyed the original image structure and required a large amount of computation, so we improved the model. Inspired by the two-dimensional principal component analysis network (2DPCANet) [28] and two-dimensional kernel principal component analysis (2DKPCA) [29], the images of fruits boxes were processed by 2DKPCA, and a new multiview feature extraction model was established. The main contributions of this work are summarized as follows:

(1) A binary multiview two-dimensional kernel principal component analysis network (BM2DKPCANet) model was built to extract clustering features, which can reduce data redundancy and realize binary multiview clustering.

(2) A multiview recognition method for fruits boxes was proposed that combines the BM2DKPCANet model with the support vector machine (SVM).

(3) The proposed method was compared with the PCANet, 2DPCANet, KPCANet, and BMKPCANet models on the fruits boxes dataset and on the ETH-80 and COIL-100 public datasets. Taking recognition accuracy and time consumption as the evaluation indexes, the experiments showed that the recognition performance of the proposed method was superior to that of the other methods.

The rest of this work is organized as follows. The methods based on 2DPCA and KPCA are introduced in Section 2. The method of obtaining fruits boxes images from multiple view angles is introduced in Section 3, which also discusses in detail the feature extraction process of the proposed BM2DKPCANet algorithm and the identification process of fruits boxes. The experimental process, results, and discussion are shown in Section 4. Finally, the research and future work are summarized in Section 5.

2. Related Works

In order to reduce the sample dimension and capture the nonlinear correlation between multiple pixels, some scholars have proposed a series of algorithms that combine the advantages of 2DPCA [30] and KPCA [31]. Nhat and Lee [32] proposed the kernel-based 2DPCA, which extracts nonlinear features directly from two-dimensional images and realizes nonlinear correlation analysis of matrices. However, the storage requirement of the kernel matrix is high when there are many training samples. Zhou et al. [33] calculated a low-rank approximate decomposition of the kernel matrix using the Cholesky decomposition method to achieve nonlinear feature extraction, but the computational efficiency was low in the test stage. Xu et al. [34] used Laplace to reduce the dimension after 2DKPCA. Choi et al. [35] proposed the incremental 2DKPCA (I2DKPCA), which reduced the computational cost and improved the performance of feature extraction. Zhang et al. [29] built the two-dimensional kernel PCA (2DKPCA) framework and compared the performance of unilateral 2DKPCA (row and column) with that of bilateral 2DKPCA in face and object recognition. Mohammad et al. [36] matched historical parameters by bilateral 2DKPCA. Xiang et al. [37] realized dimensionality reduction for hyperspectral images using the segmented row-column K2DPCA method. To reduce the storage requirement and computational complexity of the kernel matrix, blockwise methods were proposed [38, 39], which transform the large kernel matrix into several small kernel matrices and then combine the eigenvectors of the small kernel matrices. Wang and Zhou [40] mixed the image block and vector methods. The scale of the kernel matrix was decreased by taking several adjacent rows or columns of the image as a computing unit for nonlinear mapping. Chen et al. [41] proposed bidirectional two-dimensional kernel quaternion principal component analysis (BD2DKQPCA) for colour image recognition. The kernel matrix was used to replace the covariance matrix between samples, which avoided high-complexity calculation in the high-dimensional space. They then improved 2DKQPCA by adding a blockwise process [42]. According to the characteristics of the quaternion Hermitian matrix, the blocks on the main diagonal, next to the main diagonal, and in the backdiagonal direction were analyzed.

Based on the above algorithms, and considering both recognition and computing performance, this paper samples images in blocks when extracting image features. Experiments with the proposed B2DKPCA [38], the bidirectional two-dimensional KQPCA (BD2DKQPCA) [41], and the block-based 2DKQPCA [42] demonstrated that the column-oriented algorithm recognizes better than the row-oriented algorithm. This work therefore adopted the column-oriented algorithm to conduct 2DKPCA; that is, the column vectors of the image sample are mapped to a high-dimensional space through the nonlinear mapping function, and the kernel matrix replaces the covariance matrix.

3. Materials and Methods

3.1. Experimental Materials

3.1.1. Establishment of Multiview Dataset of Fruits Boxes. This work adopted the multiview feature method to collect images. The set of projected views should be as small as possible while still representing the common attitudes of the boxes, so several two-dimensional projections from different viewpoints are used to describe the features of the boxes. To describe and establish the visual model more conveniently, the relative position relation between a fixed view and boxes in different positions was transformed into the relation between a moving view and fixed boxes. The various postures of the boxes observed in normal operation were collected under the moving view. Since the opposite sides of the boxes generally had the same pattern, a multiple semiarc viewpoint projection model was set up, as shown in Figure 1. The camera kept moving on the green cambered surface, and multiple different postures of the boxes were obtained.

Figure 1: Image acquisition method.

The semiarc viewpoint surface must be divided into small areas to obtain projections of the 3D target in different attitudes. The view areas are divided and the viewpoints distributed so that the projection view set is as small as possible and can still represent multiple common attitudes of the boxes. The distribution of viewpoints was described by latitude and longitude, as in geography, based on the idea of uniform division and the morphology diagram method [43], to simulate the box postures in the real situation, as shown in Figure 2.

Figure 2: Distribution of viewpoints.
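The latitude and longitude division described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `semi_arc_viewpoints`, the radius, the angular ranges, and the numbers of latitude and longitude divisions are assumptions, not values taken from the paper. It places one viewpoint at the centre of each cell of a uniform latitude/longitude grid on a partial sphere centred on the box.

```python
import math


def semi_arc_viewpoints(radius=1.0, n_lat=4, n_lon=6,
                        lat_range=(0.0, 90.0), lon_range=(0.0, 180.0)):
    """Return camera positions (x, y, z) around a box placed at the origin.

    The surface is divided uniformly in latitude and longitude (degrees),
    and one viewpoint is placed at the centre of every cell.
    All default values are hypothetical.
    """
    lat_step = (lat_range[1] - lat_range[0]) / n_lat
    lon_step = (lon_range[1] - lon_range[0]) / n_lon
    views = []
    for a in range(n_lat):
        lat = math.radians(lat_range[0] + (a + 0.5) * lat_step)
        for b in range(n_lon):
            lon = math.radians(lon_range[0] + (b + 0.5) * lon_step)
            views.append((radius * math.cos(lat) * math.cos(lon),
                          radius * math.cos(lat) * math.sin(lon),
                          radius * math.sin(lat)))
    return views


views = semi_arc_viewpoints(radius=2.0, n_lat=4, n_lon=6)
```

Each returned position would correspond to one two-dimensional projection of the box; a finer grid gives more views at the cost of more redundancy.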
The projection of the box at each viewpoint corresponds to one two-dimensional image, and in this way the multiview projection model of the box was constructed.

The experimental objects were from the fruit wholesale market of Zhangdian District, Zibo City, Shandong Province, China. A total of 15 different types of boxes for 10 kinds of fruits were selected in the experiment, which were defined as apple1, apple2, apple3, watermelon1, watermelon2, orange1, orange2, cantaloupe1, cantaloupe2, pomegranate, pear, durian, coconut, banana, and pineapple, as shown in Figure 3. Multiview collection was carried out for the boxes of each category, as shown in Figure 4. 200 samples of each category were retained. The images were normalized to 32 × 32 and converted to grayscale.

Figure 3: Categories of fruits boxes dataset.

Figure 4: Collecting images in multiviews.

3.1.2. Multiview Public Datasets. In order to fully verify the performance of the proposed multiview recognition algorithm, recognition tests were also carried out on the public datasets ETH-80 [44] and COIL-100 [45]. The ETH-80 dataset contains 8 species classes. Each species is an image set of 10 different objects, with 41 images of each object taken from different angles. In this paper, 4 objects of each species were randomly selected to form the training set, and the rest were used as the test set. The COIL-100 dataset contains images of 100 objects. Each object was photographed at 72 different angles around a 360° circumference. The training set was composed of 720 images of 20 objects, chosen randomly. Partial images of ETH-80 and COIL-100 are shown in Figure 5.

Figure 5: Partial samples of the public datasets (a) ETH-80 and (b) COIL-100.

3.2. BM2DKPCANet Model Based on 2DKPCA

3.2.1. Construction of Feature Extraction Model. The image database is composed of many two-dimensional multiview images that represent the common postures of the boxes as fully as possible, which leads to a lot of data redundancy. In order to reduce unnecessary data storage, this paper added a clustering step to the feature extraction model of fruits boxes, as shown in Figure 6. Following related research on the principal component analysis network, a two-layer 2DKPCA network was constructed. The extracted feature vectors were binary coded and clustered at the same time. The decimal clustering feature representation was obtained by block histogram transformation, which completed the clustering feature extraction.

Figure 6: Model of BM2DKPCANet. Stages shown: multiview input layer; first 2DKPCA (patch-mean removal, 2DKPCA filters convolution with $W^1$); second 2DKPCA (patch-mean removal, 2DKPCA filters convolution with $W^2$); output layer (binary hashing and clustering, blockwise histogram).

3.2.2. BM2DKPCANet Model

(1) First 2DKPCA. The images of the database were resized to $m \times n$. Taking $I_i$ as the input layer, patches were sampled by sliding a $k_1 \times k_2$ window. All sample patches were gathered and cascaded. The $j$th patch of the $i$th image was defined as $x_{i,j}$. The $i$th image could be expressed as

$$I_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,\hat{m}\cdot\hat{n}}], \tag{1}$$

where $\hat{m}$ was the number of patches on rows and $\hat{n}$ was the number of patches on columns. The demeaned sample patch was obtained as follows:

$$\bar{x}_{i,j} = x_{i,j} - \frac{\sum_{j=1}^{\hat{m}\cdot\hat{n}} x_{i,j}}{\hat{m}\cdot\hat{n}}. \tag{2}$$

The local feature matrix of the $i$th image could be written as

$$X_i = [\bar{x}_{i,1}, \bar{x}_{i,2}, \ldots, \bar{x}_{i,\hat{m}\cdot\hat{n}}] \in R^{k_1 k_2 \times mn}. \tag{3}$$

After the same process was applied to the other images, feature analysis based on 2DKPCA was performed on the local feature matrices. Because the explicit form after mapping is not needed, and in order to avoid complex calculation in the high-dimensional space, the covariance matrix of the mapped samples was replaced by the kernel matrix [41]. The training sample matrices $I_i \in R^{m \times n}$ $(i = 1, 2, \ldots, S)$ were converted to the local feature matrices $X_i \in R^{k_1 k_2 \times mn}$ $(i = 1, 2, \ldots, S)$ by sliding patch sampling. The dimension of the column-direction kernel matrix for $S$ training samples is $Smn \times Smn$, which requires a large amount of computation. This work used the average column vector to replace the original $mn$ column vectors [29], so the sample that is nonlinearly mapped for training became $\bar{X}_i$; that is,

$$\Psi: R^{k_1 k_2} \rightarrow F, \quad \bar{X}_i \mapsto \phi(\bar{X}_i), \tag{4}$$

$$\bar{X}_i = \frac{1}{mn} \sum_{t=1}^{mn} X_i^{t}, \tag{5}$$

where $i = 1, 2, \ldots, S$ and $t = 1, 2, \ldots, mn$. The $S$ training samples can be approximately expressed in the kernel feature space as follows:

$$\Phi_1 = [\phi(\bar{X}_1), \phi(\bar{X}_2), \ldots, \phi(\bar{X}_S)]. \tag{6}$$

Then the kernel matrix can be expressed as

$$K_1 = \Phi_1^{T} \Phi_1 = [\langle \phi(\bar{X}_s), \phi(\bar{X}_{s'}) \rangle] = [k(\bar{X}_s, \bar{X}_{s'})], \quad s, s' = 1, \ldots, S. \tag{7}$$

The dimension was reduced to $S \times S$, and the computational complexity was greatly reduced. The kernel matrix $K_1$ was centralized [46], such that

$$\tilde{K}_1 = K_1 - \frac{1}{S}\mathbf{1}K_1 - \frac{1}{S}K_1\mathbf{1} + \frac{1}{S^2}\mathbf{1}K_1\mathbf{1}, \tag{8}$$

where $\mathbf{1}$ was the matrix of order $S$ whose components were all 1. The eigenvectors corresponding to the $L_1$ largest eigenvalues of $\tilde{K}_1$ were taken as the kernel principal component filters of the first-layer network:

$$W_l^{1} = [w_1, w_2, \ldots, w_{L_1}]. \tag{9}$$

The training sample $I_i$, after zero-filling of the boundary, was convolved with the first-layer 2DKPCA filters,

$$I_i^{l} = I_i * W_l^{1}, \quad i = 1, 2, \ldots, S;\ l = 1, 2, \ldots, L_1. \tag{10}$$

(2) Second 2DKPCA. Taking the output of the first 2DKPCA as the input of the second 2DKPCA, the same process as in the first 2DKPCA was repeated. The nonlinear high-dimensional mapping of the image matrix was carried out. The kernel matrix $K_2$ was calculated and centralized to $\tilde{K}_2$ in the same approximate way. The first $L_2$ kernel principal component features of $\tilde{K}_2$ were used as the convolution filters $W_\ell^{2}$ of the second-layer network:

$$W_\ell^{2} = [w_1, w_2, \ldots, w_{L_2}]. \tag{11}$$
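A minimal pure-Python sketch of the kernel-matrix steps in (5)-(9) is given below, assuming a Gaussian kernel and a toy set of local feature matrices. All function names are hypothetical, power iteration with deflation is swapped in for a library eigensolver, and the sketch stops at the eigenvectors of the centred kernel matrix because the text does not spell out how they are arranged into convolution filters.

```python
import math


def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian kernel k(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def mean_column(X):
    """Average the columns of a local feature matrix X (list of rows), as in (5)."""
    return [sum(row) / len(row) for row in X]


def centered_kernel_matrix(vectors, sigma=1.0):
    """Build the S x S kernel matrix of (7) and centre it as in (8)."""
    S = len(vectors)
    K = [[gaussian_kernel(vectors[i], vectors[j], sigma) for j in range(S)]
         for i in range(S)]
    row_mean = [sum(K[i]) / S for i in range(S)]
    total_mean = sum(row_mean) / S
    # K is symmetric, so its column means equal its row means.
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(S)] for i in range(S)]


def matvec(A, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def top_eigenpairs(A, L, iters=1000):
    """Leading L eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    A = [row[:] for row in A]
    S = len(A)
    pairs = []
    for t in range(L):
        v = [1.0 / (i + t + 1) for i in range(S)]  # deterministic start vector
        for _ in range(iters):
            w = matvec(A, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(A, v)))
        pairs.append((lam, v))
        for i in range(S):  # remove the component that was just found
            for j in range(S):
                A[i][j] -= lam * v[i] * v[j]
    return pairs


# Toy example: four 2 x 3 local feature matrices (hypothetical values).
X_list = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]],
    [[3.0, 3.0, 3.0], [3.0, 3.0, 3.0]],
]
means = [mean_column(X) for X in X_list]
K_centered = centered_kernel_matrix(means)
eigenpairs = top_eigenpairs(K_centered, L=2)
```

Working with one averaged column per image keeps the kernel matrix at $S \times S$, which is the point of the approximation in (5); the same routine could be reused on the first-stage outputs for the second stage.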
, L , i i l 1 Similarly, the output of the first 2DKPCA was further l m×n where ∗ was two-dimensional convolution, Ι ∈ R , and L convoluted, and the output of the second 2DKPCA could be was the filters number of the first 2DKPCA. obtained: n Journal of Robotics 7 l l 2 i 1 2 used binary encoding technology to solve the problem of Ο � Ι ∗ W � ∗ W ∗ W , i i ℓ Ι l ℓ (12) multiview clustering. Binary encoding and clustering for s. t. i � 1, 2, . . . , S; l � 1, 2, . . . , L ;ℓ � 1, 2, . . . , L . 1 2 multiple views were jointly optimized at the same time. 'e problems of big data storage and long time-consuming operation were well improved. It reduced the computation (3) Binary Hash Features Clustering. Similarly, the output of time and storage space greatly. 'e speed and efficiency were the first 2DKPCA was further convoluted, and the output of enhanced. 'e model proposed in this paper encoded and the second 2DKPCA could be obtained: in order to reduce clustered multiview dataset at the same time, and the total the data redundancy caused by the multiangle acquisition optimization function was set as process of the box image, the clustering operation was carried out in this stage. Binary clustering algorithm [47] � � � � � m �2 m m r m l � m� � � � � � � min F U , B, C, G, a􏼁 � 􏽘 a 􏼁 􏼒 B − U ϕ􏼐Ο 􏼑 + β U � � � � m�1 c m m T m l m l 2 (13) · 􏼒− tr􏼐U ϕ􏼐􏼐Ο 􏼑􏼑􏼐U ϕ􏼐Ο 􏼑􏼑 􏼓 + λ‖B − CG‖ T m m q×n q×c c×n s.t. C 1 � 0, 􏽘 α � 1, α > 0, B ∈ {−1, 1} , C ∈ {−1, 1} , G ∈ {0, 1} , 􏽘 g � 1, ji m j where α was the weight of the mth view, m � 1,. . .,M. was nonnegative constant. G � [g , . . . , g ] and λ are the 1 n Different views had different weights. r> 1 was scalar that regularization parameters. controlled the weights. B � [b , . . . , b ], b was collaborative 'e optimization problem was divided into several 1 n i m m binary code of the ith instance, and each encoding B was subproblems. 
U , B, C, G, and α were optimized and represented by the product of a clustering centroid C and updated alternately by an alternating optimization strategy. indicator vector g. U was mapping matrix of mth view. When some variable was updating, other variables were ϕ(Ο ) was the kernel function based on nonlinear RBF fixed. 'e corresponding optimization cost functions were mapping between the output feature of the second 2DKPCA as follows; then the sample of nonlinear mappings for and selected sample points randomly under the mth view. c training became X ; that is, � � � � m 2 m m � � c T � �2 m m l m m l m l � � � � � � min F U 􏼁 � B − U ϕ􏼐Ο 􏼑 + β�U � − tr􏼒􏼐U ϕ􏼐Ο 􏼑􏼑􏼐U ϕ􏼐Ο 􏼑􏼑 􏼓, (14) � � � � � m �2 m � m l � 2 � � min 􏽘 α 􏼁 􏼒 B − U ϕ􏼐Ο 􏼑 􏼓 + λ‖B − CG‖ � � F m�1 (15) M M r r T m T m m l ⎡ ⎢ ⎤ ⎥ ⎡ ⎢ ⎤ ⎥ ⎢ ⎛ ⎝ ⎞ ⎠ ⎥ ⎢ ⎛ ⎝ ⎞ ⎠⎥ ⎣ ⎦ ⎣ ⎦ � tr B 􏽘 α 􏼁 I + λI B − 2tr B 􏽘 α 􏼁 U ϕ􏼐Ο 􏼑 + λCG + con, m�1 m�1 � � � � 2 2 � � � � 2 T T T � � � � (16) min F(C) � ‖B − CG‖ + ρ C 1 � −2tr B CG + ρ C 1 + con, � � 􏼐 􏼑􏼑 � � p+1 ⎧ ⎨ 1, s � arg min H􏼐b , c 􏼑, p+1 j i g � (17) js 0, otherwise, m 1/(1− r) g 􏼁 α � , (18) m 1/(1−r) 􏽐 g 􏼁 where con is the constant with respect to B. H is the distance (4) Output of the Block Histogram Features. L features were from each B to the cluster center. Until the total optimi- outputted for each input I in second 2DKPCA, whose zation function was optimal, the binary hash clustering binary cell vector was clustered and optimized as a whole. optimization was completed. Each optimized feature was converted to decimal, 8 Journal of Robotics Linear, Polynomial, PolyPlus, Gaussian, and Sigmoid kernel m m l l− 1 l T � 􏽘 2 􏼐h 􏼑, (19) functions. 'eir corresponding expressions are as follows: i i l�1 K􏼐υ , υ 􏼑 � υ υ + c, (22) i j i j where T , l ∈ [1, L ], and each pixel was an integer within L − 1 [0, 2 ]. Z blocks of each T were counted by histogram i T (23) l K􏼐υ , υ 􏼑 � 􏼐υ υ 􏼑 , i j i j Zhist(T ). 
A vector can be obtained by connecting the block histograms $\mathrm{Zhist}(T)$:

$$f_i^m = \left[\mathrm{Zhist}\left(T_i^{1}\right), \cdots, \mathrm{Zhist}\left(T_i^{L_1}\right)\right] \in \mathbb{R}^{\left(2^{L_2}\right) L_1 Z}, \tag{20}$$

where $f_i^m$ is the BM2DKPCANet feature of the $i$th sample under the $m$th view.

3.2.3. Fruits Boxes Recognition Based on BM2DKPCANet Model. The fruits boxes features extracted by the BM2DKPCANet model were input into the classifier for training and recognition. The performance of the classifier directly determines the recognition accuracy and the classification speed. The support vector machine (SVM) is widely used in the field of pattern recognition because of its outstanding advantages in solving small-sample, nonlinear, high-dimensional pattern recognition problems [48, 49]. This work also used an SVM as the classifier. According to previous studies [27], the radial basis function (RBF) was selected as the kernel function, which maps the features into a high-dimensional space to find the optimal hyperplane. In this way, the different kinds of fruits boxes were correctly recognized. The specific identification process of fruits boxes is shown in Figure 7.

Figure 7: Flowchart of fruits boxes identification.

4. Results and Discussion

The experiments were performed with Matlab 2017b and the Python integrated environment Anaconda3 on a platform with an Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.6 GHz, 64 GB RAM, and an NVIDIA GeForce GTX 1080 8G GPU. The classifier kernel parameters were selected by the grid search method and the cross-validation method based on the LibSVM software package. The penalty parameter $C = 58$ and the RBF kernel parameter $\gamma = 2$ were determined. The following experiments analyzed the influence of the parameters on model performance, taking the average accuracy over 10 tests as the evaluation index. The recognition accuracy was used as the evaluation metric:

$$\mathrm{accuracy} = \frac{\sum_{i=1}^{n} \phi\left(Z_i, \operatorname{map}(c_i)\right)}{n}, \tag{21}$$

where $n$ is the total number of images in the dataset, $Z_i$ is the ground-truth label of the $i$th image, and $\operatorname{map}(c_i)$ is the class predicted by the algorithm. If $Z_i = \operatorname{map}(c_i)$, then $\phi(Z_i, \operatorname{map}(c_i)) = 1$; otherwise $\phi(Z_i, \operatorname{map}(c_i)) = 0$.

4.1. Influence of Kernel Function. KPCA is a nonlinear extension of principal component analysis using the kernel technique. The selection of the kernel function is related to the extraction of the nonlinear features of the dataset and directly affects the recognition performance of the model. This paper studied the influence of commonly used kernel functions (the linear, polynomial, PolyPlus, Gaussian, and sigmoid kernels compared in Figure 8) on the performance of the BM2DKPCANet model, such as

$$K\left(\upsilon_i, \upsilon_j\right) = \left[\left(\upsilon_i \upsilon_j^{T}\right) + 1\right]^{d}, \tag{24}$$

$$K\left(\upsilon_i, \upsilon_j\right) = \exp\left(-\frac{\left\|\upsilon_i - \upsilon_j\right\|^{2}}{2\sigma^{2}}\right), \tag{25}$$

$$K\left(\upsilon_i, \upsilon_j\right) = \tanh\left(\alpha \upsilon_i \upsilon_j^{T} + c\right), \tag{26}$$

where $c$, $d$, $\sigma$, and $\alpha$ are all real constants. This work set $c = 0$, $d = 3$, $\sigma = 1$, and $\alpha = 1/2$. $\upsilon_i, \upsilon_j \in \mathbb{R}^{mn}$ are row vectors of the matrix to be transformed. The influence of the kernel function on model performance under the same parameter settings was studied, as shown in Figure 8. It can be seen that the Gaussian kernel function adopted in the model achieves the best recognition effect.

Figure 8: Performance comparison of kernel functions.

4.2. Influence of Filter Parameters

4.2.1. Influence of Number of Filters. The patch size, block size, and overlapping ratio were set as 5 × 5, 8 × 8, and 0.5, respectively. The influence of the number of filters on the performance of the model was studied, as shown in Figure 9. The blue line represents the accuracy on the fruits boxes dataset when the number of filters of the first 2DKPCA, $L_1$, was selected within the range from 2 to 14. The accuracy tended to be stable when $L_1 \geq 8$. The selection of the second 2DKPCA filter was conducted with $L_1 = 8$. The red line represents the accuracy on the fruits boxes dataset when the number of filters of the second 2DKPCA, $L_2$, was selected within the range from 2 to 14. The accuracy levels off when $L_2 \geq 6$. Therefore, $L_1 = 8$ and $L_2 = 6$ were used in the following experiments.

4.2.2. Influence of Patch Size and Block Size. Since the PCA filter must satisfy the condition $k_1 k_2 \geq L_1, L_2$, the minimum patch size was set to 3 × 3. In order to observe the influence of patch size and block size on recognition in the proposed model, the block sizes were defined as 4 × 4, 8 × 8, and 16 × 16. The maximum patch size was set to 13 × 13. The accuracy with different patch sizes and block sizes on the fruits boxes dataset is shown in Figure 10. It can be seen that the accuracy tends to increase as the block size increases. However, since a larger block size will lose part of the features of the first-layer network [25], the patch size is set to 5 × 5 and the block size is set to 8 × 8 in this paper.

4.2.3. Influence of Overlapping Ratio. It has been verified that overlapping blocks not only improve target detection accuracy [50] but also resist geometric rotation and scaling changes to some extent. The robustness is also enhanced [51]. In order to strengthen the spatial information of fruits boxes, overlapping partitioning was carried out in this paper.
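The overlapping partitioning described here can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the hashed feature map holds integer codes in the range 0 to $2^{L_2} - 1$ (64 values for $L_2 = 6$), and that the step between neighbouring blocks is the block size scaled by one minus the overlapping ratio and rounded, as is common in PCANet-style code. The function names and the random 32 × 32 demo map are hypothetical.

```python
import numpy as np


def block_stride(block_size, overlap_ratio):
    """Step between neighbouring blocks (assumed rule: round((1 - ratio) * size))."""
    return max(1, int(round((1.0 - overlap_ratio) * block_size)))


def overlapping_block_histograms(code_map, block_size=8, overlap_ratio=0.5, n_bins=64):
    """Histogram every overlapping block of an integer-coded feature map.

    code_map : 2D array of non-negative integer codes smaller than n_bins
    returns  : array of shape (number_of_blocks, n_bins)
    """
    stride = block_stride(block_size, overlap_ratio)
    height, width = code_map.shape
    histograms = []
    for top in range(0, height - block_size + 1, stride):
        for left in range(0, width - block_size + 1, stride):
            block = code_map[top:top + block_size, left:left + block_size]
            histograms.append(np.bincount(block.ravel(), minlength=n_bins))
    return np.stack(histograms)


# Hypothetical hashed map with 6-bit codes, standing in for a real feature map.
rng = np.random.default_rng(0)
demo_map = rng.integers(0, 64, size=(32, 32))

for ratio in (0.5, 0.6):
    hists = overlapping_block_histograms(demo_map, overlap_ratio=ratio)
    print(ratio, block_stride(8, ratio), hists.shape)
```

Under these assumptions, raising the overlapping ratio shortens the step between blocks, so more blocks and therefore a longer histogram feature are produced from the same map, which is one reason the ratio trades accuracy against computation.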
The overlapping ratio of blocks was varied from 0.1 to 0.9. The influence of the overlapping ratio on fruits boxes recognition, with the other parameters at their optimal values, is shown in Figure 11. It can be seen that when the block overlapping ratio is 0.6, the recognition performance of the model is optimal.

Figure 9: Boxes recognition accuracy with different number of filters.

Figure 10: Influence of sampling block and histogram block on fruits boxes recognition.

Figure 11: Influence of block overlap ratio on fruits boxes recognition.

4.3. Analysis of Experimental Results. In order to verify the recognition ability of the proposed algorithm for fruit packing boxes, 80 images of each type of packing box were randomly selected as training samples, and the other 120 images were taken as test samples. The experiment was done with the optimal parameters. The recognition accuracies of the different categories are shown in Table 1. The overall recognition meets the requirements of the fruits boxes handling robot. For apple3, watermelon1, orange2, cantaloupe2, pear, pineapple, etc., the top and side surfaces are easily confused, which leads to lower accuracy. The average accuracy is 92.89%, which is 2.09% higher than that of the BMKPCANet [27] model.

Table 1: Recognition accuracy of different categories.

Categories     Right recognition number   Accuracy (%)
apple1         115                        95.83
apple2         117                        97.5
apple3         109                        90.83
watermelon1    108                        90
watermelon2    111                        92.5
orange1        116                        96.67
orange2        109                        90.83
cantaloupe1    113                        94.17
cantaloupe2    107                        89.17
pomegranate    118                        98.33
pear           106                        88.33
durian         111                        92.5
coconut        112                        93.33
banana         111                        92.5
pineapple      109                        90.83

This model was compared with PCANet [25], 2DPCANet [28], KPCANet [26], and BMKPCANet [27] in terms of recognition rate and normalized time to verify the performance of the BM2DKPCANet model, as shown in Figure 12. It can be seen that although the proposed BM2DKPCANet model consumes more time than the PCANet-related models, its recognition rate is 11.84% higher than the average of the PCANet-related models and 2.485% higher than the average of the KPCANet-related models. In addition, compared with the KPCANet-related models, the time consumption of BM2DKPCANet is reduced by 24.5% on average. Taking that into account, the BM2DKPCANet model is better than the other models in fruits boxes recognition.

Figure 12: Comparison of model performance on the fruits boxes dataset.

The recognition experiments were also conducted on public datasets with the same model parameters to verify the proposed multiview recognition algorithm. The comparisons of model performance on ETH-80 and COIL-100 are shown in Figures 13 and 14, respectively. The proposed BM2DKPCANet model achieves a higher recognition accuracy than the other models.

Figure 13: Comparison of model performance on the ETH-80 dataset.

Figure 14: Comparison of model performance on the COIL-100 dataset.

The experimental results show that the BM2DKPCANet model achieves a good recognition effect on all three datasets. Compared with the PCANet and 2DPCANet models, the proposed model adopts the kernel principal component analysis method, which maps the features to a high-dimensional space by nonlinear mapping and then carries out PCA dimensionality reduction. The nonlinear relationship of the images is extracted; although the calculation is more complex and takes more time than PCA, the recognition accuracy is greatly improved. Compared with KPCA in the KPCANet and BMKPCANet models, the proposed model does not need to transform the two-dimensional matrix into a one-dimensional vector but directly applies the average column vector method to the two-dimensional image, which not only preserves the structural information of the original image as much as possible but also greatly reduces the complexity. Therefore, both the recognition rate and the efficiency are improved.

5. Conclusions

In order to reduce the labour intensity of fruit handling in orchards and fruit markets, this paper studied fruits boxes recognition based on machine vision. The recognition of 3D boxes was transformed into the feature extraction of 2D images. To preserve the structure of the original 2D images, the established BM2DKPCANet model performed a two-layer 2DKPCA analysis on the 2D images. A binary clustering algorithm was added in the feature extraction stage to reduce the data redundancy caused by the multiview acquisition. The multiview recognition method for fruits boxes was proposed by combining the BM2DKPCANet model with an SVM classifier based on the RBF kernel. The experimental results showed that the recognition accuracy of this method is 11.84% higher than the average of the PCANet-related models and 2.485% higher than the average of the KPCANet-related models, which can meet the requirements of automatic rapid identification for fruits boxes handling. This lays a foundation for realizing the intelligent mechanization of fruits boxes handling and reducing the labour intensity of fruit farmers.

Data Availability

The dataset presented in this study is available on request from the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this study.

Acknowledgments

The authors are grateful to workers at Zibo wholesale fruits market. This research was funded by the National Natural Science Foundation of China (Grant no. 52075306).

References

[1] H. Wang, R. Li, Y. Gao, C. Cao, L. Ge, and X. Xie, “Target recognition and localization of mobile robot with monocular PTZ camera,” Journal of Robotics, vol. 2019, pp. 1–12, Article ID 8789725, 2019.
[2] J. Yuan, “Research progress analysis of robotics selective harvesting technologies,” Transactions of the Chinese Society for Agricultural Machinery, vol. 51, no. 9, pp. 1–17, 2020.
[3] H. Q. T. Ngo, V. N. S. Huynh, T. P. Nguyen, and H. Nguyen, “Sustainable agriculture: stable robust control in presence of uncertainties for multi-functional indoor transportation of farm products,” Agriculture, vol. 10, no. 11, pp. 523–618, 2020.
[4] M. H. Li, Object recognition and tracking based on structural sparse representation, Ph.D. thesis, University of Electronic Science and Technology of China, Chengdu, China, 2020.
[5] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587, Columbus, OH, USA, June 2014.
[6] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448, Santiago, Chile, December 2015.
[7] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[8] M. Bucolo, L. Fortuna, M. Nelke, A. Rizzo, and T. Sciacca, “Prediction models for the corrosion phenomena in pulp & paper plant,” Control Engineering Practice, vol. 10, no. 2, pp. 227–237, 2002.
[9] W. Liu, D. Anguelov, D. Erhan et al., “SSD: single shot multibox detector,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 21–37, Amsterdam, Netherlands, October 2016.
[10] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788, Las Vegas, NV, USA, June.
[11] J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7263–7271, Honolulu, HI, USA, July 2017.
[12] R. Joseph and F. Ali, “YOLOv3: an incremental improvement,” 2018, https://arxiv.org/abs/1804.02767v1.
[13] A. Bochkovskiy, C. Y. Wang, and H. Y. Liao, “YOLOv4: optimal speed and accuracy of object detection,” 2020, https://arxiv.org/abs/2004.10934.
[14] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, “Multi-view convolutional neural networks for 3d shape recognition,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 945–953, Washington, DC, USA, December 2015.
[15] Z. Gao, D. Wang, X. He, and H. Zhang, “Group-pair convolutional neural networks for multi-view based 3D object retrieval,” in Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, pp. 1–8, New Orleans, LA, USA, February 2018.
[16] Z. Gao, D. Y. Wang, Y. B. Xue, G. P. Xu, H. Zhang, and Y. L. Wang, “3D object recognition based on pairwise multi-view convolutional neural networks,” Journal of Visual Communication and Image Representation, vol. 56, no. 10, pp. 305–315, 2018.
[17] Z. Gao, H. Xue, and S. Wan, “Multiple discrimination and pairwise CNN for view-based 3D object retrieval,” Neural Networks, vol. 125, no. 2, pp. 290–302, 2020.
[18] H. Li, Y. Zheng, J. Cao, and Q. Cai, “Multi-view-based siamese convolutional neural network for 3D object retrieval,” Computers & Electrical Engineering, vol. 78, no. 7, pp. 11–21.
[19] C. Lin and A. Kumar, “Contactless and partial 3D fingerprint recognition using multi-view deep representation,” Pattern Recognition, vol. 83, pp. 314–327, 2018.
[20] Y. Yang, F. Chen, F. Wu, D. Zeng, Y.-M. Ji, and X.-Y. Jing, “Multi-view semantic learning network for point cloud based 3D object detection,” Neurocomputing, vol. 397, no. 3, pp. 477–485, 2020.
[21] Y. Zhang, X. S. Guo, X. Guo, H. Ren, and L. Li, “Multi-view classification with semi-supervised learning for SAR target recognition,” Signal Processing, vol. 183, Article ID 108030.
[22] X. Chen, Y. Chen, and H. Najjaran, “End-to-end 3D object model retrieval by projecting the point cloud onto a unique discriminating 2D view,” Neurocomputing, vol. 402, pp. 336–345, 2020.
[23] M. Bucolo, A. Buscarino, C. Famoso, L. Fortuna, and M. Frasca, “Control of imperfect dynamical systems,” Nonlinear Dynamics, vol. 98, no. 4, pp. 2989–2999, 2019.
[24] L. Fei, B. Zhang, J. Wen, S. Teng, S. Li, and D. Zhang, “Jointly learning compact multi-view hash codes for few-shot FKP recognition,” Pattern Recognition, vol. 115, Article ID 107894.
[25] T.-H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, “PCANet: a simple deep learning baseline for image classification?” IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5017–5032, 2015.
[26] D. Wu, J. S. Wu, R. Zeng, L. Y. Jiang, L. Senhadji, and H. Z. Shu, “Kernel principal component analysis network for image classification,” Journal of Southeast University (English Edition), vol. 31, no. 4, pp. 469–473, 2015.
[27] X. N. Li, H. Wu, and X. H. Yang, “Multi-view recognition of fruit packing boxes based on features clustering angle,” High Technology Letters, vol. 47, 2021.
[28] D. Yu and X.-J. Wu, “2DPCANet: a deep leaning network for face recognition,” Multimedia Tools and Applications, vol. 77, no. 10, pp. 12919–12934, 2018.
[29] D. Zhang, S. Chen, and Z. Zhou, “Recognizing face or object from a single image: linear vs. kernel methods on patterns,” in Proceedings of the Joint IAPR International Workshops on Structural and Syntactic Pattern Recognition and Statistical Techniques in Pattern Recognition, LNCS 4109, pp. 889–897, Hong Kong, China, August 2006.
[30] J. Yang, D. Zhang, A. F. Frangi, and J.-Y. Yang, “Two-dimensional PCA: a new approach to appearance-based face representation and recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131–137, 2004.
[31] B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.
[32] V. D. M. Nhat and S. Lee, “Kernel-based 2DPCA for face recognition,” in Proceedings of the 2007 IEEE International Symposium on Signal Processing and Information Technology, pp. 35–39, Giza, Egypt, December 2007.
[33] S. Zhou, Y. Zheng, and X. Mu, “K2DPCA methods for face recognition based on Cholesky decomposition,” Systems Engineering-Theory & Practice, vol. 36, no. 2, pp. 528–535.
[34] M. Xu, D. Xu, and M. Wei, “Face recognition with Laplacian eigenmaps based on 2D-KPCA,” Application Research of Computers, vol. 34, no. 7, pp. 2212–2215, 2017.
[35] Y. Choi, S. Ozawa, and M. Lee, “Incremental two-dimensional kernel principal component analysis,” Neurocomputing, vol. 134, pp. 280–288, 2014.
[36] M. Esmaeili, M. Ahmadi, and A. Kazemi, “Kernel-based two-dimensional principal component analysis applied for parameterization in history matching,” Journal of Petroleum Science and Engineering, vol. 191, Article ID 107134, 2020.
[37] Y. Xiang, G. Yang, J. Zhang, and Q. Wang, “Dimensionality reduction for hyperspectral image using a segmented row-column kernel two-dimensional principal component analysis method,” Infrared Technology, vol. 39, no. 12, pp. 1107–1113, 2017.
[38] N. Sun, H.-x. Wang, Z.-h. Ji, C.-r. Zou, and L. Zhao, “An efficient algorithm for kernel two-dimensional principal component analysis,” Neural Computing and Applications, vol. 17, no. 1, pp. 59–64, 2007.
[39] A. Eftekhari, M. Forouzanfar, H. Abrishami Moghaddam, and J. Alirezaie, “Block-wise 2D kernel PCA/LDA for face recognition,” Information Processing Letters, vol. 110, no. 17, pp. 761–766, 2010.
[40] L. Wang and X. Zhou, “Approximation kernel 2DPCA by mixture of vector and matrix representation,” International Conference on Computational Intelligence and Security, vol. 12, no. 3, pp. 1298–1302, 2011.
[41] B. Chen, J. Yang, B. Jeon, and X. Zhang, “Kernel quaternion principal component analysis and its application in RGB-D object recognition,” Neurocomputing, vol. 266, pp. 293–303.
[42] B. J. Chen, J. H. Yang, C. N. Fan, Q. T. Su, and D. C. Wang, “Block-wise two dimensional kernel quaternion principal component analysis,” Journal of Beijing University of Posts and Telecommunications, vol. 42, no. 1, pp. 53–60, 2019.
[43] H. Peng, S. K. Huang, L. Tao, and T. X. Zhang, “Multi-view modeling of 3D target based on Zernike moments,” Infrared and Laser Engineering, vol. 34, no. 3, pp. 292–296, 2005.
[44] B. Leibe and B. Schiele, “Analyzing appearance and contour based methods for object categorization,” in Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 409–415, Madison, WI, USA, June 2003.
[45] S. A. Nene, S. K. Nayar, and H. Murase, Columbia Object Image Library (COIL 20), CUCS-005-96, New York: Department of Computer Science, New York, NY, USA, 1996.
[46] J. R. Hu and Z. Z. Yu, “Model validation method with multivariate output based on kernel principal component analysis,” Journal of Beijing University of Aeronautics and Astronautics, vol. 43, no. 7, pp. 1470–1480, 2017.
[47] Z. Zhang, L. Liu, F. Shen, H. T. Shen, and L. Shao, “Binary multi-view clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1774–1782, 2019.
[48] X. Zhang, Z. Yang, and S. Cao, “Spectral detection method for chilling damage of sweet potato based on support vector machine,” Transactions of the Chinese Society for Agricultural Machinery, vol. 51, no. 2, pp. 471–477, 2020.
[49] H. Zhang, H. Sun, and P. Shi, “Chip appearance inspection method for high-precision SMT equipment,” Machines, vol. 9, no. 2, p. 34, 2021.
[50] R. H. Miao, H. Yang, J. L. Wu, and H. Y. Liu, “Weed identification of overlapping spinach leaves based on image sub-block and reconstruction,” Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), vol. 36, no. 4, pp. 178–184, 2020.
[51] Y. Wang, X. Kang, and Y. Chen, “Robust and accurate detection of image copy-move forgery using PCET-SVD and histogram of block similarity measures,” Journal of Information Security and Applications, vol. 54, no. 10, Article ID 102536, 2020.
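As a supplementary check on Section 4.3, the per-category figures in Table 1 can be reproduced from the accuracy definition in (21). The sketch below is only an illustration, not the authors' evaluation code: the function name `accuracy`, the dictionary layout, and the toy label lists are assumptions, while the counts and the 120 test images per category are taken from Table 1 and the text.

```python
def accuracy(ground_truth, predicted):
    """Fraction of samples whose predicted class equals the ground-truth label."""
    matches = sum(1 for z, c in zip(ground_truth, predicted) if z == c)
    return matches / len(ground_truth)


TESTS_PER_CATEGORY = 120  # test images per category, as stated in Section 4.3

# Correctly recognized test images per category, copied from Table 1.
correct = {
    "apple1": 115, "apple2": 117, "apple3": 109,
    "watermelon1": 108, "watermelon2": 111,
    "orange1": 116, "orange2": 109,
    "cantaloupe1": 113, "cantaloupe2": 107,
    "pomegranate": 118, "pear": 106, "durian": 111,
    "coconut": 112, "banana": 111, "pineapple": 109,
}

# Per-category accuracy in percent, rounded to two decimals as in Table 1.
per_category = {name: round(100 * n / TESTS_PER_CATEGORY, 2) for name, n in correct.items()}

# Average accuracy over all categories (every category has the same test size).
overall = round(100 * sum(correct.values()) / (TESTS_PER_CATEGORY * len(correct)), 2)

print(accuracy(["a", "b", "c", "d"], ["a", "b", "x", "d"]))  # toy labels
print(per_category["apple1"], per_category["pear"], overall)
```

With the counts from Table 1, the overall value agrees with the 92.89% average accuracy reported in Section 4.3.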


Publisher
Hindawi Publishing Corporation
Copyright
Copyright © 2021 Xinning Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ISSN
1687-9600
eISSN
1687-9619
DOI
10.1155/2021/3584422

Abstract

Hindawi Journal of Robotics Volume 2021, Article ID 3584422, 13 pages https://doi.org/10.1155/2021/3584422 Research Article Multiview Machine Vision Research of Fruits Boxes Handling Robot Based on the Improved 2D Kernel Principal Component Analysis Network 1 1 1 2 2 Xinning Li , Hu Wu, Xianhai Yang , Peng Xue, and Shuai Tan School of Mechanical Engineering, Shandong University of Technology, Zibo 255000, China National Engineering Research Center for Production Equipment, Dongying 257091, China Correspondence should be addressed to Xianhai Yang; yxh@sdut.edu.cn Received 1 May 2021; Accepted 21 June 2021; Published 8 July 2021 Academic Editor: L. Fortuna Copyright © 2021 Xinning Li et al. 'is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. In order to better realize the orchard intelligent mechanization and reduce the labour intensity of workers, the study of intelligent fruit boxes handling robot is necessary. 'e first condition to realize intelligence is the fruit boxes recognition, which is the research content of this paper. 'e method of multiview two-dimensional (2D) recognition was adopted. A multiview dataset for fruits boxes was built. For the sake of the structure of the original image, the model of binary multiview 2D kernel principal component analysis network (BM2DKPCANet) was established to reduce the data redundancy and increase the correlation between the views. 'e method of multiview recognition for the fruits boxes was proposed combining BM2DKPCANet with the support vector machine (SVM) classifier. 
'e performance was verified by comparing with principal component analysis network (PCANet), 2D principal component analysis network (2DPCANet), kernel principal component analysis network (KPCANet), and binary multiview kernel principal component analysis network (BMKPCANet) in terms of recognition rate and time consumption. 'e experimental results show that the recognition rate of the method is 11.84% higher than the mean value of PCANet though it needs more time. Compared with the mean value of KPCANet, the recognition rate exceeded 2.485%, and the time saved was 24.5%. 'e model can meet the requirements of fruits boxes handling robot. precision agriculture. In order to realize intelligent handling, 1. Introduction this paper studied the fruits boxes recognition based on As the primary industry of the national economy, agriculture machine vision. is the primary condition for all production, and the proposal According to the different modelling methods of target of precision agriculture has put forward higher require- appearance, the research results of target recognition in ments. 'e fruit industry, as a labour-intensive industry, has recent years have been divided into three categories [4]: a large demand for labour and low work efficiency. 'e based on feature invariants, representation learning, and deep learning. 'e view models based on feature invariants automation and mechanization industry chain needs up- grade urgently [1]. With the rapid development of the ar- extract the features of multiple images from different per- tificial intelligence, the fruit recognition and fruit picking spectives and then train the classifier, which are used for the have always being studied [2]. 'ere are relatively few occasions with a small number of training samples. 'e studies on fruit handling [3]. 
On farms and in wholesale fruit research studies mainly focus on the construction of artificial markets, the handling of fruits boxes is still dominated by features and classification algorithms, and many outstanding manual labour, which is time-consuming and labour-con- works have emerged. Due to the necessity to study the suming. In the new era, the cost of manual labour is in- characteristic invariance of the target in advance, candidate creasing year by year, which cannot meet the demand of features have characters such as weak adaptability, weak 2 Journal of Robotics efficiency of feature extraction because it was not end-to- generalization ability, and large application limitation. It has the large feature description vector dimension and high end network. 'e end-to-end group-pair convolutional neural network (GPCNN) was established in [15]. 'e training cost of the classifier. Researchers proposed the methods based on subspace learning to solve the problems, small-scale problem could be solved. 'e novel pairwise which transformed high-dimensional feature vectors into multiview convolutional neural network (PMV-CNN) was low-dimensional ones. 'e classifiers were trained in the proposed in [16], which focused on complementary in- subspace. 'e typical representative methods are as follows: formation between views. 'e feature extraction and target principal component analysis (PCA) based on unsupervised recognition are unified into CNN. It could improve the learning and linear discriminant analysis (LDA) based on robustness of feature extraction obviously when the number of training samples was small. In order to make up supervised learning. 
Based on these, the methods with low data dimension, strong noise processing ability, and high for the disadvantages caused by random images selection in multiview recognition, a multiview discrimination and efficiency were put forward, such as robust PCA (RPCA), inductive RPCA (IRPCA), kernel PCA (KPCA), two-di- pairwise convolutional neural network (MDPCNN) was obtained by adding the Slice layer and the Concat layer in mensional PCA (2DPCA), and discriminative low-rank and sparse principal feature coding (D-LSPFC). With the [17]. 'e model was verified that it had good intraclass emergence of a large number of public image datasets, the compactness and interclass separability. 'e multiview- target recognition methods based on deep learning have based Siamese convolutional neural network was exploited been studied more and more. 'e models based on con- in [18]. An end-to-end multiview 3D fingerprint learning volutional neural network (CNN) promoted the develop- model was proposed in [19], which included full con- ment of computer vision in particular by virtue of its strong volutional network and three Siamese networks. 'e multiview generator module was used in [20] to project the nonlinear feature expression ability and good generalization performance. Region CNN (R-CNN) [5] applied deep 3D point cloud to the plane at a specific angle. On the premise of retaining the underlying features, spatial learning in the target recognition for the first time. And then deep convolutional neural networks Fast R-CNN [6] and refusion operation was adopted to realize the interaction between different projections, and the features were Faster R-CNN [7] were proposed by combining the training and testing process, which improved the identification ac- reconstructed for target recognition. Based on the semi- curacy and efficiency greatly. 
As the product of integrating supervised learning and expectation maximization, a fuzzy logic reasoning and self-learning ability of neural multiview fusion strategy classification method with the network, neurofuzzy network has also been widely used [8]. ability of label propagation was proposed in [21]. An end- 'e CNN-based single shot detector (SSD) [9] and the to-end cloud convolutional neural network was built in YOLO [10] deep learning object detection method further [22] based on the projection network mechanism. 'e point cloud was projected into a two-dimensional view improved a new height in real-time effect. On this basis, the proposed YOLOv2 [11] and YOLO V3 [12] gradually im- with rich discriminant features, and the robustness and accuracy had been improved significantly. Multiview proved the running speed and robustness, and the detection performance had been significantly improved. YOLO V4 features projections were coded as binary in [23]. 'e [13] achieved double improvement in speed and accuracy, recognition descriptors were assembled block statistical which took CSPDarkNet53 + SPP + PANet (path-aggrega- features. Although the above methods have achieved good tion-neck) + yolov3-head as the model. It is undeniable that results, the models based on the convolutional neural the effect of the target recognition algorithm based on deep network (CNN) also have some problems, such as com- learning is remarkable, and the recognition accuracy is much plicated structure, long training cycle. 'ey do not seem to higher than the traditional manual methods and the rep- be the best choice for fruits boxes handling robots. From resentation learning methods. 
However, it cannot be ignored the above research, it can be concluded that considering the relationship between multiple views can improve the that the target recognition still has great challenges in some occasions, such as target overlap, partial occlusion, high recognition accuracy and robustness, and the binary coding method can improve the operation efficiency of the similarity, complex environment, and strong interference. 'e methods with complex models, long training time, and model, which also become the research factors of the new high requirements for hardware computing power have model developed in this paper. affected the application in mobile robots. No system is perfect. 'e hidden state of the inevitable As a three-dimensional (3D) object, the direct ex- uncertainty in the system can be stimulated, and the traction and recognition of 3D features for fruit box lead to connection between these uncertainties and the object complex calculation and high operation storage. A view- system can be established to improve the system perfor- mance [24]. Although fruit packing boxes are generally based method is adopted in this paper, that is, 3D objects are represented through multiple views. As a common regular cubes, traditional rule-based feature extraction and recognition methods cannot achieve better results because method of 3D objects recognition, multiview learning model and recognition method have also received more of the variety of fruits and the influence of surface patterns, colours, and surrounding environment. 'erefore, deep attention. 'e multiview-based convolutional neural network (MVCNN) was built in [14]. 'e maximum learning algorithm is more advantageous. 'e current pooling layers blended the multiple views features. 
MVCNN model had low convergence speed and low

The deep learning target recognition algorithm is an end-to-end solution; that is, it goes from the input image to the output task result in one step, although internally it proceeds in stages: image feature extraction, network classification, and regression. To address the long training time of the classic CNN parameters, the simple principal component analysis network (PCANet) was built in [25]. The convolution layer of CNN was introduced into the classical feature extraction framework of “Feature Map-Pattern Map-Histogram.” Unsupervised hierarchical features were obtained, and the high computational complexity caused by iteration and optimization was avoided. PCANet has been widely used because of its simple model and rapid calculation. Since PCA could not extract the nonlinear relationship between images, the kernel principal component analysis network (KPCANet) model was established in [26], which achieved better classification results than PCANet. In order to remove the redundancy of multiperspective views, our team proposed a binary multiview kernel principal component analysis network (BMKPCANet) model [27] for multiview object recognition. However, that model converted the two-dimensional image matrix into a vector when the features were extracted, so the original image structure was destroyed and the computation was large. We therefore improved the model. Inspired by the two-dimensional principal component analysis network (2DPCANet) [28] and the two-dimensional kernel principal component analysis (2DKPCA) [29], the images of fruits boxes were processed by 2DKPCA, and a new multiview feature extraction model was established. The main contributions of this work are summarized as follows:

(1) A binary multiview two-dimensional kernel principal component analysis network (BM2DKPCANet) model was built to extract clustering features, which can reduce data redundancy and realize binary multiview clustering.

(2) A multiview recognition method for fruits boxes was proposed by combining the BM2DKPCANet model with the support vector machine (SVM).

(3) The proposed method was compared with the PCANet, 2DPCANet, KPCANet, and BMKPCANet models on the fruits boxes dataset and on the ETH-80 and COIL-100 public datasets. Taking the recognition accuracy and time consumption as the evaluation indexes, the experiments showed that the recognition performance of the proposed method was superior to that of the other methods.

The rest of this work is organized as follows. The methods based on 2DPCA and KPCA are introduced in Section 2. The method of obtaining fruits boxes images from multiview angles is introduced in Section 3. The feature extraction process of the proposed BM2DKPCANet algorithm and the identification process of fruits boxes are also discussed in detail in Section 3. The experimental process, results, and discussion are shown in Section 4. Finally, the research and the future work are summarized in Section 5.

2. Related Works

In order to reduce the sample dimension and obtain the nonlinear correlation between multiple pixels, some scholars have proposed a series of algorithms that combine the advantages of 2DPCA [30] and KPCA [31]. Nhat and Lee [32] proposed the kernel-based 2DPCA, which directly extracted nonlinear features from two-dimensional images and realized the nonlinear correlation analysis of matrices. However, the storage requirement of the kernel matrix was high when the training set was large. Zhou et al. [33] calculated a low-rank approximate decomposition of the kernel matrix using the Cholesky decomposition method to achieve nonlinear feature extraction, but the computational efficiency was low in the test stage. Xu et al. [34] used Laplacian eigenmaps to reduce the dimension after 2DKPCA. Choi et al. [35] proposed the incremental 2DKPCA (I2DKPCA), which reduced the computation time and improved the performance of feature extraction. Zhang et al. [29] built the 2-dimensional kernel PCA (2DKPCA) framework and compared the performance of unilateral 2DKPCA (row and column) with that of bilateral 2DKPCA in face and object recognition. Esmaeili et al. [36] applied bilateral 2DKPCA to parameterization in history matching. Xiang et al. [37] realized dimensionality reduction for hyperspectral images using the segmented row-column K2DPCA method. In order to reduce the storage requirement and computational complexity of the kernel matrix, blockwise methods were proposed [38, 39], which transformed the large kernel matrix into several small kernel matrices and then combined the eigenvectors of the small kernel matrices. Wang and Zhou [40] mixed the image block and vector methods. The scale of the kernel matrix was decreased by taking several adjacent rows or columns of the image as a computing unit for nonlinear mapping. Chen et al. [41] proposed bidirectional two-dimensional kernel quaternion principal component analysis (BD2DKQPCA) for colour image recognition. The kernel matrix was used to replace the covariance matrix between samples, which avoided the high-complexity calculation in the high-dimensional space. They then improved 2DKQPCA by adding a blockwise process [42]. According to the characteristics of the quaternion Hermitian matrix, the blocks along the main diagonal, next to the main diagonal, and along the back-diagonal direction were analyzed.

Based on the above algorithms, and considering both recognition and computing performance, this paper sampled images in blocks when extracting the image features. Experiments on B2DKPCA [38], the bidirectional two-dimensional KQPCA (BD2DKQPCA) [41], and the block-based 2DKQPCA [42] demonstrated that the recognition performance of the column-oriented algorithm was superior to that of the row-oriented algorithm. This work therefore adopted the column-oriented algorithm to conduct 2DKPCA; that is, the column vector of the image sample is mapped to a high-dimensional space through the nonlinear mapping function, and the kernel matrix replaces the covariance matrix.

3. Materials and Methods

3.1. Experimental Materials

3.1.1. Establishment of Multiview Dataset of Fruits Boxes. This work adopted the multiview feature method to collect images. Under the principle that the set of projected views should be as small as possible while still representing the common attitudes of the boxes, several two-dimensional projections from different viewpoints are used to describe the features of the boxes. In order to describe and establish the visual model more conveniently, the relative position relation between a fixed view and boxes in different positions was transformed into the relation between a moving view and fixed boxes. The various postures of the boxes observed in normal operation were collected under the moving view. Since the opposite sides of the boxes generally had the same pattern, a multiple semiarc viewpoint projection model was set up, as shown in Figure 1. The camera kept moving on the green cambered surface, and multiple different postures of the boxes were obtained.

Figure 1: Image acquisition method.

The semiarc viewpoint surface must be divided into small areas to obtain the projections of 3D targets with different attitudes. The view areas are divided and the viewpoints distributed so that the projection view set is as small as possible and can still represent multiple common attitudes of the boxes. The distribution of viewpoints was described by the representation of latitude and longitude in geography, based on the idea of uniform division and the morphology diagram method [43], to simulate the box postures in the real situation, as shown in Figure 2.

Figure 2: Distribution of viewpoints.
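The latitude and longitude description of the viewpoints can be illustrated with a short sketch. It is only an assumption-laden example, not the authors' acquisition setup: the function name `semiarc_viewpoints`, the numbers of latitude and longitude steps, the unit radius, and the choice to cover half a turn in longitude (because opposite faces are taken to look alike) are all hypothetical.

```python
import numpy as np


def semiarc_viewpoints(n_lat=4, n_lon=6, radius=1.0):
    """Return (n_lat * n_lon, 3) camera positions on a half dome around a box.

    All parameters are illustrative assumptions; the paper does not state
    how many latitude/longitude steps were used.
    """
    # Latitude (elevation) from the horizontal plane up towards the pole;
    # the pole itself is excluded so every longitude gives a distinct view.
    lat = np.linspace(0.0, np.pi / 2, n_lat, endpoint=False)
    # Longitude (azimuth) over half a turn only, assuming that opposite
    # faces of a box carry the same pattern.
    lon = np.linspace(0.0, np.pi, n_lon, endpoint=False)
    lat_grid, lon_grid = np.meshgrid(lat, lon, indexing="ij")
    x = radius * np.cos(lat_grid) * np.cos(lon_grid)
    y = radius * np.cos(lat_grid) * np.sin(lon_grid)
    z = radius * np.sin(lat_grid)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)


views = semiarc_viewpoints()
```

Each row of `views` is one camera position looking at the box placed at the origin; a denser or sparser grid trades the size of the view set against how many attitudes it covers.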
The projection of the box at each viewpoint corresponds to one two-dimensional image, and the multiview projection model of the box was constructed from these images.

The experimental objects were from the fruit wholesale market of Zhangdian District, Zibo City, Shandong Province, China. A total of 15 different types of boxes for 10 kinds of fruits were selected in the experiment, defined as apple1, apple2, apple3, watermelon1, watermelon2, orange1, orange2, cantaloupe1, cantaloupe2, pomegranate, pear, durian, coconut, banana, and pineapple, as shown in Figure 3. Multiview collection was carried out for the boxes of each category, as shown in Figure 4. 200 samples of each category were retained. The image size was normalized to 32 × 32, and gray processing was carried out.

Figure 3: Categories of fruits boxes dataset.

Figure 4: Collecting images in multiviews.

3.1.2. Multiview Public Datasets. In order to fully verify the performance of the proposed multiview recognition algorithm, recognition tests were also carried out on the public datasets ETH-80 [44] and COIL-100 [45]. The ETH-80 dataset contains 8 species classes. Each species is an image set of 10 different objects, with 41 images of each object taken from different angles. In this paper, 4 objects of each species were randomly selected to form the training set, and the rest were used as the test set. The COIL-100 dataset contains images of 100 objects. Each object was photographed at 72 different angles within a 360° circumference. The training set was composed of 720 images of 20 objects, selected randomly. Partial images of ETH-80 and COIL-100 are shown in Figure 5.

Figure 5: Partial samples of the public datasets (a) ETH-80 and (b) COIL-100.

3.2. BM2DKPCANet Model Based on 2DKPCA

3.2.1. Construction of Feature Extraction Model. Since the image database is composed of many two-dimensional multiview images that represent the common postures of the boxes as completely as possible, it contains a lot of redundant data. In order to reduce unnecessary data storage, this paper added a clustering step to the feature extraction model of fruits boxes, as shown in Figure 6. Following the related research on principal component analysis networks, a two-layer 2DKPCA network was constructed. The extracted feature vectors were binary coded and clustered at the same time. The clustering feature representation in the decimal system was obtained by block histogram transformation, which completed the clustering feature extraction.

Figure 6: Model of BM2DKPCANet (multiview input layer; first 2DKPCA and second 2DKPCA, each with patch-mean removal and 2DKPCA filters convolution; output layer with binary hashing and clustering followed by blockwise histogram).

3.2.2. BM2DKPCANet Model

(1) First 2DKPCA. The image size of the database was adjusted to $m \times n$. With $I_i$ as the input layer, sliding patch sampling was performed with a $k_1 \times k_2$ window. All sample patches were gathered and cascaded. The $j$th patch of the $i$th image was defined as $x_{i,j}$. The $i$th image could be expressed as

$$I_i = \left[ x_{i,1}, x_{i,2}, \ldots, x_{i,\hat{m}\cdot\hat{n}} \right], \tag{1}$$

where $\hat{m}$ was the number of patches on rows and $\hat{n}$ was the number of patches on columns. The demeaned sample patch was obtained as follows:

$$\bar{x}_{i,j} = x_{i,j} - \frac{\sum_{j=1}^{\hat{m}\cdot\hat{n}} x_{i,j}}{\hat{m}\cdot\hat{n}}. \tag{2}$$

The local feature matrix of the $i$th image could be written as

$$X_i = \left[ \bar{x}_{i,1}, \bar{x}_{i,2}, \ldots, \bar{x}_{i,\hat{m}\cdot\hat{n}} \right] \in \mathbb{R}^{k_1 k_2 \times mn}. \tag{3}$$

After the same process was applied to the other images, the feature analysis based on 2DKPCA was performed on the local feature matrix. Because the explicit form after mapping is not needed, and in order to avoid complex calculation in the high-dimensional space, the covariance matrix of the mapped samples was replaced by the kernel matrix [41]. The training sample matrix $I_i \in \mathbb{R}^{m \times n}$ $(i = 1, 2, \ldots, S)$ was converted to the local eigenmatrix $X_i \in \mathbb{R}^{k_1 k_2 \times mn}$ $(i = 1, 2, \ldots, S)$ after sliding patch sampling. The dimension of the column-direction kernel matrix for $S$ training samples is $Smn \times Smn$, which requires a large amount of computation. This work adopted the average column vector to replace the original $mn$ column vectors [29]; the sample used in the nonlinear mapping for training then became $\bar{X}_i$; that is,

$$\Psi: \mathbb{R}^{k_1 k_2} \longrightarrow F, \qquad \bar{X}_i \longrightarrow \phi\left(\bar{X}_i\right), \tag{4}$$

$$\bar{X}_i = \frac{1}{mn} \sum_{t=1}^{mn} X_i^{t}, \tag{5}$$

where $i = 1, 2, \ldots, S$ and $t = 1, 2, \ldots, mn$. The $S$ training samples can be approximately expressed in the kernel feature space as follows:

$$\Phi_1 = \left[ \phi\left(\bar{X}_1\right), \phi\left(\bar{X}_2\right), \ldots, \phi\left(\bar{X}_S\right) \right]. \tag{6}$$

Then the kernel matrix can be expressed as

$$K_1 = \Phi_1^{T} \Phi_1 = \left[ k\left(\bar{X}_s, \bar{X}_{s'}\right) \right] = \left[ \left\langle \phi\left(\bar{X}_s\right), \phi\left(\bar{X}_{s'}\right) \right\rangle \right]. \tag{7}$$

The dimension was reduced to $S \times S$, and the computational complexity was greatly reduced. The kernel matrix $K_1$ was centralized [46], such that

$$\tilde{K}_1 = K_1 - \frac{1}{S} K_1 \mathbf{1} - \frac{1}{S} \mathbf{1} K_1 + \frac{1}{S^2} \mathbf{1} K_1 \mathbf{1}, \tag{8}$$

where $\mathbf{1}$ was the matrix of order $S$ whose components were all 1. The eigenvectors corresponding to the top $L_1$ largest eigenvalues of $\tilde{K}_1$ were taken as the kernel principal component filters of the first-layer network:

$$W_l^{1} = \left[ w_1, w_2, \ldots, w_{L_1} \right]. \tag{9}$$

The training sample $I_i$, after zero-filling of the boundary, was convolved with the first-layer 2DKPCA filter:

$$I_i^{l} = I_i * W_l^{1}, \quad i = 1, 2, \ldots, S;\ l = 1, 2, \ldots, L_1, \tag{10}$$

where $*$ was two-dimensional convolution, $I_i^{l} \in \mathbb{R}^{m \times n}$, and $L_1$ was the number of filters of the first 2DKPCA.

(2) Second 2DKPCA. Taking the output of the first 2DKPCA as the input of the second 2DKPCA, the same process as in the first 2DKPCA was repeated. The nonlinear high-dimensional mapping of the image matrix was carried out. The kernel matrix $K_2$ was calculated and centralized to $\tilde{K}_2$. The first $L_2$ kernel principal component features of $\tilde{K}_2$ were used as the convolution filters $W_\ell^{2}$ of the second-layer network:

$$W_\ell^{2} = \left[ w_1, w_2, \ldots, w_{L_2} \right]. \tag{11}$$

Similarly, the output of the first 2DKPCA was further convolved, and the output of the second 2DKPCA could be obtained:

$$O_i^{l} = I_i^{l} * W_\ell^{2} = I_i * W_l^{1} * W_\ell^{2}, \quad \text{s.t. } i = 1, 2, \ldots, S;\ l = 1, 2, \ldots, L_1;\ \ell = 1, 2, \ldots, L_2. \tag{12}$$

(3) Binary Hash Features Clustering. In order to reduce the data redundancy caused by the multiangle acquisition process of the box images, the clustering operation was carried out in this stage. The binary clustering algorithm [47] used binary encoding technology to solve the problem of multiview clustering. Binary encoding and clustering for multiple views were jointly optimized at the same time. The problems of large data storage and long running time were thereby alleviated: the computation time and storage space were greatly reduced, and the speed and efficiency were enhanced. The model proposed in this paper encoded and clustered the multiview dataset at the same time, and the total optimization function was set as

$$\min F\left(U^m, B, C, G, \alpha^m\right) = \sum_{m=1}^{M} \left(\alpha^m\right)^r \left( \left\| B - U^m \phi\left(O_m^{l}\right) \right\|_F^2 + \beta \left\| U^m \right\|_F^2 - \gamma\, \mathrm{tr}\left( \left(U^m \phi\left(O_m^{l}\right)\right) \left(U^m \phi\left(O_m^{l}\right)\right)^{T} \right) \right) + \lambda \left\| B - CG \right\|_F^2,$$

$$\text{s.t. } C^{T}\mathbf{1} = 0,\ \sum_{m} \alpha^m = 1,\ \alpha^m > 0,\ B \in \{-1, 1\}^{q \times n},\ C \in \{-1, 1\}^{q \times c},\ G \in \{0, 1\}^{c \times n},\ \sum_{j} g_{ji} = 1, \tag{13}$$

where $\alpha^m$ was the weight of the $m$th view, $m = 1, \ldots, M$; different views had different weights. $r > 1$ was a scalar that controlled the weights. $B = [b_1, \ldots, b_n]$, where $b_i$ was the collaborative binary code of the $i$th instance, and each encoding in $B$ was represented by the product of a clustering centroid $C$ and an indicator vector $g$. $U^m$ was the mapping matrix of the $m$th view. $\phi\left(O_m^{l}\right)$ was the kernel function based on the nonlinear RBF mapping between the output feature of the second 2DKPCA and randomly selected sample points under the $m$th view. $\gamma$ was a nonnegative constant. $G = [g_1, \ldots, g_n]$ collected the indicator vectors, and $\lambda$ was the regularization parameter.

The optimization problem was divided into several subproblems. $U^m$, $B$, $C$, $G$, and $\alpha^m$ were optimized and updated alternately by an alternating optimization strategy: while one variable was being updated, the other variables were fixed. The corresponding optimization cost functions were as follows:

$$\min F\left(U^m\right) = \left\| B - U^m \phi\left(O_m^{l}\right) \right\|_F^2 + \beta \left\| U^m \right\|_F^2 - \gamma\, \mathrm{tr}\left( \left(U^m \phi\left(O_m^{l}\right)\right) \left(U^m \phi\left(O_m^{l}\right)\right)^{T} \right), \tag{14}$$

$$\min_{B} \sum_{m=1}^{M} \left(\alpha^m\right)^r \left\| B - U^m \phi\left(O_m^{l}\right) \right\|_F^2 + \lambda \left\| B - CG \right\|_F^2 = \mathrm{tr}\left[ B^{T} \left( \sum_{m=1}^{M} \left(\alpha^m\right)^r I + \lambda I \right) B \right] - 2\, \mathrm{tr}\left[ B^{T} \left( \sum_{m=1}^{M} \left(\alpha^m\right)^r U^m \phi\left(O_m^{l}\right) + \lambda CG \right) \right] + \mathrm{con}, \tag{15}$$

$$\min F(C) = \left\| B - CG \right\|_F^2 + \rho \left\| C^{T}\mathbf{1} \right\|^2 = -2\, \mathrm{tr}\left( B^{T} CG \right) + \rho \left\| C^{T}\mathbf{1} \right\|^2 + \mathrm{con}, \tag{16}$$

$$g_{js}^{p+1} = \begin{cases} 1, & s = \arg\min_{i} H\left(b_j, c_i^{p+1}\right), \\ 0, & \text{otherwise}, \end{cases} \tag{17}$$

$$\alpha^m = \frac{\left(g^m\right)^{1/(1-r)}}{\sum_{m} \left(g^m\right)^{1/(1-r)}}, \tag{18}$$

where con is the constant with respect to $B$, and $H$ is the distance from each code in $B$ to the cluster center. The binary hash clustering optimization was completed when the total optimization function reached its optimum.

(4) Output of the Block Histogram Features. $L_2$ features were output for each input $I_i^{l}$ in the second 2DKPCA, and their binary cell vectors were clustered and optimized as a whole. Each optimized feature was converted to decimal:

$$T_i^{l} = \sum_{\ell=1}^{L_2} 2^{\ell-1} h_i^{\ell}, \tag{19}$$

where $l \in [1, L_1]$ and each pixel of $T_i^{l}$ was an integer within $[0, 2^{L_2} - 1]$. The $Z$ blocks of each $T_i^{l}$ were counted by the histogram $\mathrm{Zhist}\left(T_i^{l}\right)$. A vector can be obtained by connecting the $\mathrm{Zhist}\left(T_i^{l}\right)$:

$$f_i^{m} = \left[ \mathrm{Zhist}\left(T_i^{1}\right), \cdots, \mathrm{Zhist}\left(T_i^{L_1}\right) \right]^{T} \in \mathbb{R}^{\left(2^{L_2}\right) L_1 Z}, \tag{20}$$

where $f_i^{m}$ is the BM2DKPCANet feature of the $i$th sample under the $m$th view.

3.2.3. Fruits Boxes Recognition Based on BM2DKPCANet Model. The fruits boxes features extracted by the BM2DKPCANet model were input into the classifier for training and recognition. The performance of the classifier directly determines the recognition accuracy and classification speed. The support vector machine (SVM) is widely used in the field of pattern recognition because of its outstanding advantages in solving small-sample, nonlinear, high-dimensional pattern recognition problems [48, 49]. This work also used SVM as the classifier. According to previous studies [27], the radial basis function (RBF) was selected as the kernel function, which mapped the features into the high-dimensional space to find the optimal hyperplane, so that the different kinds of fruits boxes were correctly recognized. The specific identification process of fruits boxes is shown in Figure 7.

Figure 7: Flowchart of fruits boxes identification (training and testing sets: image preprocessing, first 2DKPCA, second 2DKPCA, hashing and clustering of binary features, block histogram clustering feature output, then training and optimizing the classifier or classifier recognition and recognition results output; clustering branch: features input, calculate kernel function and hashing vector, initial encoding clustering, define total encoding clustering loss function, alternately update U, B, C and G, and α with the other quantities fixed, obtain optimal total loss function, clustering features output).

4. Results and Discussion

The experiment was performed with Matlab2017b and the Python integrated environment Anaconda3 on a platform with an Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.6 GHz, 64 GB RAM, and an NVIDIA GeForce GTX 1080 8G GPU. The classifier kernel parameters were selected by the grid search and cross-validation methods based on the LibSVM software package, which gave the penalty parameter $C = 58$ and $\gamma = 2$. The following experiments analyzed the influence of the parameters on model performance, taking the average accuracy over 10 tests as the evaluation index. The recognition accuracy was used as the evaluation metric:

$$\mathrm{accuracy} = \frac{\sum_{i=1}^{n} \phi\left(Z_i, \mathrm{map}(c_i)\right)}{n}, \tag{21}$$

where $n$ is the total number of images in the dataset, $Z_i$ is the ground truth of the $i$th image, and $\mathrm{map}(c_i)$ is the class predicted by the algorithm. If $Z_i = \mathrm{map}(c_i)$, then $\phi\left(Z_i, \mathrm{map}(c_i)\right) = 1$; otherwise $\phi\left(Z_i, \mathrm{map}(c_i)\right) = 0$.

4.1. Influence of Kernel Function. KPCA is a nonlinear extension of principal component analysis using the kernel technique. The selection of the kernel function is related to the extraction of the nonlinear features of the dataset and directly affects the recognition performance of the model. This paper studied the influence of commonly used kernel functions on the performance of the BM2DKPCANet model, namely, the Linear, Polynomial, PolyPlus, Gaussian, and Sigmoid kernel functions. Their corresponding expressions are as follows:

$$K\left(\upsilon_i, \upsilon_j\right) = \upsilon_i \upsilon_j^{T} + c, \tag{22}$$

$$K\left(\upsilon_i, \upsilon_j\right) = \left(\upsilon_i \upsilon_j^{T}\right)^{d}, \tag{23}$$

$$K\left(\upsilon_i, \upsilon_j\right) = \left[\left(\upsilon_i \upsilon_j^{T}\right) + 1\right]^{d}, \tag{24}$$

$$K\left(\upsilon_i, \upsilon_j\right) = \exp\left( -\frac{\left\| \upsilon_i - \upsilon_j \right\|^2}{2\sigma^2} \right), \tag{25}$$

$$K\left(\upsilon_i, \upsilon_j\right) = \tanh\left(\alpha \upsilon_i \upsilon_j^{T} + c\right), \tag{26}$$

where $c$, $d$, $\sigma$, and $\alpha$ are all real constants. This work set $c = 0$, $d = 3$, $\sigma = 1$, and $\alpha = 1/2$. $\upsilon_i, \upsilon_j \in \mathbb{R}^{mn}$ are row vectors of the matrix to be transformed. The influence of the kernel function on model performance under the same parameter settings was studied, as shown in Figure 8. It can be seen that the Gaussian kernel function achieves the best recognition effect in the model.

Figure 8: Performance comparison of kernel functions (x-axis: kernel function, Linear, Polynomial, PolyPlus, Gaussian, Sigmoid; y-axis: accuracy (%)).

4.2. Influence of Filter Parameters

4.2.1. Influence of Number of Filters. The patch size, block size, and overlapping ratio were set to 5 × 5, 8 × 8, and 0.5, respectively. The influence of the number of filters on the performance of the model was studied, as shown in Figure 9. The blue line represents the accuracy on the fruits boxes dataset when the number of filters of the first 2DKPCA was varied from 2 to 14. The accuracy tended to be stable when $L_1 \geq 8$. The number of filters of the second 2DKPCA was then selected with $L_1 = 8$. The red line represents the accuracy on the fruits boxes dataset when the number of filters of the second 2DKPCA was varied from 2 to 14. The accuracy levels off when $L_2 \geq 6$. $L_1 = 8$ and $L_2 = 6$ were used in the following experiments.

4.2.2. Influence of Patch Size and Block Size. Since the PCA filter requires $k_1 k_2 \geq L_1, L_2$, the minimum patch size was set to 3 × 3. In order to observe the influence of patch size and block size on recognition in the proposed model, the block sizes were defined as 4 × 4, 8 × 8, and 16 × 16. The maximum patch size was set to 13 × 13. The accuracy with different patch sizes and block sizes on the fruits boxes dataset is shown in Figure 10. The accuracy tends to increase as the block size increases. However, since a larger block size loses part of the features of the first-layer network [25], the patch size is set to 5 × 5 and the block size is set to 8 × 8 in this paper.

4.2.3. Influence of Overlapping Ratio. It has been verified that overlapping blocks not only improve target detection accuracy [50] but also resist geometric rotation and scaling changes to some extent, and that they enhance robustness [51]. In order to strengthen the spatial information of fruits boxes, overlapping partitioning was carried out in this paper.
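The overlapping partition of the decimal code map into histogram blocks can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `block_histogram`, the rule that turns the overlapping ratio into a stride, and the random test map are hypothetical, while the 8 × 8 block size, the ratio of 0.5, and the 64 bins (that is, 2 to the power L2 with L2 = 6) follow the settings reported in this section.

```python
import numpy as np


def block_histogram(code_map, block=8, overlap=0.5, n_bins=64):
    """Concatenate histograms of overlapping blocks of an integer code map."""
    if code_map.min() < 0 or code_map.max() >= n_bins:
        raise ValueError("code values must lie in [0, n_bins - 1]")
    # Assumed stride rule: a block of side 8 with overlap 0.5 moves by 4 pixels.
    step = max(1, int(round(block * (1.0 - overlap))))
    rows, cols = code_map.shape
    feats = []
    for r in range(0, rows - block + 1, step):
        for c in range(0, cols - block + 1, step):
            patch = code_map[r:r + block, c:c + block]
            # One histogram with n_bins counts per block.
            feats.append(np.bincount(patch.ravel(), minlength=n_bins))
    return np.concatenate(feats)


# Hypothetical 32 x 32 code map with values in [0, 63].
rng = np.random.default_rng(0)
code_map = rng.integers(0, 64, size=(32, 32))
feature = block_histogram(code_map)
```

With these settings a 32 × 32 map yields 7 × 7 blocks; a larger overlapping ratio shortens the stride, so the number of blocks and the feature length grow.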
The overlapping ratio of blocks was varied from 0.1 to 0.9. The influence of the overlapping ratio on fruits boxes recognition, with the other parameters at their optimal values, is shown in Figure 11. It can be seen that the recognition performance of the model is optimal when the block overlapping ratio is 0.6.

Figure 9: Boxes recognition accuracy with different number of filters.

Figure 10: Influence of sampling block and histogram block on fruits boxes recognition (x-axis: patch size, 3 × 3 to 13 × 13; series: block sizes 4 × 4, 8 × 8, and 16 × 16; y-axis: accuracy (%)).

Figure 11: Influence of block overlap ratio on fruits boxes recognition (x-axis: overlapping ratio; y-axis: accuracy (%)).

4.3. Analysis of Experimental Results. In order to verify the recognition ability of the proposed algorithm for fruit packing boxes, 80 images of each type of packing box were randomly selected as training samples, and the other 120 images were taken as test samples. The experiment was done with the optimal parameters. The recognition accuracies of the different categories are shown in Table 1. The overall recognition meets the requirements of the fruits boxes handling robot. For apple3, watermelon1, orange2, cantaloupe2, pear, and pineapple, the top and side surfaces are easily confused, which leads to lower accuracy. The average accuracy is 92.89%, which is 2.09% higher than that of the BMKPCANet [27] model.

Table 1: Recognition accuracy of different categories.

Categories     Right recognition number    Accuracy (%)
apple1         115                         95.83
apple2         117                         97.5
apple3         109                         90.83
watermelon1    108                         90
watermelon2    111                         92.5
orange1        116                         96.67
orange2        109                         90.83
cantaloupe1    113                         94.17
cantaloupe2    107                         89.17
pomegranate    118                         98.33
pear           106                         88.33
durian         111                         92.5
coconut        112                         93.33
banana         111                         92.5
pineapple      109                         90.83

This model was compared with PCANet [25], 2DPCANet [28], KPCANet [26], and BMKPCANet [27] in terms of recognition rate and normalized time to verify the performance of the BM2DKPCANet model, as shown in Figure 12. It can be seen that although the proposed BM2DKPCANet model consumes more time than the PCANet-related models, its recognition rate is 11.84% higher than the average of the PCANet-related models and 2.485% higher than the average of the KPCANet-related models. In addition, the time consumption of BM2DKPCANet is reduced by 24.5% on average. Taking that into account, the BM2DKPCANet model is better than the other models in fruits boxes recognition.

Figure 12: Comparison of model performance on the fruits boxes dataset (x-axis: models PCANet, 2DPCANet, KPCANet, BMKPCANet, BM2DKPCANet; y-axes: accuracy (%) and normalization time).

The recognition experiments were also conducted with the same model parameters on the public datasets to verify the proposed multiview recognition algorithm. The comparisons of model performance on ETH-80 and COIL-100 are shown in Figures 13 and 14, respectively. The proposed BM2DKPCANet model achieves a higher recognition accuracy than the other models.

Figure 13: Comparison of model performance on the ETH-80 dataset (x-axis: models PCANet, 2DPCANet, KPCANet, BMKPCANet, BM2DKPCANet; y-axes: accuracy (%) and normalization time).

Figure 14: Comparison of model performance on the COIL-100 dataset (x-axis: models PCANet, 2DPCANet, KPCANet, BMKPCANet, BM2DKPCANet; y-axes: accuracy (%) and normalization time).

The experimental results show that the BM2DKPCANet model achieved a good recognition effect on the three datasets. Compared with the PCANet and 2DPCANet models, the proposed model adopts the kernel principal component analysis method, which maps the features to a high-dimensional space by nonlinear mapping and then carries out PCA dimensionality reduction. The nonlinear relationship of the images is extracted, so the recognition accuracy is greatly improved, although the calculation is more complex and takes more time than PCA. Compared with KPCA in the KPCANet and BMKPCANet models, the proposed model does not need to transform the two-dimensional matrix into a one-dimensional vector but directly applies the average column vector method to the 2-dimensional image, which preserves the structural information of the original image as much as possible and also greatly reduces the complexity. Therefore, the recognition rate is higher and the efficiency is improved as well.

5. Conclusions

In order to reduce the labour intensity of fruits handling in fruits orchards and fruits markets, this paper studied fruits boxes recognition based on machine vision. The recognition of 3D boxes was transformed into the feature extraction of 2D images. To preserve the structure of the original 2D images, the established BM2DKPCANet model performed two-layer 2DKPCA analysis on the 2D images. A binary clustering algorithm was added in the feature extraction stage to reduce the data redundancy caused by the multiview acquisition. The multiview recognition method for fruits boxes was proposed by combining the BM2DKPCANet model with an SVM classifier based on RBF. The experimental results showed that the recognition accuracy of this method is 11.84% higher than the average of the PCANet models and 2.485% higher than the average of the KPCANet models, which can meet the requirements of automatic rapid identification in fruits boxes handling. This work lays a foundation for realizing the intelligent mechanization of fruits boxes handling and reducing the labour intensity of fruit farmers.

Data Availability

The dataset presented in this study is available on request from the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this study.

Acknowledgments

The authors are grateful to workers at Zibo wholesale fruits market. This research was funded by the National Natural Science Foundation of China (Grant no. 52075306).

References

[1] H. Wang, R. Li, Y. Gao, C. Cao, L. Ge, and X. Xie, “Target recognition and localization of mobile robot with monocular PTZ camera,” Journal of Robotics, vol. 2019, pp. 1–12, Article ID 8789725, 2019.
[2] J. Yuan, “Research progress analysis of robotics selective harvesting technologies,” Transactions of the Chinese Society for Agricultural Machinery, vol. 51, no. 9, pp. 1–17, 2020.
[3] H. Q. T. Ngo, V. N. S. Huynh, T. P. Nguyen, and H. Nguyen, “Sustainable agriculture: stable robust control in presence of uncertainties for multi-functional indoor transportation of farm products,” Agriculture, vol. 10, no. 11, pp. 523–618, 2020.
[4] M. H. Li, Object recognition and tracking based on structural sparse representation, Ph.D. thesis, University of Electronic Science and Technology of China, Chengdu, China, 2020.
[5] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), pp. 580–587, Columbus, OH, USA, June 2014.
[6] R. Girshick, “Fast R-CNN,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 1440–1448, Santiago, Chile, December 2015.
[7] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137–1149, 2017.
[8] M. Bucolo, L. Fortuna, M. Nelke, A. Rizzo, and T. Sciacca, “Prediction models for the corrosion phenomena in pulp & paper plant,” Control Engineering Practice, vol. 10, no. 2, pp. 227–237, 2002.
[9] W. Liu, D. Anguelov, D. Erhan et al., “SSD: single shot multibox detector,” in Proceedings of the European Conference on Computer Vision (ECCV), pp. 21–37, Amsterdam, Netherlands, October 2016.
[10] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: unified, real-time object detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788, Las Vegas, NV, USA, June.
[11] J. Redmon and A. Farhadi, “YOLO9000: better, faster, stronger,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7263–7271, Honolulu, HI, USA, July 2017.
[12] R. Joseph and F. Ali, “YOLOv3: an incremental improvement,” 2018, https://arxiv.org/abs/1804.02767v1.
[13] A. Bochkovskiy, C. Y. Wang, and H. Y. Liao, “YOLOv4: optimal speed and accuracy of object detection,” 2020, https://arxiv.org/abs/2004.10934.
[14] H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller, “Multi-view convolutional neural networks for 3d shape recognition,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), pp. 945–953, Washington, DC, USA, December 2015.
[15] Z. Gao, D. Wang, X. He, and H. Zhang, “Group-pair convolutional neural networks for multi-view based 3D object retrieval,” in Proceeding of the Thirty-Second AAAI Conference on Artificial Intelligence, pp. 1–8, New Orleans, LA, USA, February 2018.
[16] Z. Gao, D. Y. Wang, Y. B. Xue, G. P. Xu, H. Zhang, and Y. L. Wang, “3D object recognition based on pairwise multi-view convolutional neural networks,” Journal of Visual Communication and Image Representation, vol. 56, no. 10, pp. 305–315, 2018.
[17] Z. Gao, H. Xue, and S. Wan, “Multiple discrimination and pairwise CNN for view-based 3D object retrieval,” Neural Networks, vol. 125, no. 2, pp. 290–302, 2020.
[18] H. Li, Y. Zheng, J. Cao, and Q. Cai, “Multi-view-based siamese convolutional neural network for 3D object retrieval,” Computers & Electrical Engineering, vol. 78, no. 7, pp. 11–21.
[19] C. Lin and A. Kumar, “Contactless and partial 3D fingerprint recognition using multi-view deep representation,” Pattern Recognition, vol. 83, pp. 314–327, 2018.
[20] Y. Yang, F. Chen, F. Wu, D. Zeng, Y.-M. Ji, and X.-Y. Jing, “Multi-view semantic learning network for point cloud based 3D object detection,” Neurocomputing, vol. 397, no. 3, pp. 477–485, 2020.
[21] Y. Zhang, X. S. Guo, X. Guo, H. Ren, and L. Li, “Multi-view classification with semi-supervised learning for SAR target recognition,” Signal Processing, vol. 183, Article ID 108030.
[22] X. Chen, Y. Chen, and H. Najjaran, “End-to-end 3D object model retrieval by projecting the point cloud onto a unique discriminating 2D view,” Neurocomputing, vol. 402, pp. 336–345, 2020.
[23] M. Bucolo, A. Buscarino, C. Famoso, L. Fortuna, and M. Frasca, “Control of imperfect dynamical systems,” Nonlinear Dynamics, vol. 98, no. 4, pp. 2989–2999, 2019.
[24] L. Fei, B. Zhang, J. Wen, S. Teng, S. Li, and D. Zhang, “Jointly learning compact multi-view hash codes for few-shot FKP recognition,” Pattern Recognition, vol. 115, Article ID 107894.
[25] T.-H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, and Y. Ma, “PCANet: a simple deep learning baseline for image classification?” IEEE Transactions on Image Processing, vol. 24, no. 12, pp. 5017–5032, 2015.
[26] D. Wu, J. S. Wu, R. Zeng, L. Y. Jiang, L. Senhadji, and H. Z. Shu, “Kernel principal component analysis network for image classification,” Journal of Southeast University (English Edition), vol. 31, no. 4, pp. 469–473, 2015.
[27] X. N. Li, H. Wu, and X. H. Yang, “Multi-view recognition of fruit packing boxes based on features clustering angle,” High Technology Letters, vol. 47, 2021.
[28] D. Yu and X.-J. Wu, “2DPCANet: a deep leaning network for face recognition,” Multimedia Tools and Applications, vol. 77, no. 10, pp. 12919–12934, 2018.
[29] D. Zhang, S. Chen, and Z. Zhou, “Recognizing face or object from a single image: linear vs. kernel methods on patterns,” in Proceedings of the Joint IAPR International Workshops on Structural and Syntactic Pattern Recognition and Statistical Techniques in Pattern Recognition, LNCS 4109, pp. 889–897, Hong Kong, China, August 2006.
[30] J. Yang, D. Zhang, A. F. Frangi, and J.-Y. Yang, “Two-dimensional pca: a new approach to appearance-based face representation and recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 1, pp. 131–137, 2004.
[31] B. Schölkopf, A. Smola, and K.-R. Müller, “Nonlinear component analysis as a kernel eigenvalue problem,” Neural Computation, vol. 10, no. 5, pp. 1299–1319, 1998.
[32] V. D. M. Nhat and S. Lee, “Kernel-based 2DPCA for face recognition,” in Proceedings of the 2007 IEEE International Symposium on Signal Processing and Information Technology, pp. 35–39, Giza, Egypt, December 2007.
[33] S. Zhou, Y. Zheng, and X. Mu, “K2DPCA methods for face recognition based on Cholesky decomposition,” Systems Engineering-Theory & Practice, vol. 36, no. 2, pp. 528–535.
[34] M. Xu, D. Xu, and M. Wei, “Face recognition with Laplacian eigenmaps based on 2D-KPCA,” Application Research of Computers, vol. 34, no. 7, pp. 2212–2215, 2017.
[35] Y. Choi, S. Ozawa, and M. Lee, “Incremental two-dimensional kernel principal component analysis,” Neurocomputing, vol. 134, pp. 280–288, 2014.
[36] M. Esmaeili, M. Ahmadi, and A. Kazemi, “Kernel-based two-dimensional principal component analysis applied for parameterization in history matching,” Journal of Petroleum Science and Engineering, vol. 191, Article ID 107134, 2020.
[37] Y. Xiang, G. Yang, J. Zhang, and Q. Wang, “Dimensionality reduction for hyperspectral image using a segmented row-column kernel two-dimensional principal component analysis method,” Infrared Technology, vol. 39, no. 12, pp. 1107–1113, 2017.
[38] N. Sun, H.-x. Wang, Z.-h. Ji, C.-r. Zou, and L. Zhao, “An efficient algorithm for kernel two-dimensional principal component analysis,” Neural Computing and Applications, vol. 17, no. 1, pp. 59–64, 2007.
[39] A. Eftekhari, M. Forouzanfar, H. Abrishami Moghaddam, and J. Alirezaie, “Block-wise 2D kernel PCA/LDA for face recognition,” Information Processing Letters, vol. 110, no. 17, pp. 761–766, 2010.
[40] L. Wang and X. Zhou, “Approximation kernel 2DPCA by mixture of vector and matrix representation,” International Conference on Computational Intelligence and Security, vol. 12, no. 3, pp. 1298–1302, 2011.
[41] B. Chen, J. Yang, B. Jeon, and X. Zhang, “Kernel quaternion principal component analysis and its application in RGB-D object recognition,” Neurocomputing, vol. 266, pp. 293–303.
[42] B. J. Chen, J. H. Yang, C. N. Fan, Q. T. Su, and D. C. Wang, “Block-wise two dimensional kernel quaternion principal component analysis,” Journal of Beijing University of Posts and Telecommunications, vol. 42, no. 1, pp. 53–60, 2019.
[43] H. Peng, S. K. Huang, L. Tao, and T. X. Zhang, “Multi-view modeling of 3D target based on Zernike moments,” Infrared and Laser Engineering, vol. 34, no. 3, pp. 292–296, 2005.
[44] B. Leibe and B. Schiele, “Analyzing appearance and contour based methods for object categorization,” in Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 409–415, Madison, WI, USA, June 2003.
[45] S. A. Nene, S. K. Nayar, and H. Murase, Columbia Object Image Library (COIL 20), CUCS-005-96, New York: Department of Computer Science, New York, NY, USA, 1996.
[46] J. R. Hu and Z. Z. Yu, “Model validation method with multivariate output based on kernel principal component analysis,” Journal of Beijing University of Aeronautics and Astronautics, vol. 43, no. 7, pp. 1470–1480, 2017.
[47] Z. Zhang, L. Liu, F. Shen, H. T. Shen, and L. Shao, “Binary multi-view clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 7, pp. 1774–1782, 2019.
[48] X. Zhang, Z. Yang, and S. Cao, “Spectral detection method for chilling damage of sweet potato based on support vector machine,” Transactions of the Chinese Society for Agricultural Machinery, vol. 51, no. 2, pp. 471–477, 2020.
[49] H. Zhang, H. Sun, and P. Shi, “Chip appearance inspection method for high-precision SMT equipment,” Machines, vol. 9, no. 2, p. 34, 2021.
[50] R. H. Miao, H. Yang, J. L. Wu, and H. Y. Liu, “Weed identification of overlapping spinach leaves based on image sub-block and reconstruction,” Transactions of the Chinese Society of Agricultural Engineering (Transactions of the CSAE), vol. 36, no. 4, pp. 178–184, 2020.
[51] Y. Wang, X. Kang, and Y. Chen, “Robust and accurate detection of image copy-move forgery using PCET-SVD and histogram of block similarity measures,” Journal of Information Security and Applications, vol. 54, no. 10, Article ID 102536, 2020.

Journal

Journal of Robotics, Hindawi Publishing Corporation

Published: Jul 8, 2021

References