Collaborative filtering recommendation algorithm based on user correlation and evolutionary clustering

Jianrui Chen (jianrui_chen@snnu.edu.cn) · Chunxia Zhao (zhaochunxia911@163.com)
School of Computer Science, Shaanxi Normal University, Xi'an 710119, China · College of Science, Inner Mongolia University of Technology, Hohhot 010051, China

Abstract
In recent years, the application of recommendation algorithms in real life, for example at Amazon and Taobao, has become universal, but such systems are not perfect yet: problems such as sparse data and low recommendation accuracy remain to be solved. Collaborative filtering is a mature algorithm in recommender systems, but it still has some problems. In this paper, a novel collaborative filtering recommendation algorithm based on user correlation and evolutionary clustering is presented. Firstly, the score matrix is pre-processed with normalization and dimension reduction to obtain denser score data. Based on these processed data, a clustering principle is generated and dynamic evolutionary clustering is implemented. Secondly, the search for the nearest neighbors with the most similar interests is considered: a measurement of the relationship between users, called user correlation, is proposed, which applies the satisfaction of users and their potential information. In each user group, user correlation is applied to choose the nearest neighbors for predicting ratings. The proposed method is evaluated on the MovieLens dataset. Diverse experimental results demonstrate that the proposed method has outstanding performance in prediction accuracy and recommendation precision.

Keywords Collaborative filtering · User correlation · Dynamic evolutionary clustering · Rating prediction · Recommendation

Introduction

The development of the internet and e-commerce makes our life more convenient, as billions of products are searchable online. Meanwhile, we must face the problem of information overload in daily life. Under these circumstances, it is much harder than ever for us to dig out the relevant objects that we really want [1]. Many researchers have studied recommendation systems and made progress on this particular issue, but data sparseness has always been an important reason for low recommendation accuracy. To make full use of existing information, researchers have proposed more and more excellent algorithms [2], such as neighborhood-based CF (collaborative filtering) and model-based CF.

Neighborhood-based CF algorithms are further classified into two categories, user-based CF [3] and item-based CF [4], and their basic principles are interlinked. For instance, user-based CF considers two users to be similar when their neighbors are similar. Obviously, the selection of nearest neighbors is significant; correspondingly, choosing an appropriate similarity measure helps improve both the accuracy and the applicability of the recommendation algorithm [5]. Researchers take into account the score information that best reflects users' preferences, and have come up with similarity measures such as cosine similarity [6], the Pearson correlation coefficient [7], and adjusted cosine [8]. Others put forward measures such as Salton similarity [6] and Jaccard similarity [9], which take into account the number of items rated by each user and its neighbors. A vertex similarity index, CosRA, has been proposed which combines the advantages of the cosine index and the resource-allocation index [1]; this fusion enhances the overall performance of personalized recommendation.

Model-based CF algorithms learn a model from the training data, and the model is subsequently utilized for recommendations [10]; examples include matrix decomposition models [11–13] and clustering models [14–16]. Ref. [11] presented a collaborative filtering recommendation algorithm based on SVD smoothing: ratings of items that users have not rated are predicted with SVD technology, Pearson correlation similarity is then used to find the target user's neighbors, and finally recommendations are produced, which alleviates the sparsity of the user-item rating dataset. The superiority of CF algorithms combined with clustering techniques in relieving the impact of data sparseness and cold-start problems has been verified by many research experiments [17,18]. Recently, many scholars have also proposed dynamic evolutionary clustering algorithms [19–21]; compared with the classical Kmeans algorithm, these not only reduce time consumption and complexity but also do not fix the number of clusters in advance. Liao et al. [22] presented an approach that clusters using only the user-product rating matrix, without collecting extra attributes about customers and products; the clusters are formed automatically. Shang et al. proposed a novel fuzzy double trace norm minimization method for recommender systems with faster running time [23]. Recently, we have presented two novel recommender algorithms based on a heterogeneous network model [24] and a filled matrix [25], which combine dynamic clustering with traditional collaborative filtering to gain better recommendation results.

Based on the above work, a novel recommendation algorithm is proposed in this paper. The main motivations are as follows. Firstly, to reduce complexity and time consumption, two steps are executed on the score matrix: normalization and dimension reduction. A clustering principle is then constructed, and a novel dynamic evolutionary clustering model is proposed to imitate the changes of different nodes in the network model. Secondly, to better grasp the neighbors with the most similar interests, a new similarity index is introduced that combines the advantage of node degrees with the potential information of nodes. Thirdly, ratings are predicted in each group based on user correlation.

The rest of the paper is structured as follows. Section "Evolutionary clustering algorithm" gives the evolutionary clustering model. In section "Similarity indices", we review existing similarity indices and put forward a new similarity index. A full description of our algorithm is presented in section "The UCEC&CF algorithm flow". Section "Experiment and studies" shows experimental results and comparisons with existing algorithms, and presents the advantages of our algorithm. Section "Conclusion" concludes the paper.

Evolutionary clustering algorithm

Many complex social, biological, and information systems can be described by networks, where nodes indicate individuals and links represent the relationships or interactions between nodes [26], and the study of link prediction has become a common focus of science. A link between two nodes can be found from the topology information of the network or from the feature vectors of the nodes. In this paper, the network G(V, E) is constructed, where V is the set of nodes containing M users and N items, and E is the set of scores of items from users. Firstly, a clustering method is presented to gather users with similar interests into the same groups.

Data pre-processing

As we all know, clustering is roughly divided into three steps: deal with the data first, then choose an appropriate distance function to measure the similarity between objects, and at last obtain the groups. Naturally, the most important step of clustering is to choose an optimal clustering principle. Ba et al. [12] proposed a method that clusters all users by calculating a user characteristic value from the attributes of users, but this method requires much computational cost. So, in this paper, to reduce the complexity, only the score matrix is adopted to gather users into groups.

To effectively apply the score information, Ref. [22] normalized the ratings of user u according to Eq. (1); in this way, the influence of people's habits in giving ratings is reduced. We take the same approach: the original score matrix R = (r_ij)_{M×N} is processed by

$$Q_i=\sum_{j=1}^{m_i} r_{ij},\quad i=1,\dots,M;\qquad \tilde r_{ij}=\frac{r_{ij}}{Q_i},\quad 1\le j\le m_i.\tag{1}$$

Here, r_ij denotes user i's rating of item j, and m_i denotes the number of items rated by user u_i.

At the same time, to gain better recommendation results, items with fewer scores are removed; in this way, the sparse scoring matrix becomes relatively dense. The number of items is thereby changed from N to N_1, N > N_1, and the new relationship between users and items is denoted as R̃ = (r̃_ij)_{M×N_1}. The proportion of removed items is denoted as α, and the influence of α on the results is shown later in section "Experiment and studies".

The adjacency matrix A of the network is constructed similarly to Ref. [24]:

$$A=\begin{pmatrix}0 & \tilde R_{M\times N_1}\\ (\tilde R^{T})_{N_1\times M} & 0\end{pmatrix}.$$

Here, A is an (N_1 + M)-dimensional square matrix and 0 is a zero matrix.

Evolutionary clustering algorithm

Wu et al. [27] presented a generalized Kuramoto model to identify the community structure of positive networks. The model is defined as

$$\dot\theta_i=\omega_i+\frac{K_p}{N}\sum_{j=1}^{N}a_{ij}\sin(\theta_j-\theta_i)+\frac{K_n}{N}\sum_{j=1}^{N}(1-a_{ij})\sin(\theta_j-\theta_i).\tag{2}$$

Here, θ_i is the state of the i-th node, ω_i denotes the initial state of node i and is randomly generated from the interval [0, 2π], and N refers to the number of nodes in the network. If the i-th node is connected with the j-th node, a_ij = 1; otherwise, a_ij = 0. K_p > 0 makes the states of two connected oscillators in the network synchronize, and K_n < 0 makes the states of two unconnected oscillators evolve far away from each other.

Inspired by this, the model is used in the heterogeneous network, and the users are divided into different communities by the relationships between users and items. The evolutionary clustering rule is designed as follows: if a user rates an item with a higher score, they are regarded as belonging to the same group. The dynamic evolutionary clustering model is defined as

$$\dot\theta_i=\omega_i+\frac{K_p}{N_1+M}\sum_{j\in NL_i}\sin(\theta_j-\theta_i)+\frac{K_n}{N_1+M}\sum_{j\notin NL_i}\sin(\theta_j-\theta_i).\tag{3}$$

Here, NL_i = { j | A_ij ≥ δ }, and N_1 + M refers to the number of users and items in the network. The coupling parameters K_p > 0 and K_n < 0 make the nodes with higher scores evolve together and the nodes with lower scores evolve far away, respectively. δ denotes the critical value between high and low scores; its size is the median of the nonzero elements of the matrix A. With the random generation of the initial values of the nodes, the iteration begins; Ref. [27] has verified the convergence of the node evolution. After a certain number of iterations, the states of the nodes become stable. Nodes whose state values lie within a nearby range (|θ_i − θ_j| < ε) get closer and are divided into the same group, while nodes with very different state values are divided into different groups. Thus ε confirms which nodes are divided into the same cluster: a higher ε means more nodes fall into the same cluster, and a lower ε means fewer do.

Similarity indices

Local similarity indices

Most of the existing similarity indices are classified into two categories: local and global similarity indices. The local similarity indices [28] are defined in Table 1, where Γ(u) denotes the set of neighbors of u, Γ(uv) denotes the set of common neighbors of u and v, k(u) is the degree of user u, r_ui denotes user u's rating of item i, and r̄_u is the historical average score of u.

Table 1 Similarity indices

Common neighbors (CN):     S_uv = |Γ(u) ∩ Γ(v)|
Hub promoted index (HPI):  S_uv = |Γ(u) ∩ Γ(v)| / min{k(u), k(v)}
Hub depressed index (HDI): S_uv = |Γ(u) ∩ Γ(v)| / max{k(u), k(v)}
Salton index (SI):         S_uv = |Γ(u) ∩ Γ(v)| / √(k(u) · k(v))
Jaccard index (JI):        S_uv = |Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)|
Cosine (COS):              S_uv = Σ_i r_ui r_vi / (√(Σ_i r_ui²) · √(Σ_i r_vi²))
Pearson (COR):             S_uv = Σ_{i∈Γ(uv)} (r_ui − r̄_u)(r_vi − r̄_v) / (√(Σ_{i∈Γ(uv)} (r_ui − r̄_u)²) · √(Σ_{i∈Γ(uv)} (r_vi − r̄_v)²))
Adjusted cosine (ADJ):     S_uv = Σ_{i∈Γ(uv)} (r_ui − r̄_u)(r_vi − r̄_v) / (√(Σ_{i∈Γ(u)} (r_ui − r̄_u)²) · √(Σ_{i∈Γ(v)} (r_vi − r̄_v)²))
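As a concrete reading of Table 1, the local indices can be computed directly from the rating matrix. The following is a minimal sketch (the function name and the rows-are-users layout are our own illustration; guards against zero-degree users are omitted for brevity):

```python
import numpy as np

def table1_indices(R, u, v):
    """Sketch of several indices from Table 1 for users u, v of rating
    matrix R (rows: users, columns: items; 0 means "not rated")."""
    Gu, Gv = set(np.flatnonzero(R[u])), set(np.flatnonzero(R[v]))  # Γ(u), Γ(v)
    common = len(Gu & Gv)                        # CN: |Γ(u) ∩ Γ(v)|
    hpi = common / min(len(Gu), len(Gv))         # hub promoted index
    hdi = common / max(len(Gu), len(Gv))         # hub depressed index
    salton = common / np.sqrt(len(Gu) * len(Gv)) # SI
    jaccard = common / len(Gu | Gv)              # JI
    # rating-based cosine over all items
    cos = float(R[u] @ R[v]) / (np.linalg.norm(R[u]) * np.linalg.norm(R[v]))
    return {"CN": common, "HPI": hpi, "HDI": hdi, "SI": salton,
            "JI": jaccard, "COS": cos}
```

COR and ADJ follow the same pattern, with the user's historical average subtracted before the products are taken.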
User correlation indices

In real life, people have different expectations of products. Those who have lower expectations tend to give highly satisfied feedback, while the others seem to be much pickier. For instance, user A, who has lower expectations, always gives 4 or 5 stars in product feedback; user B, who belongs to the other side, gives only 4 stars even for the products he or she rates highest. So when both of them rate a product with 4 stars, it actually means something different, does it not? In this paper, we therefore evaluate users' preferences when they give feedback.

On the other hand, different feedback from different users can still be a good evaluation of a product. For example, user A gives 2 stars while user B gives 5 stars for the same product. Why did they originally buy this product? Perhaps because they both liked its appearance. But why are their ratings so different? The reason may be practicability, price, or even after-sales service. Thereby, it is difficult to figure out exactly why users give different evaluations of the same product.
So, the paper takes the general evaluation into account; meanwhile, the feedback needs to focus on the same product. Based on the above reasons, the calculation method of user correlation is given as follows:

$$\mathrm{SIM}_{uv}=\frac{|B|+|C|}{\sqrt{|N(u)|\,|N(v)|}}.\tag{4}$$

Here, B = { i | r_ui ≥ β and r_vi ≥ γ } and C = { i | r_ui < β and r_vi < γ }. N(u) denotes the set of items user u has evaluated in the original score matrix R, and N(v) denotes the set of items user v has evaluated in R. β and γ are two parameters; in our experiments, they represent the historical average rating scores of users u and v, respectively. It can be seen that the numerator of SIM_uv considers the potential information of the users, while the denominator involves the degrees of the users, so more similarity information between users u and v can be captured.

In our algorithm, we do not normalize the similarity indices; the similarity calculation is applied to differentiate the influences of neighbors.

The UCEC&CF algorithm flow

In this paper, a Collaborative Filtering Recommendation Algorithm Based on User Correlation and Evolutionary Clustering (denoted UCEC&CF) is proposed. The procedure of the algorithm is given in Fig. 1.

[Fig. 1 The UCEC&CF algorithm flow]

Step 1 Generate the adjacency matrix. Input the score matrix of users and items, normalize the matrix by (1), remove the items with fewer scores, and generate the adjacency matrix A.

Step 2 Generate the initial states of nodes. The states of the nodes are randomly generated from the interval [0, 2π].

Step 3 Update the states of nodes. Given the two coupling strength parameters K_p and K_n and an iteration threshold t, the states are updated by (3) until they become stable at certain values. In all the simulations of this paper, K_p = 10, K_n = −0.01, t = 100.

Step 4 Obtain the group structure. Nodes are divided into a group if their state values meet the condition |θ_i − θ_j| < ε. In all the simulations of this paper, ε = 0.001.

Step 5 Calculate the user correlation matrix. The user correlation matrix SIM is calculated by (4) within each group, based on the original score matrix.

Step 6 Predict the ratings. For every rating of the items to be tested for the target users, the prediction is calculated according to formula (5):

$$\hat r_{ui}=\bar r_u+\frac{\sum_{v\in U}\mathrm{SIM}_{uv}\,(r_{vi}-\bar r_v)}{\sum_{v\in U}\mathrm{SIM}_{uv}}.\tag{5}$$

Here, r̂_ui denotes the predicted rating of item i by user u, r̄_u is the historical average score of u, SIM_uv denotes the relationship between users u and v, and U is the neighbor set containing the k users most similar to user u in the same group according to our user correlation index.

Step 7 Top-N recommendation. All the predicted scores for the target user are sorted in descending order; the first N items form a recommendation list that is given to the target user.

Experiment and studies

All the algorithms are coded in Matlab 2013a; the experimental platform is an Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80 GHz with 64 GB memory, running Windows Server 2008 R2 Enterprise.

Dataset and evaluation metrics

We conducted experiments on the MovieLens datasets (http://grouplens.org/datasets/movielens/). The datasets from GroupLens are the most popular in the field of recommender systems. Our evaluation is restricted to the 100 K and 1 M datasets; the former consists of 1682 movies, 943 users, and almost 100,000 known ratings on a scale from 1 to 5, and the latter contains 1,000,209 anonymous ratings of approximately 3900 movies made by 6040 users.

[Fig. 2 100 K dataset. Comparison results of DEHC-CF and UCEC&CF algorithms: a MAE, b RMSE, versus the number of neighbors]
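Steps 1–6 above can be sketched end to end. The following is a toy illustration under our own simplifications (forward-Euler integration of (3) with a step h, grouping of the user nodes only, and dimension reduction by α omitted; all function names are ours), using the paper's parameter values K_p = 10, K_n = −0.01, t = 100, ε = 0.001:

```python
import numpy as np

def ucec_cf(R, Kp=10.0, Kn=-0.01, t=100, eps=1e-3, k=30, h=0.05, seed=0):
    """Sketch of UCEC&CF: normalize by (1), build A, evolve states by (3),
    group users, then predict via user correlation (4) and rule (5).
    Assumes every user has rated at least one item."""
    rng = np.random.default_rng(seed)
    M, N = R.shape
    # Step 1: Eq. (1) normalization and the bipartite adjacency matrix A
    Rn = R / R.sum(axis=1, keepdims=True)
    A = np.zeros((M + N, M + N))
    A[:M, M:], A[M:, :M] = Rn, Rn.T
    n = M + N
    delta = np.median(A[A > 0])          # threshold between high and low scores
    NL = A >= delta                      # high-score neighbor sets NL_i
    # Steps 2-3: random initial states, Euler steps of Eq. (3)
    theta = rng.uniform(0, 2 * np.pi, n)
    omega = rng.uniform(0, 2 * np.pi, n)
    for _ in range(t):
        diff = np.sin(theta[None, :] - theta[:, None])   # sin(θ_j − θ_i)
        theta = theta + h * (omega + (Kp / n) * (NL * diff).sum(axis=1)
                             + (Kn / n) * ((~NL) * diff).sum(axis=1))
    # Step 4: users whose states differ by less than eps share a group
    labels = -np.ones(M, dtype=int)
    for u in range(M):
        if labels[u] < 0:
            close = np.flatnonzero(np.abs(theta[:M] - theta[u]) < eps)
            fresh = [w for w in close if labels[w] < 0]
            labels[fresh] = labels.max() + 1
    # Step 5: user correlation, Eq. (4), with beta/gamma = historical means
    means = np.array([row[row > 0].mean() for row in R])
    def sim(u, v):
        both = (R[u] > 0) & (R[v] > 0)
        B = np.sum(both & (R[u] >= means[u]) & (R[v] >= means[v]))
        C = np.sum(both & (R[u] < means[u]) & (R[v] < means[v]))
        return (B + C) / np.sqrt((R[u] > 0).sum() * (R[v] > 0).sum())
    # Step 6: Eq. (5) from the k most correlated in-group raters of item i
    def predict(u, i):
        cand = [v for v in range(M)
                if v != u and labels[v] == labels[u] and R[v, i] > 0]
        neigh = sorted(cand, key=lambda v: sim(u, v), reverse=True)[:k]
        den = sum(sim(u, v) for v in neigh)
        if den == 0:
            return means[u]
        return means[u] + sum(sim(u, v) * (R[v, i] - means[v])
                              for v in neigh) / den
    return labels, predict
```

Step 7 then just sorts predict(u, i) over user u's unrated items in descending order and keeps the first N.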
To reduce the error, we conducted fivefold cross-validation, taking 80% of the available ratings as training data and the rest as test data.

In general, recommender systems are evaluated by accuracy, but prediction accuracy and Top-N recommendation accuracy are different. In this paper, the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) are applied to verify the prediction accuracy; the lower the MAE (or RMSE), the better the predicted ratings:

$$\mathrm{MAE}=\frac{1}{n}\sum_{u=1}^{n}\left|r_{ui}-\hat r_{ui}\right|,\tag{6}$$

$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{u=1}^{n}\left(r_{ui}-\hat r_{ui}\right)^{2}}.\tag{7}$$

Here, n is the number of ratings to be predicted, and r_ui and r̂_ui are the real and predicted scores, respectively.

We also evaluate our algorithm on the precision of Top-N recommendation, measured in terms of F for varying lengths of the recommendation list; the higher the F, the better the recommender algorithm:

$$F=\frac{2\times recall\times precision}{recall+precision},\tag{8}$$

$$recall=\frac{|R_u\cap T_u|}{|T_u|},\tag{9}$$

$$precision=\frac{|R_u\cap T_u|}{|R_u|}.\tag{10}$$

R_u denotes the set of real high ratings (i.e., ≥ 3 in the MovieLens dataset) and T_u denotes the set of predicted high ratings in the test set.

Experimental results

In this subsection, we test the accuracy of our algorithm UCEC&CF from multiple views and compare the accuracy of prediction and Top-N recommendation with other algorithms.

1. Comparing results of processing the score matrix. To test the impact of normalization and dimension reduction on the score information, we compare our algorithm with DEHC-CF [24]. In that paper, the authors established a heterogeneous network and generated the groups based on score information. The clustering-principle parameters for the five cross-validation sets are δ_1 = 0.0070, δ_2 = 0.0072, δ_3 = 0.0071, δ_4 = 0.0071, δ_5 = 0.0069, and α = 50% in the following. The parameters of our clustering algorithm are K_p = 10, K_n = −0.01.

It can be observed from Fig. 2 that the error of our method is slightly lower than that of DEHC-CF in some cases; especially when the number of neighbors is 60, our method is more effective. The mean MAE of DEHC-CF is 0.7491 and its mean RMSE is 0.9572, while the mean MAE and RMSE of UCEC&CF are 0.7482 and 0.9560, respectively. This shows that processing the score matrix is meaningful for improving the recommendation precision. Table 2 shows the F measurements for different numbers of recommendations. For all algorithms, F improves with increasing length of the recommendation list; our algorithm performs better and recommends more satisfactory items as the number of recommendations becomes larger.

Table 2 Comparisons of DEHC-CF and UCEC&CF algorithms on F (%) with the 100 K dataset (optimal values were set in bold italics in the original)

Algorithm   2      4      6      8      10     12
DEHC-CF     15.54  27.43  36.59  43.73  49.05  54.02
UCEC&CF     15.57  27.43  36.59  43.80  49.60  54.15

2. Comparing results of different clustering methods. To test whether the clustering method of our algorithm is efficient, three methods are compared: DTD from Ref. [19], DPSO from Ref. [29], and Kmeans. The iteration threshold of DTD is 0.1. The parameters of DPSO are as follows: the particle swarm size is 100, the number of iterations is 4, the learning factor is 1.494, and the max and min inertia weights are 0.9 and 0.4, respectively. For Kmeans, three cases are given, dividing the individuals in the network into 3, 6, and 10 clusters. The test results are shown in Fig. 3 and Table 3; none of the comparisons applies similarity-index normalization.

[Fig. 3 100 K dataset. Comparison results of different clustering methods: a MAE, b RMSE]

Figure 3 presents the MAE and RMSE values for different settings of the neighborhood size. It can be observed that our presented method is more effective than the other clustering algorithms. Under DPSO each group contains only one user, so its prediction accuracy is independent of the number of neighbors. The mean MAE values of UCEC&CF, DTD, DPSO, and Kmeans are 0.7482, 0.7882, 0.8362, and 0.7557, respectively; the mean RMSE values are 0.9560, 1.0070, 1.0437, and 0.9668. Besides, our proposed method is more convenient than the Kmeans algorithm, as it does not require a predefined number of clusters. Table 3 shows the F values for all the algorithms; UCEC&CF is more efficient than DTD and DPSO in view of the F values in all cases.

Table 3 Comparison results of DTD, DPSO, Kmeans and UCEC&CF on F (%) with the 100 K dataset (optimal values were set in bold italics in the original)

Algorithm   2      4      6      8      10     12
DTD         15.21  26.91  35.95  43.09  48.84  53.49
DPSO        14.48  25.80  34.63  41.55  47.20  51.82
Kmeans      15.50  27.39  36.58  43.83  49.65  54.30
UCEC&CF     15.57  27.43  36.59  43.80  49.60  54.15
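The evaluation metrics (6)–(10) used throughout these comparisons are straightforward to compute. A minimal sketch, following the paper's convention that recall divides by |T_u| and precision by |R_u| (function names are ours):

```python
import numpy as np

def mae(real, pred):
    # Eq. (6): mean absolute error over the n predicted ratings
    return float(np.mean(np.abs(np.asarray(real) - np.asarray(pred))))

def rmse(real, pred):
    # Eq. (7): root mean squared error
    return float(np.sqrt(np.mean((np.asarray(real) - np.asarray(pred)) ** 2)))

def f_value(real_high, pred_high):
    # Eqs. (8)-(10): R_u = real high ratings (>= 3), T_u = predicted high ratings
    R, T = set(real_high), set(pred_high)
    hits = len(R & T)
    if hits == 0:
        return 0.0
    recall, precision = hits / len(T), hits / len(R)
    return 2 * recall * precision / (recall + precision)
```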
In terms of Top-N recommendation accuracy, our algorithm is slightly inferior to the Kmeans algorithm as the length of the recommendation list increases, and the F values of DPSO are the worst.

3. Comparing results of different similarity indices. Next, our user correlation is compared with the eight other similarity indices of Table 1; the results are shown in Fig. 4 and Table 4. None of the comparisons applies similarity-index normalization.

[Fig. 4 100 K dataset. Comparing results of UCEC&CF and existing similarity indices: a MAE, b RMSE]

It can be seen from Fig. 4 that our method delivers the best performance. Some of the compared similarity indices only consider the degrees of the nodes, while the others are computed from users' ratings of the same items. Our algorithm obtains better recommendation results because the presented user correlation considers both the degrees of the nodes and the score information.

To further check the performance of the proposed algorithm, Table 4 gives the recommendation accuracy of the algorithms for different numbers of recommendations. It can be seen that our method gains better performance than the others.

Table 4 The comparison results between UCEC&CF and the existing similarity indices on F (%) with the 100 K dataset (optimal values were set in bold italics in the original)

Algorithm   2      4      6      8      10     12
CN          15.41  27.24  36.41  43.50  49.39  53.88
HPI         15.46  27.26  36.35  43.56  49.37  54.03
HDI         15.46  27.37  36.47  43.58  49.40  54.07
SI          15.50  27.36  36.51  43.69  49.40  54.13
JI          15.46  27.41  36.57  43.64  49.46  54.03
COS         15.43  27.36  36.41  43.54  49.41  54.03
COR         15.41  27.28  36.44  43.67  49.40  54.12
ADJ         15.48  27.27  36.49  43.64  49.48  54.13
UCEC&CF     15.57  27.43  36.59  43.80  49.60  54.15

4. Comparing results of different α. In the algorithm, part of the data is removed to reduce the impact of sparseness; α is the proportion of removed items. Experimenting on the 100 K dataset, the results shown in Fig. 5 indicate that the parameter α has only a slight influence on the results. In the simulations of this part, the number of neighbors is 30 and the number of recommendations is 10; the range of MAE is [0.7432, 0.7457], the range of RMSE is [0.9506, 0.9534], and the range of F is [0.4948, 0.4964].

[Fig. 5 100 K dataset. Influence of α on the recommendation performance]

5. Comparison with several existing recommendation algorithms. Collaborative filtering and matrix decomposition are representative of existing recommendation algorithms. To test the effectiveness of our overall algorithm, UCEC&CF is compared with DEHC-CF and DTNM [23] on two datasets, MovieLens 100 K and MovieLens 1 M.
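All of the comparisons above share the fivefold protocol described earlier (80% of the known ratings for training, 20% for testing in each fold). A minimal sketch of such a split over (user, item, rating) triples (the function name is ours, not from the paper):

```python
import numpy as np

def five_folds(ratings, seed=0):
    """Split the known ratings into 5 disjoint folds; each fold serves once
    as the 20% test set while the remaining 80% is used for training."""
    ratings = list(ratings)
    idx = np.random.default_rng(seed).permutation(len(ratings))
    folds = np.array_split(idx, 5)
    for f in folds:
        test_ids = set(int(i) for i in f)
        train = [ratings[i] for i in idx if int(i) not in test_ids]
        test = [ratings[i] for i in f]
        yield train, test
```

Reported MAE, RMSE, and F values are then averaged over the five folds.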
DEHC-CF is a novel heterogeneous evolutionary clustering algorithm based on user collaborative filtering, with coupling strengths K_1 = 20 and K_2 = −20. DTNM is a novel fuzzy double trace norm minimization method for recommender systems; its rank takes integer values between 5 and 15 and has little effect on the result, so DTNM is compared with rank = 5. The comparison results on the different datasets are illustrated in Fig. 6 and Table 5.

Figure 6a, b gives the MAE and RMSE values for the 100 K dataset, and Fig. 6c, d gives them for the 1 M dataset. From an overall view, the improvement in MAE and RMSE is more pronounced for the 1 M dataset than for the 100 K one, as the former has higher explicit-rating sparsity. The result of DTNM is independent of the number of neighbors because it is a rank-dependent matrix decomposition algorithm. As expected, our UCEC&CF algorithm performs the best among all compared algorithms, which means our work gains better performance than DEHC-CF and DTNM. Our work broadly includes building the heterogeneous network, setting the clustering principle, defining the user correlation, etc. Table 5 gives a description of UCEC&CF, DEHC-CF, and DTNM when items are recommended to users.
[Fig. 6 MAE and RMSE of different methods on two datasets: a, b MovieLens 100 K; c, d MovieLens 1 M]

Table 5 Comparison results of different algorithms on F (%) (optimal values were set in bold italics in the original)

MovieLens  Algorithm   2      4      6      8      10     12
100 K      DEHC-CF     15.34  27.16  36.18  43.39  49.09  53.76
           DTNM        15.46  27.29  36.38  43.58  49.32  54.01
           UCEC&CF     15.57  27.43  36.59  43.80  49.60  54.15
1 M        DEHC-CF     12.69  23.53  32.17  39.12  44.79  49.53
           DTNM        12.55  23.19  31.68  38.49  44.04  48.71
           UCEC&CF     12.82  23.74  32.42  39.40  45.11  49.89

The experimental results demonstrate that our method gains better performance than the other compared algorithms on different datasets.

In a word, in section "Experiment and studies", two real datasets of different sizes are considered to verify the efficiency of our proposed algorithm UCEC&CF. Five diverse experiments are tested, and UCEC&CF gains excellent recommendation results.

Conclusion

A new CF model based on user correlation and evolutionary clustering is proposed in this paper, and its performance is verified by comparative experiments from different aspects. On the one hand, only the normalized and dimension-reduced score matrix is applied to generate the clustering principle; an improved dynamic evolutionary model is constructed to imitate the constantly changing state values in the network, and similar users are divided into the same group. Besides, user correlation is proposed to measure the distance between users by combining user satisfaction with the potential score information. In each group, the novel user correlation measurement is adopted to find the neighbors most similar to the target user; finally, user-based collaborative filtering is adopted in each group. Extensive experiments demonstrate that our proposed method gains better recommendation performance than the other compared algorithms. But there is still much room for improvement in this work; we plan to incorporate other network information into the design for further performance enhancement.

Acknowledgements This research was supported by the National Natural Science Foundation of China (No. 71561020, 61503203, 61702317, 61771297) and the Fundamental Research Funds for the Central Universities (No. GK201802013).

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Chen LJ, Zhang ZK, Liu JH, Gao J, Zhou T (2017) A vertex similarity index for better personalized recommendation. Phys A 466:607–615
2. Singh R, Patra BK, Adhikari B (2015) A complex network approach for collaborative recommendation. arXiv preprint arXiv:1510.00585
3. Resnick P, Iacovou N, Suchak M, Bergstrom P, Riedl J (1994) GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of the ACM conference on computer supported cooperative work. ACM, pp 175–186
4. Sarwar BM, Karypis G, Konstan JA, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. WWW 1:285–
5. Leicht EA, Holme P, Newman ME (2006) Vertex similarity in networks. Phys Rev E 73(2):026120
6. Salton G, McGill MJ (1983) Introduction to modern information retrieval. McGraw-Hill, New York
7. Breese JS, Heckerman D, Kadie C (2013) Empirical analysis of predictive algorithms for collaborative filtering. ACM 7(7):43–52
8. Shardanand U, Maes P (1995) Social information filtering: algorithms for automating word of mouth. In: CHI 95:210–217
9. Salton G (1989) Automatic text processing: the transformation, analysis. Addison-Wesley, Reading
10. Pazzani MJ, Billsus D (2007) Content-based recommendation systems. Springer-Verlag, New York, pp 325–341
11. Ren YB, Gong SJ (2009) A collaborative filtering recommendation algorithm based on SVD smoothing. In: Third international symposium on intelligent information technology application, IEEE, pp 530–532
12. Ba Q, Li X, Bai Z (2013) Clustering collaborative filtering recommendation system based on SVD algorithm. In: IEEE international conference on software engineering and service science, pp 963–
13. Zhang Y, Chen M, Huang D, Wu D, Li Y (2016) iDoctor: personalized and professionalized medical recommendations based on hybrid matrix factorization. Future Gener Comput Syst 66:30–35
14. Sarwar BM, Karypis G, Konstan J (2002) Recommender systems for large-scale E-commerce: scalable. Confer Comput Inform Technol 1:291–324
15. Haldar NAH, Khan FA, Ali A, Abbas H (2017) Arrhythmia classification using Mahalanobis distance based improved fuzzy c-means clustering for mobile health monitoring systems. Neurocomputing 220:221–235
16. Wang Z, Wen Q, Sun Y, Zhang H (2012) A fault detection scheme based on self-clustering nodes sets for wireless sensor networks. In: IEEE 12th international conference on computer and information technology, pp 921–925
17. Birtolo C, Ronca D, Aurilio G (2012) Trust-aware clustering collaborative filtering: identification of relevant items. Springer, Berlin Heidelberg, pp 374–384
18. Huang C, Yin J (2010) Effective association clusters filtering to cold-start recommendations. In: Seventh international conference on fuzzy systems and knowledge discovery, IEEE, vol 5, pp 2461–
19. Wu J, Jiao Y (2014) Clustering dynamics of complex discrete-time networks and its application in community detection. Chaos 24(3):268–4
20. Wu J, Zhang L, Li Y, Jiao Y (2016) Partition signed social networks via clustering dynamics. Phys A 443:568–582
21. Chen J, Wang H, Wang L, Liu W (2016) A dynamic evolutionary clustering perspective: community detection in signed networks by reconstructing neighbor sets. Phys A 447:482–492
22. Liao CL, Lee SJ (2016) A clustering based approach to improving the efficiency of collaborative filtering recommendation. Electron Commer Res Appl 18:1–9
23. Shang F, Liu Y, Cheng J, Yan D (2017) Fuzzy double trace norm minimization for recommendation systems. IEEE Trans Fuzzy Syst 99:1–1
24. Chen J, Uliji, Wang H (2018) Collaborative rating prediction based on dynamic evolutionary heterogeneous clustering. Swarm Evol Comput 394–399
25. Liji U, Chai Y, Chen J (2018) Improved personalized recommendation based on user attributes clustering and score matrix filling. Comput Stand Interfaces 57:59–67
26. Ding J, Jiao L, Wu J (2015) Prediction of missing links based on multi-resolution community division. Phys A 417:76–85
27. Wu J, Jiao L, Jin C, Liu F, Gong M (2012) Overlapping community detection via network dynamics. Phys Rev E 85(2):016115
28. Lu L, Zhou T (2011) Link prediction in complex networks: a survey. Phys A 390(6):1150–1170
29. Pizzuti C (2008) GA-Net: a genetic algorithm for community detection in social networks. Springer-Verlag, New York, pp 1081–1090

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Collaborative filtering recommendation algorithm based on user correlation and evolutionary clustering



Publisher: Springer Journals
Copyright: © The Author(s) 2019
Subject: Engineering; Computational Intelligence; Complexity; Data Structures and Information Theory
ISSN: 2199-4536
eISSN: 2198-6053
DOI: 10.1007/s40747-019-00123-5

Abstract

In recent years, the application of recommendation algorithms in real life, for example at Amazon and Taobao, has become universal, but they are not yet perfect. A few problems remain to be solved, such as sparse data and low recommendation accuracy. Collaborative filtering is a mature algorithm in recommender systems, but it still has some problems. In this paper, a novel collaborative filtering recommendation algorithm based on user correlation and evolutionary clustering is presented. Firstly, the score matrix is pre-processed with normalization and dimension reduction to obtain denser score data. Based on these processed data, a clustering principle is generated and dynamic evolutionary clustering is implemented. Secondly, the search for the nearest neighbors with the most similar interests is considered: a measurement of the relationship between users, called user correlation, is proposed, which applies the satisfaction of users and the potential information. In each user group, user correlation is applied to choose the nearest neighbors used to predict ratings. The proposed method is evaluated on the MovieLens dataset. Diverse experimental results demonstrate that the proposed method has outstanding performance in prediction accuracy and recommendation precision.

Keywords: Collaborative filtering · User correlation · Dynamic evolutionary clustering · Rating prediction · Recommendation

Jianrui Chen (jianrui_chen@snnu.edu.cn) and Chunxia Zhao (zhaochunxia911@163.com), School of Computer Science, Shaanxi Normal University, Xi'an 710119, China; College of Science, Inner Mongolia University of Technology, Hohhot 010051, China. Complex & Intelligent Systems (2020) 6:147–156.

Introduction

The development of the internet and e-commerce makes our life more convenient, as billions of products are searchable online. Meanwhile, we must face the problem of information overload in daily life. Under these circumstances, it is much harder than ever before to dig out the relevant objects that we really want [1]. Many researchers have studied recommendation systems and made progress on this particular issue, but data sparseness has always been an important reason for low recommendation accuracy. To make full use of the existing information, researchers have proposed more and more excellent algorithms [2], such as neighborhood-based CF (collaborative filtering) and model-based CF.

Neighborhood-based CF algorithms are further classified into two categories: user-based CF [3] and item-based CF [4], and their basic principles are interlinked. For instance, user-based CF considers two users to be similar when their neighbors are similar. Obviously, the selection of nearest neighbors is significant; correspondingly, choosing an appropriate similarity measure helps improve both the accuracy and the applicability of the recommendation algorithm [5]. Researchers take into account the score information that best reflects users' preferences, and have come up with similarity measures such as cosine similarity [6], the Pearson correlation coefficient [7], and adjusted cosine [8]. Others, such as Salton similarity [6] and Jaccard similarity [9], take into account the numbers of items rated by a user and by its neighbors. A vertex similarity index, CosRA, has been proposed, which combines the advantages of the cosine index and the resource-allocation index [1]; this fusion enhances the overall performance of personalized recommendation.

Model-based CF algorithms learn a model from the training data and subsequently utilize the model for recommendations [10]; examples include matrix decomposition models [11–13] and clustering models [14–16].
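The classical neighborhood measures mentioned above (cosine similarity and the Pearson correlation coefficient) can be made concrete with a small sketch. This is illustrative code, not taken from the paper; it assumes each user's ratings are stored as a dict mapping item id to score, with unrated items simply absent.

```python
import math

def cosine_sim(ru, rv):
    # Cosine similarity: dot product over co-rated items divided by the
    # product of the full rating-vector norms (absent items act as zeros).
    num = sum(ru[i] * rv[i] for i in set(ru) & set(rv))
    den = math.sqrt(sum(r * r for r in ru.values())) * \
          math.sqrt(sum(r * r for r in rv.values()))
    return num / den if den else 0.0

def pearson_sim(ru, rv):
    # Pearson correlation over co-rated items, centering each user's
    # ratings on that user's historical average.
    common = sorted(set(ru) & set(rv))
    if len(common) < 2:
        return 0.0
    mu = sum(ru.values()) / len(ru)
    mv = sum(rv.values()) / len(rv)
    num = sum((ru[i] - mu) * (rv[i] - mv) for i in common)
    den = math.sqrt(sum((ru[i] - mu) ** 2 for i in common)) * \
          math.sqrt(sum((rv[i] - mv) ** 2 for i in common))
    return num / den if den else 0.0
```

Both functions return 0 when the users share no (or too few) co-rated items, a common convention when a similarity value is undefined.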
Ref. [11] presented a collaborative filtering recommendation algorithm based on SVD smoothing. That approach first predicts the ratings of unrated items using SVD technology, then finds the target user's neighbors with the Pearson correlation similarity measurement, and finally produces the recommendations; this alleviates the sparsity problem of the user-item rating dataset. The superiority of CF algorithms combined with clustering techniques in relieving the impact of data sparseness and cold-start problems has been verified by many research experiments [17,18]. Recently, many scholars have also proposed dynamic evolutionary clustering algorithms [19–21]; compared to the classical Kmeans algorithm, they not only reduce time consumption and complexity but also leave the number of clusters undetermined in advance. Liao et al. [22] presented an approach that clusters using only the user-product rating matrix, without the necessity of collecting extra attributes about customers and products, and the clusters are formed automatically. Shang et al. proposed a novel fuzzy double trace norm minimization method for recommender systems with faster running time [23]. Recently, we presented two novel recommender algorithms based on a heterogeneous network model [24] and a filled matrix [25], which combine dynamic clustering with traditional collaborative filtering to gain better recommendation results.

Based on the above work, a novel recommendation algorithm is proposed in this paper. The main motivations are as follows. Firstly, to reduce complexity and time consumption, two steps are executed on the score matrix: normalization and dimension reduction. Then, a clustering principle is constructed, and a novel dynamic evolutionary clustering model is proposed to imitate the changes of different nodes in the network model. Secondly, to better grasp the neighbors with the most similar interests, a new similarity index is proposed, which combines the advantage of node degrees with the potential information of nodes. Thirdly, ratings are predicted in each group based on user correlation.

The rest of the paper is structured as follows. Section "Evolutionary clustering algorithm" gives the evolutionary clustering model. In section "Similarity indices", we review existing similarity indices and put forward a new similarity index. A full description of our algorithm is presented in section "The UCEC&CF algorithm flow". Section "Experiment and studies" shows experimental results and comparisons with existing algorithms, and presents the advantages of our algorithm. Section "Conclusion" concludes the paper.

Evolutionary clustering algorithm

Many complex social, biological, and information systems can be described by networks, where nodes indicate individuals and links represent the relationships or interactions between nodes [26]; the study of link prediction has become a common focus of science. A link between two nodes can be found from the topology information of the network or from the feature vectors of the nodes. In this paper, the network G(V, E) is constructed, where V is the set of nodes containing M users and N items, and E is the set of scores given to items by users. Firstly, a clustering method is presented to gather users with similar interests into the same groups.

Data pre-processing

As we all know, clustering is roughly divided into three steps: deal with the data first, then choose an appropriate distance function to measure the similarity between objects, and at last obtain the groups. Naturally, the most important step of clustering is to choose an optimal clustering principle. Ba et al. [12] proposed a method that clusters all users by calculating a user characteristic value from the attributes of users, but this method requires much computational cost. So, in this paper, to reduce complexity, only the score matrix is adopted to gather users into groups.

To effectively apply the score information, Ref. [22] normalized the ratings of user u according to Eq. (1); by this, the influence of people's rating habits is reduced. We take the same approach: the original score matrix R = (r_{ij})_{M \times N} is processed by (1):

  Q_i = \sum_{j=1}^{m_i} r_{ij}, \quad i = 1, \ldots, M,
  \tilde{r}_{ij} = \frac{r_{ij}}{Q_i}, \quad 1 \le j \le m_i.   (1)

Here, r_{ij} denotes user i's rating of item j, and m_i denotes the number of items rated by user u_i.
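The normalization of Eq. (1) can be sketched as follows. This is a hedged illustration (the paper's own implementation is in Matlab), assuming the score matrix is a dense list of rows with 0 marking unrated items.

```python
def normalize_ratings(R):
    # Eq. (1): Q_i is user i's rating sum; each rating is divided by Q_i,
    # which damps differences in individual rating habits.
    normalized = []
    for row in R:
        q = sum(row)  # Q_i
        normalized.append([r / q if q else 0.0 for r in row])
    return normalized
```

After this step, every user's nonzero ratings sum to 1, so generous and strict raters become directly comparable.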
At the same time, to gain better recommendation results, items with fewer scores are removed; in this way, the sparse scoring matrix becomes relatively dense. Here, the number of items is changed from N to N_1, N > N_1. By doing this, a new relationship between users and items is generated, denoted as \tilde{R} = (\tilde{r}_{ij})_{M \times N_1}. The proportion of removed items is denoted as α; the influence of α on the results is shown later in section "Experiment and studies".

The adjacency matrix A of the network is constructed similarly to Ref. [24]:

  A = \begin{pmatrix} 0 & \tilde{R} \\ \tilde{R}^T & 0 \end{pmatrix}.

Here, A is an (N_1 + M)-dimensional square matrix and 0 denotes a zero matrix.

Evolutionary clustering algorithm

Wu et al. [27] presented a generalized Kuramoto model to identify community structure in positive networks. The model is defined as follows:

  \dot{\theta}_i = \omega_i + \frac{K_p}{N} \sum_{j=1}^{N} a_{ij} \sin(\theta_j - \theta_i) + \frac{K_n}{N} \sum_{j=1}^{N} (1 - a_{ij}) \sin(\theta_j - \theta_i).   (2)

Here, θ_i is the state of the i-th node, and ω_i denotes the initial state of node i, randomly generated from the interval [0, 2π]. N refers to the number of nodes in the network. If the i-th node is connected with the j-th node, a_{ij} = 1; otherwise a_{ij} = 0. K_p > 0 makes the states of two connected oscillators synchronize, while K_n < 0 makes the states of two unconnected oscillators evolve away from each other.

Inspired by this, the model is applied to the heterogeneous network, and users are divided into different communities through the relationships between users and items. The evolutionary clustering rule is designed as: if a user rates an item with a higher score, the user and the item are regarded as belonging to the same group. In this paper, the dynamic evolutionary clustering model is defined as:

  \dot{\theta}_i = \omega_i + \frac{K_n}{N_1 + M} \sum_{j \in NL_i} \sin(\theta_j - \theta_i) + \frac{K_p}{N_1 + M} \sum_{j \notin NL_i} \sin(\theta_j - \theta_i),   (3)

where NL_i = {j | A_{ij} < δ}. Here, N_1 + M refers to the total number of users and items in the network. The coupling parameters K_p > 0 and K_n < 0 make the nodes connected by higher scores evolve together and the nodes connected by lower scores evolve apart, respectively. δ denotes the critical value between high and low scores; it is set to the median of the nonzero elements of the matrix A. With randomly generated initial values of the nodes, the iteration begins. Ref. [27] has verified the convergence of the node evolution: after a certain number of iterations, the states of the nodes become stable. Nodes whose state values lie within a narrow range (|θ_i − θ_j| < ε) get closer and are divided into the same group, while nodes with very different state values are divided into different groups. A higher ε means more nodes are divided into the same cluster, and a lower ε means fewer; thus ε determines which nodes are divided into the same cluster.

Similarity indices

Local similarity indices

Most existing similarity indices fall into two categories: local and global. The indices compared in this paper [28] are defined in Table 1, where Γ(u) denotes the set of neighbors of u, Γ(uv) denotes the set of common neighbors of u and v, k(u) is the degree of user u, r_{ui} denotes user u's rating of item i, and \bar{r}_u is the historical average score of user u.

Table 1  Similarity indices

  Common neighbors (CN):     S_{uv} = |Γ(u) ∩ Γ(v)|
  Hub promoted index (HPI):  S_{uv} = |Γ(u) ∩ Γ(v)| / min{k(u), k(v)}
  Hub depressed index (HDI): S_{uv} = |Γ(u) ∩ Γ(v)| / max{k(u), k(v)}
  Salton index (SI):         S_{uv} = |Γ(u) ∩ Γ(v)| / \sqrt{k(u) × k(v)}
  Jaccard index (JI):        S_{uv} = |Γ(u) ∩ Γ(v)| / |Γ(u) ∪ Γ(v)|
  Cosine (COS):              S_{uv} = \sum_{i=1}^{N} r_{ui} r_{vi} / ( \sqrt{\sum_{i=1}^{N} r_{ui}^2} \sqrt{\sum_{i=1}^{N} r_{vi}^2} )
  Pearson (COR):             S_{uv} = \sum_{i∈Γ(uv)} (r_{ui} − \bar{r}_u)(r_{vi} − \bar{r}_v) / ( \sqrt{\sum_{i∈Γ(uv)} (r_{ui} − \bar{r}_u)^2} \sqrt{\sum_{i∈Γ(uv)} (r_{vi} − \bar{r}_v)^2} )
  Adjusted cosine (ADJ):     S_{uv} = \sum_{i∈Γ(uv)} (r_{ui} − \bar{r}_u)(r_{vi} − \bar{r}_v) / ( \sqrt{\sum_{i∈Γ(u)} (r_{ui} − \bar{r}_u)^2} \sqrt{\sum_{i∈Γ(v)} (r_{vi} − \bar{r}_v)^2} )

User correlation indices

In real life, people have different expectations for products. Those with lower expectations tend to give highly satisfied feedback, while the others are much pickier. For instance, user A, who has lower expectations, always gives 4 or 5 stars in product feedback; user B, on the other side, gives only 4 stars, which could well be the highest score he or she has ever marked. So when both of them rate a product with 4 stars, it actually means something different, does it not? Therefore, in this paper, we take the user's own preference into account when evaluating feedback.

On the other hand, different feedback from different users can also be a good evaluation of a product. For example, user A gives 2 stars while user B gives 5 stars to the same product. Why did they originally buy this product? Perhaps they both liked its appearance. But why is their feedback so different? The reason may be practicability, price, or even after-sales service. Thereby, it is difficult to figure out exactly why users give different evaluations of the same product.
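The clustering stage (building the block adjacency matrix A and iterating the dynamics of Eq. (3) until nodes with close states form groups) might be sketched as below. This is a hedged illustration: the Euler step size and the omission of the constant drift term ω_i (so that synchronized states settle to constants) are our assumptions; the paper only specifies the coupling constants, an iteration count, and the grouping threshold ε.

```python
import math
import random

def build_adjacency(Rn):
    # Block matrix A = [[0, R~], [R~^T, 0]]: users occupy the first M
    # indices, items the remaining N1, of an (M + N1)-dimensional matrix.
    M, N1 = len(Rn), len(Rn[0])
    A = [[0.0] * (M + N1) for _ in range(M + N1)]
    for u in range(M):
        for i in range(N1):
            A[u][M + i] = Rn[u][i]
            A[M + i][u] = Rn[u][i]
    return A

def evolve_states(A, delta, Kp=10.0, Kn=-0.01, steps=100, dt=0.05):
    # Sketch of Eq. (3): pairs linked by a high score (A_ij >= delta) are
    # pulled together with Kp; all other pairs are pushed apart with Kn.
    n = len(A)
    theta = [random.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    for _ in range(steps):
        new_theta = []
        for i in range(n):
            pull = sum(math.sin(theta[j] - theta[i])
                       for j in range(n) if j != i and A[i][j] >= delta)
            push = sum(math.sin(theta[j] - theta[i])
                       for j in range(n) if j != i and A[i][j] < delta)
            new_theta.append(theta[i] + dt * (Kp * pull + Kn * push) / n)
        theta = new_theta
    return theta

def group_by_state(theta, eps=1e-3):
    # Step 4 of the algorithm flow: nodes whose stable states differ by
    # less than eps share a group.
    groups = []
    for i, t in enumerate(theta):
        for g in groups:
            if abs(theta[g[0]] - t) < eps:
                g.append(i)
                break
        else:
            groups.append([i])
    return groups
```

With high-score links pulling phases together and all other pairs weakly repelling, users (and items) sharing many high ratings drift toward a common state value and end up in the same group.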
So, this paper takes the general evaluation into account; meanwhile, the feedback needs to focus on the same products. Based on the above reasons, the calculation method of user correlation is given as follows:

  SIM_{uv} = \frac{|B| + |C|}{\sqrt{|N(u)| |N(v)|}}.   (4)

Here, B = {i | r_{ui} ≥ β and r_{vi} ≥ γ} and C = {i | r_{ui} < β and r_{vi} < γ}. N(u) and N(v) denote the sets of items users u and v have evaluated in the original score matrix R. β and γ are two parameters; in our experiments, they are the historical average rating scores of u and v, respectively. The numerator of SIM_{uv} considers the potential information of the users, and the denominator concerns their degrees, so more similarity information between users u and v is captured. In our algorithm, we do not normalize the similarity indices; the similarity calculation is applied to differentiate the influences of neighbors.

The UCEC&CF algorithm flow

In this paper, a Collaborative filtering recommendation algorithm based on User Correlation and Evolutionary Clustering (denoted UCEC&CF) is proposed. The procedure of the algorithm is given in Fig. 1.

[Fig. 1: The UCEC&CF algorithm flow.]

Step 1: Generate the adjacency matrix. Input the score matrix of users and items, normalize the matrix by (1), remove the items with fewer scores, and generate the adjacency matrix A.

Step 2: Generate the initial states of nodes. The states of nodes are randomly generated from the interval [0, 2π].

Step 3: Update the states of nodes. Given values for the coupling strength parameters K_p and K_n and an iteration threshold t, the states are updated by (3) until they stabilize at certain values. In all the simulations of this paper, K_p = 10, K_n = −0.01, and t = 100.

Step 4: Obtain the group structure. Nodes are divided into the same group if their state values meet the condition |θ_i − θ_j| < ε. In all the simulations of this paper, ε = 0.001.

Step 5: Calculate the user correlation matrix. The user correlation matrix SIM is calculated by (4) within each group, based on the original score matrix.

Step 6: Predict the ratings. Every tested item rating of a target user is calculated according to formula (5):

  \hat{r}_{ui} = \bar{r}_u + \frac{\sum_{v \in U} SIM_{uv} (r_{vi} - \bar{r}_v)}{\sum_{v \in U} SIM_{uv}}.   (5)

Here, \hat{r}_{ui} denotes the predicted rating of item i by user u, and \bar{r}_u is the historical average score of u. SIM_{uv} denotes the relationship between users u and v, and U is the neighbor set containing the k users most similar to user u within the same group according to our user correlation index.

Step 7: Top-N recommendation. All predicted scores for the target user are sorted in descending order; the first N items are picked to generate a recommendation list, which is given to the target user.

Experiment and studies

All algorithms are coded in Matlab 2013a; the experimental platform is an Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80 GHz with 64 GB memory, running Windows Server 2008 R2 Enterprise.

Dataset and evaluation metrics

We conducted experiments on the MovieLens datasets (http://grouplens.org/datasets/movielens/). The datasets from GroupLens are the most popular in the field of recommender systems. Our evaluation is restricted to the 100K and 1M datasets; the former consists of 1682 movies, 943 users, and almost 100,000 known ratings on a scale from 1 to 5, and the latter contains 1,000,209 anonymous ratings of approximately 3900 movies made by 6040 users. To reduce error, we conducted fivefold cross-validation, taking 80% of the available ratings as training data and the rest as test data.

In general, recommender systems are evaluated by accuracy, but prediction accuracy and Top-N recommendation accuracy are different. In this paper, the Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are applied to verify prediction accuracy; the lower the MAE (or RMSE), the better the predicted ratings:

  MAE = \frac{1}{n} \sum_{u=1}^{n} |r_{ui} - \hat{r}_{ui}|,   (6)

  RMSE = \sqrt{\frac{1}{n} \sum_{u=1}^{n} (r_{ui} - \hat{r}_{ui})^2}.   (7)

Here, n is the number of ratings to be predicted, and r_{ui} and \hat{r}_{ui} are the real and predicted scores, respectively.

We also evaluate the precision of Top-N recommendation in terms of F for varying lengths of the recommendation list; the higher the F, the better the recommender algorithm:

  F = \frac{2 \times recall \times precision}{recall + precision},   (8)

  recall = \frac{\sum_u |R_u \cap T_u|}{\sum_u |T_u|},   (9)

  precision = \frac{\sum_u |R_u \cap T_u|}{\sum_u |R_u|}.   (10)

R_u denotes the set of real high ratings of user u (i.e., ≥ 3 in the MovieLens dataset), and T_u denotes the set of predicted high ratings in the test set.

Experimental results

In this subsection, we test the accuracy of our algorithm UCEC&CF from multiple views, and compare the accuracy of its prediction and Top-N recommendation with other algorithms.

1. Comparing results of processing the score matrix. To test the impact of normalization and dimension reduction on the score information, we compare our algorithm with DEHC-CF [24]. In that paper, the authors established a heterogeneous network and generated the groups based on score information. The parameters in the clustering principle for the five cross-validation sets are δ_1 = 0.0070, δ_2 = 0.0072, δ_3 = 0.0071, δ_4 = 0.0071, and δ_5 = 0.0069, with α = 50% in the following. The parameters of our clustering algorithm are K_p = 10, K_n = −0.01.

[Fig. 2: 100K dataset. Comparison results of the DEHC-CF and UCEC&CF algorithms: (a) MAE, (b) RMSE.]

Table 2  Comparisons of DEHC-CF and UCEC&CF on F (%) with the 100K dataset

  No. of recommendations    2      4      6      8      10     12
  DEHC-CF                   15.54  27.43  36.59  43.73  49.05  54.02
  UCEC&CF                   15.57  27.43  36.59  43.80  49.60  54.15

(Bold-italic values in the original mark the optimal values.)
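Steps 5-6 of the algorithm flow and the prediction-error metrics (Eqs. (4)-(7)) can be sketched as follows. This is a minimal illustration, not the paper's Matlab code; it assumes each user's ratings are a dict mapping item id to score, and that β and γ are the two users' historical average ratings.

```python
import math

def user_correlation(ru, rv, beta, gamma):
    # Eq. (4): SIM_uv = (|B| + |C|) / sqrt(|N(u)| * |N(v)|), where B (resp.
    # C) holds co-rated items both users rate high (resp. low).
    common = set(ru) & set(rv)
    b = sum(1 for i in common if ru[i] >= beta and rv[i] >= gamma)
    c = sum(1 for i in common if ru[i] < beta and rv[i] < gamma)
    return (b + c) / math.sqrt(len(ru) * len(rv))

def predict_rating(r_u_mean, neighbors):
    # Eq. (5): the target user's mean plus the SIM-weighted deviation of
    # each neighbor's rating from that neighbor's mean. `neighbors` lists
    # (sim_uv, r_vi, r_v_mean) triples for the k most correlated users.
    den = sum(s for s, _, _ in neighbors)
    if not den:
        return r_u_mean
    num = sum(s * (rvi - rvm) for s, rvi, rvm in neighbors)
    return r_u_mean + num / den

def mae(pairs):
    # Eq. (6) over (real, predicted) pairs.
    return sum(abs(r - p) for r, p in pairs) / len(pairs)

def rmse(pairs):
    # Eq. (7).
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))
```

Note that Eq. (4) rewards agreement in *kind* (both high or both low relative to each user's own average), which is how the index absorbs the differing rating habits discussed in section "User correlation indices".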
It can be observed from Fig. 2 that our method obtains slightly lower errors than DEHC-CF in some cases; especially when the number of neighbors is 60, our method is more effective. The mean MAE of DEHC-CF is 0.7491 and its mean RMSE is 0.9572, while the mean MAE and RMSE of UCEC&CF are 0.7482 and 0.9560, respectively. This shows that processing the score matrix is meaningful for improving recommendation precision. Table 2 shows the F measurements for different numbers of recommendations. For all algorithms, F improves with increasing length of the recommendation list; our algorithm performs better and recommends more satisfactory items for users when the number of recommendations becomes larger.

2. Comparing results of different clustering methods. To test whether the clustering method of our algorithm is efficient, three methods are compared: DTD from Ref. [19], DPSO from Ref. [29], and Kmeans. The iteration threshold of DTD is 0.1. The parameters of DPSO are: particle swarm size 100, number of iterations 4, learning factor 1.494, and max and min inertia weights 0.9 and 0.4, respectively. The cluster number of Kmeans is given three cases: individuals in the network are divided into 3, 6, and 10 clusters. The test results are shown in Fig. 3 and Table 3. None of the compared results apply similarity-index normalization.

[Fig. 3: 100K dataset. Comparison results of different clustering methods: (a) MAE, (b) RMSE.]

Table 3  Comparison results of DTD, DPSO, Kmeans, and UCEC&CF on F (%) with the 100K dataset

  No. of recommendations    2      4      6      8      10     12
  DTD                       15.21  26.91  35.95  43.09  48.84  53.49
  DPSO                      14.48  25.80  34.63  41.55  47.20  51.82
  Kmeans                    15.50  27.39  36.58  43.83  49.65  54.30
  UCEC&CF                   15.57  27.43  36.59  43.80  49.60  54.15

(Bold-italic values in the original mark the optimal values.)

Figure 3 presents the MAE and RMSE values for different settings of the neighborhood size. It can be observed that our presented method is more effective than the other clustering algorithms. In the DPSO result, each group contains only one user, so its prediction accuracy is independent of the number of neighbors. The mean MAE values of UCEC&CF, DTD, DPSO, and Kmeans are 0.7482, 0.7882, 0.8362, and 0.7557, respectively; the mean RMSE values are 0.9560, 1.0070, 1.0437, and 0.9668. Besides, our proposed method is more efficient than the Kmeans algorithm, as it does not require a predefined number of clusters. Table 3 shows the F values for all the algorithms: UCEC&CF is more efficient than DTD and DPSO in all cases. In terms of Top-N recommendation accuracy, our algorithm is slightly inferior to the Kmeans algorithm as the length of the recommendation list increases; the F values of DPSO are the worst.

3. Comparing results of different similarity indices. Next, our user correlation results are compared with the other eight similarity indices (see Table 1); the results are shown in Fig. 4 and Table 4. None of the compared results apply similarity-index normalization.

[Fig. 4: 100K dataset. Comparing results of UCEC&CF and existing similarity indices: (a) MAE, (b) RMSE.]

Table 4  Comparison between UCEC&CF and existing similarity indices on F (%) with the 100K dataset

  No. of recommendations    2      4      6      8      10     12
  CN                        15.41  27.24  36.41  43.50  49.39  53.88
  HPI                       15.46  27.26  36.35  43.56  49.37  54.03
  HDI                       15.46  27.37  36.47  43.58  49.40  54.07
  SI                        15.50  27.36  36.51  43.69  49.40  54.13
  JI                        15.46  27.41  36.57  43.64  49.46  54.03
  COS                       15.43  27.36  36.41  43.54  49.41  54.03
  COR                       15.41  27.28  36.44  43.67  49.40  54.12
  ADJ                       15.48  27.27  36.49  43.64  49.48  54.13
  UCEC&CF                   15.57  27.43  36.59  43.80  49.60  54.15

(Bold-italic values in the original mark the optimal values.)

It can be seen from Fig. 4 that our method carries the best performance. Some of the compared similarity indices only consider the degrees of the nodes, and others are computed from users' ratings of the same items. Our algorithm obtains better recommendation results because the presented user correlation considers both the degrees of nodes and the score information. To further check the performance of the proposed algorithm, Table 4 gives the recommendation accuracy of the algorithms for different numbers of recommendations. It can be seen that our method gains better performance than the others.

4. Comparing results of different α. In the algorithm, part of the data is removed to reduce the impact of sparseness; α is the proportion of removed items. Experimenting with the 100K dataset, the results are shown in Fig. 5: parameter α has only a slight influence on the results. In the simulations of this part, the number of neighbors is 30 and the number of recommendations is 10. The range of MAE is [0.7432, 0.7457], the range of RMSE is [0.9506, 0.9534], and the range of F is [0.4948, 0.4964].

[Fig. 5: 100K dataset. Influence of α on the recommendation performance.]

5. Comparison with several existing recommendation algorithms. Collaborative filtering and matrix decomposition are representative of existing recommendation algorithms. To test the effectiveness of our overall algorithm, UCEC&CF is compared with DEHC-CF and DTNM [23] on two datasets: MovieLens 100K and MovieLens 1M. DEHC-CF is a novel heterogeneous evolutionary clustering algorithm based on user collaborative filtering, with coupling strengths K_1 = 20 and K_2 = −20. DTNM is a novel fuzzy double trace norm minimization method for recommender systems; its rank takes integer values between 5 and 15 and has little effect on the result, so DTNM is compared with rank = 5. The comparison results on the different datasets are illustrated in Fig. 6 and Table 5.

Figure 6a, b gives the MAE and RMSE values, respectively, for the 100K dataset; the MAE and RMSE values for the 1M dataset are given in Fig. 6c, d. From an overall view, the improvement in MAE and RMSE is more pronounced for the 1M dataset than for the 100K one, as the former has higher explicit-rating data sparsity. The result of DTNM is independent of the number of neighbors because it is a rank-dependent matrix decomposition algorithm. As expected, our UCEC&CF algorithm performs the best amongst all compared algorithms, which means our work gains better performance than DEHC-CF and DTNM. Our work broadly includes building the heterogeneous network, setting the clustering principle, and defining user correlation. Table 5 gives a description of UCEC&CF, DEHC-CF, and DTNM when items are recommended to users.
In the simulations of this MAE RMSE MAE RMSE 154 Complex & Intelligent Systems (2020) 6:147–156 0.82 1.02 0.81 1.01 0.8 UCEC&CF 0.79 DEHC−CF DTNM 0.99 0.78 0.98 UCEC&CF 0.77 DEHC−CF DTNM 0.97 0.76 0.96 0.75 0.74 0.95 10 20 30 40 50 60 70 80 90 100 10 20 30 40 50 60 70 80 90 100 number of neighbors number of neighbors (a) MovieLens −100K (b) MovieLens −100K 0.83 1.03 0.82 1.02 0.81 1.01 0.8 1 UCEC&CF UCEC&CF DEHC−CF DEHC−CF 0.79 0.99 DTNM DTNM 0.78 0.98 0.77 0.97 0.76 0.96 0.75 0.95 0.74 0.94 0.73 0.93 10 20 30 40 50 60 70 80 90 100 10 20 30 40 50 60 70 80 90 100 number of neighbors number of neighbors (c) MovieLens −1M (d) MovieLens −1M Fig. 6 MAE and RMSE of different methods on two datasets Table 5 Comparison results of MovieLens Algorithm No. of recommendations different algorithms on F (%) 2 4 6 8 10 12 100 K DEHC-CF 15.34 27.16 36.18 43.39 49.09 53.76 DTNM 15.46 27.29 36.38 43.58 49.32 54.01 UCEC&CF 15.57 27.43 36.59 43.80 49.60 54.15 1 M DEHC-CF 12.69 23.53 32.17 39.12 44.79 49.53 DTNM 12.55 23.19 31.68 38.49 44.04 48.71 UCEC&CF 12.82 23.74 32.42 39.40 45.11 49.89 The bolditalic values are corresponding optimal values MAE MAE RMSE RMSE Complex & Intelligent Systems (2020) 6:147–156 155 experiment results demonstrate that our method gains bet- 4. Sarwar BM, Karypis G, Konstan JA, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. Www 1:285– ter performance than the others compared algorithms with different datasets. 5. Leicht EA, Holme P, Newman ME (2006) Vertex similarity in net- In a word, in section “Experiment and Studies”, two real works. Phys. Rev. E 73(2):026120 datasets with different size are considered to verify the effi- 6. Salton G, Mcgill MJ (1983) Introduction to modern information retrieval. McGraw-Hill, New York ciency of our proposed algorithm UCEC&CF. Five diversity 7. 
Breese JS, Heckerman D, Kadie C (2013) Empirical analysis of experiments are tested and UCEC&CF gains excellent rec- predictive algorithms for collaborative filtering. ACM 7(7):43–52 ommendation results. 8. Shardanand U, Maes P (1995) Social information filtering: algo- rithms for automating word of mouth. In Chi 95:210–217 9. Salton G (1989) Automatic text processing: the transformation, analysis. Addison-Wesley, Reading Conclusion 10. Pazzani MJ, Billsus D (2007) Content-based recommendation sys- tems. Springer-Verlag, New York, pp 325–341 11. Ren YB, Gong S J (2009) A collaborative filtering recommendation A new CF model based on user correlation and evolution- algorithm based on SVD smoothing. In: Third International Sym- ary clustering is proposed in this paper. The performance of posium on Intelligent Information Technology Application IEEE, our proposed method is verified by comparative experiments pp 530–532 from different aspects. On the one hand, only normalized and 12. Ba Q, Li X, Bai Z (2013) Clustering collaborative filtering recom- mendation system based on SVD algorithm. In: IEEE International reduced dimension score matrix is applied to generate clus- Conference on Software Engineering and Service Science, pp 963– tering principle. An improved dynamic evolutionary model is constructed to imitate the constantly changing states values 13. Zhang Y, Chen M, Huang D, Wu D, Li Y (2016) Idoctor: person- in the network. Similar users are divided into the same group. alized and professionalized medical recommendations based on hybrid matrix factorization. Future Gener Comput Syst 66:30–35 Besides, user correlation is proposed to measure the distance 14. Sarwar BM, Karypis G, Konstan J (2002) Recommender sys- between users by combining the users satisfaction and the tems for large-scale E-commerce: scalable. Confer Comput Inform potential score information. 
In each group, this novel user correlation measurement is adopted to find the neighbors most similar to the target user. Finally, user-based collaborative filtering is applied within each group. Extensive experiments demonstrate that our proposed method gains better recommendation performance than the other compared algorithms. However, there is still much room for improvement; we intend to incorporate other network information into the design for further performance enhancement.

Acknowledgements This research was supported by the National Natural Science Foundation of China (Nos. 71561020, 61503203, 61702317, 61771297) and the Fundamental Research Funds for the Central Universities (No. GK201802013).

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

References

1. Chen LJ, Zhang ZK, Liu JH, Gao J, Zhou T (2017) A vertex similarity index for better personalized recommendation. Phys A 466:607–615
2. Singh R, Patra BK, Adhikari B (2015) A complex network approach for collaborative recommendation. arXiv preprint arXiv:1510.00585
3. Resnick P, Iacovou N, Suchak M, Bergstrom P, Riedl J (1994) GroupLens: an open architecture for collaborative filtering of netnews. In: Proceedings of the ACM conference on computer supported cooperative work. ACM, pp 175–186
4. Sarwar BM, Karypis G, Konstan JA, Riedl J (2001) Item-based collaborative filtering recommendation algorithms. WWW 1:285–
5. Leicht EA, Holme P, Newman ME (2006) Vertex similarity in networks. Phys Rev E 73(2):026120
6. Salton G, McGill MJ (1983) Introduction to modern information retrieval. McGraw-Hill, New York
7. Breese JS, Heckerman D, Kadie C (2013) Empirical analysis of predictive algorithms for collaborative filtering. ACM 7(7):43–52
8. Shardanand U, Maes P (1995) Social information filtering: algorithms for automating word of mouth. In: CHI 95:210–217
9. Salton G (1989) Automatic text processing: the transformation, analysis. Addison-Wesley, Reading
10. Pazzani MJ, Billsus D (2007) Content-based recommendation systems. Springer-Verlag, New York, pp 325–341
11. Ren YB, Gong SJ (2009) A collaborative filtering recommendation algorithm based on SVD smoothing. In: Third international symposium on intelligent information technology application, IEEE, pp 530–532
12. Ba Q, Li X, Bai Z (2013) Clustering collaborative filtering recommendation system based on SVD algorithm. In: IEEE international conference on software engineering and service science, pp 963–
13. Zhang Y, Chen M, Huang D, Wu D, Li Y (2016) iDoctor: personalized and professionalized medical recommendations based on hybrid matrix factorization. Future Gener Comput Syst 66:30–35
14. Sarwar BM, Karypis G, Konstan J (2002) Recommender systems for large-scale E-commerce: scalable. Confer Comput Inform Technol 1:291–324
15. Haldar NAH, Khan FA, Ali A, Abbas H (2017) Arrhythmia classification using Mahalanobis distance based improved fuzzy c-means clustering for mobile health monitoring systems. Neurocomputing 220:221–235
16. Wang Z, Wen Q, Sun Y, Zhang H (2012) A fault detection scheme based on self-clustering nodes sets for wireless sensor networks. In: IEEE 12th international conference on computer and information technology, pp 921–925
17. Birtolo C, Ronca D, Aurilio G (2012) Trust-aware clustering collaborative filtering: identification of relevant items. Springer, Berlin Heidelberg, pp 374–384
18. Huang C, Yin J (2010) Effective association clusters filtering to cold-start recommendations. In: Seventh international conference on fuzzy systems and knowledge discovery, IEEE, vol 5, pp 2461–
19. Wu J, Jiao Y (2014) Clustering dynamics of complex discrete-time networks and its application in community detection. Chaos 24(3):268–4
20. Wu J, Zhang L, Li Y, Jiao Y (2016) Partition signed social networks via clustering dynamics. Phys A 443:568–582
21. Chen J, Wang H, Wang L, Liu W (2016) A dynamic evolutionary clustering perspective: community detection in signed networks by reconstructing neighbor sets. Phys A 447:482–492
22. Liao CL, Lee SJ (2016) A clustering based approach to improving the efficiency of collaborative filtering recommendation. Electron Commer Res Appl 18:1–9
23. Shang F, Liu Y, Cheng J, Yan D (2017) Fuzzy double trace norm minimization for recommendation systems. IEEE Trans Fuzzy Syst 99:1–1
24. Chen J, Uliji, Wang H (2018) Collaborative rating prediction based on dynamic evolutionary heterogeneous clustering. Swarm Evol Comput 394–399
25. Liji U, Chai Y, Chen J (2018) Improved personalized recommendation based on user attributes clustering and score matrix filling. Comput Stand Interfaces 57:59–67
26. Ding J, Jiao L, Wu J (2015) Prediction of missing links based on multi-resolution community division. Phys A 417:76–85
27. Wu J, Jiao L, Jin C, Liu F, Gong M (2012) Overlapping community detection via network dynamics. Phys Rev E 85(2):016115
28. Lu L, Zhou T (2011) Link prediction in complex networks: a survey. Phys A 390(6):1150–1170
29. Pizzuti C (2008) GA-Net: a genetic algorithm for community detection in social networks. Springer-Verlag, New York, pp 1081–1090

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Journal: Complex & Intelligent Systems (Springer Journals), 6:147–156

Published: Apr 23, 2020
