Precision Marketing Method of E-Commerce Platform Based on Clustering Algorithm

Hindawi
Complexity, Volume 2021, Article ID 5538677, 10 pages
https://doi.org/10.1155/2021/5538677

Research Article

Bei Zhang (1), Luquan Wang (1), and Yuanyuan Li (2)

(1) School of Economics and Management, Shandong Xiandai University, Jinan 250104, Shandong, China
(2) Shandong Academy of Grape, Shandong Academy of Agricultural Sciences, Jinan 250100, Shandong, China

Correspondence should be addressed to Yuanyuan Li; 000229@sdupsl.edu.cn

Received 2 February 2021; Revised 23 February 2021; Accepted 27 February 2021; Published 5 March 2021

Academic Editor: Wei Wang

Copyright © 2021 Bei Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In user cluster analysis, users with the same or similar behavior characteristics are divided into the same group by iterative update clustering, and the core and larger user groups are detected. In this paper, we present the formulation and data mining of the correlation rules based on the clustering algorithm through the definition and procedure of the algorithm. In addition, based on the idea of the K-mode clustering algorithm, this paper proposes a clustering method combining related rules with multivalued discrete features (MDF). In this paper, we construct a method to calculate the similarity between users using Jaccard distance and combine correlation rules with Jaccard distances to improve the similarity between users. Next, we propose a clustering method suitable for MDF. Finally, the basic K-mode algorithm is improved by the similarity measure method combining the correlation rule with the Jaccard distance and the cluster center update method which is the ARMDKM algorithm proposed in this paper.
This method solves the problem that MDF cannot be processed effectively by the traditional model, and its theoretical correctness is demonstrated. Experiments verify the correctness of the new method using clustering purity, entropy, the silhouette coefficient, and other indicators.

1. Introduction

By clustering analysis of users, we can find people with different interests and different behaviors, so that companies can analyze the characteristics of the core user groups of their products and obtain a basis for improving products and for precise marketing. In addition, user clustering analysis can also be applied to business decision-making, public opinion analysis, security warning, and other fields [1]. Data about users are usually mixed data. However, when traditional clustering algorithms are used for user clustering analysis, it is impossible to dig deeper into the information of multivalued discrete features (MDF) [2]. This leads to low data utilization and inaccurate user feature analysis. At the same time, current user clustering analysis does not fully consider the association and importance between user data features, and most research treats different data features of users independently. Therefore, it is necessary to improve the utilization rate of user data and to explore the association and importance of user features, which is important for improving the accuracy and quality of user clustering [3–10]. With the advent of the era of big data, Internet technology, database technology, and various data mining algorithms have developed rapidly [11]. Nowadays, Internet companies obtain a large amount of data every day, and how to extract useful information from these data has been the direction of people's efforts. To solve this problem, researchers in Internet companies and institutions around the world have been actively drawing on theoretical knowledge from various fields and conducting experimental validation [12].

In recent years, competition among Internet enterprises has been fierce, and each enterprise is studying how to improve its own products to reduce the loss of users, how to tap into potential user groups effectively, and how to analyze the interests and emotional state of users, all of which are vital to the development of enterprises or even a matter of life and death [13]. Many Internet companies are increasingly focused on how to use the data in hand to serve precision marketing; one typical method is cluster analysis of user data. User clustering analysis uses different clustering algorithms to find groups of users with similar behavioral characteristics in different application scenarios, as a breakthrough in user behavior analysis [14]. By clustering analysis of users, we can understand users, infer their potential needs, explore potential user groups, enhance corporate influence, strengthen users' reliance on corporate products, and reduce the risk of user churn. At the same time, grasping the current users' interests and needs helps to analyze and predict the users' future needs and provides a basis for the future development direction of the enterprise.

Current research in the field of user clustering analysis generally uses traditional clustering algorithms such as K-means, which can characterize the audience of an enterprise's products and reveal the interests and preferences of each group for data analysis, recommendation, and other work [15]. However, the user data obtained by enterprises are usually of mixed types, including numerical types, categorical types, and MDFs, such as age, gender, interests, and hobbies. Traditional clustering methods can generally handle only a single type of feature, and the impact of the various types of feature data cannot be considered comprehensively. The analysis of MDF only stays at the level of mathematical statistics, and it is impossible to use this kind of data to explore the hidden information of users, which leads to low utilization of data [16]. At the same time, the current user clustering analysis method does not fully consider the association and importance of user features. Therefore, research on the above problems is of great significance for the development of enterprise products and even for the development of enterprises [17].

With the advent of the era of big data, how to filter noisy information and discover valuable information from the large amount of user data collected has become a hot issue for Internet companies at home and abroad [18]. A variety of new products are constantly emerging in the market, which widens consumers' choice. At this time, discovering the interests and behavioral characteristics of consumers becomes particularly important [19]. Clustering analysis is a data mining technique widely used in the field of database knowledge discovery, and its operation is shown in Figure 1. As early as the twentieth century, J. MacQueen proposed the simple and efficient K-means clustering algorithm. Nowadays, using clustering methods to analyze the characteristics of the audience is one of the means by which companies conduct user analysis [20].

Figure 1: Clustering analysis algorithm.

The source of user data is usually the activities of users in the online state, such as searching for information, shopping, interacting with microblogs, and watching videos. These activities generate a lot of data. With the growing number of users on the web today, the amount of data traffic generated on the web is increasing day by day. Enterprises can use various data mining algorithms to analyze a large amount of collected user data, which can describe user characteristics, user behavior habits, and other pieces of important information, so as to further reason about and predict user behavior and improve the existing system, which will greatly improve efficiency and bring great profits to enterprises.

In this paper, we propose an unsupervised feature selection method combining K-means++ with random forest. First, the method uses the K-means++ algorithm to perform preliminary clustering and obtain pseudolabels for the user data; secondly, a random forest is applied to the pseudolabeled user data to obtain a feature importance ranking, and the importance is used as the weight parameter of the user data features; finally, based on the idea of spectral clustering, the weighted user data are clustered to obtain the final user clustering. The model integrates the weight relationship between user features, solves the problem of unlabeled user data, and can effectively improve the accuracy of clustering.

2. Related Work

In recent years, many Internet companies have paid more and more attention to how to use data for accurate marketing services, and user clustering analysis methods are being studied by more and more companies. Through clustering analysis of user data, enterprises can discover the core audience and the potential behavioral information of users. At present, the main research direction of user clustering analysis is to analyze user behavior characteristics or to support recommendation systems.

Previous authors analyzed the behaviors of groups with different consumption levels on a webcasting platform. They first used the user behavior data collected on the webcasting platform to construct a behavior feature dataset, used Gower distance to measure the similarity of hybrid features in the user data, and finally clustered the user groups of different consumption classes by a medoids clustering method. The researchers used the K-means clustering algorithm to analyze the heat map and charging time distribution of EV users' behaviors and to summarize the behavioral characteristics of EV users. Researchers proposed a two-layer web user clustering method, which uses the DBSCAN algorithm to eliminate outliers, discovers irregular clusters using multiple features in user sessions, and finally uses a bottom-up hierarchical approach to cluster the initial clustering results.

The source of user data is usually online user activity, such as information retrieval, shopping, microblog operation, and video viewing. These activities generate large amounts of data. As the number of users on the web increases, the amount of data traffic generated on the web increases daily. Companies can analyze the large amounts of user data collected using various data mining algorithms. This makes it possible to describe user characteristics, user behavior habits, and other pieces of important information, to further describe and predict user behavior, to improve existing systems, and to improve efficiency and bring great benefits to businesses.

The researchers propose a method to perform initial clustering using the SOM algorithm to obtain the initial clustering parameters. The above steps reduce the impact of improper initialization.
The researchers propose a time-series-based method for classifying daily transactions of transit smart card users, which uses correlation distance, hierarchical clustering, and subgroups by metric parameters to understand the temporal patterns of users and to identify the daily behavior of different transit users. The researchers proposed a recommendation algorithm based on clustering and matrix analysis. First, the algorithm uses K-means to cluster user behavior data to find groups of users with similar behavioral characteristics and then uses similarities between user contexts to rank them to find the final set of similar users. Hsien-Ying Huang et al. proposed an adaptive clustering algorithm that introduces the topological potential field theory in physics and then uses the improved K-means algorithm to cluster users in order to help complete the later recommendation process. The researchers proposed a recommendation algorithm combining user clustering with scoring preferences, which preprocesses user behavior data by principal component analysis and K-means clustering and determines the weights of user behavior features by multiple linear regression. The researchers proposed a recommendation method based on user-item community detection. After obtaining clusters with tightly connected users and items, a traditional collaborative filtering model can be trained for each cluster.

Feature selection, also known as feature subset selection, extracts an optimal subset of features from all features in the sample data to make the constructed model more generalizable and effective. Unlike feature extraction, feature selection can retain as much information on the original features as possible while eliminating redundant features. Feature selection methods are broadly classified into supervised feature selection and unsupervised feature selection. Supervised feature selection methods, such as Relief, mRMR, and CFS, need the label information of samples and perform feature selection by measuring the correlation between sample features and labels. However, in real life, data with labels are difficult to obtain, and manual labeling is time-consuming and laborious. Therefore, researchers are increasingly interested in the study of unsupervised feature selection methods.

3. Precision Marketing Based on Clustering Algorithm

3.1. Association Rule Mining. Association rule mining (ARM) is an important data mining technique, which is a process of identifying frequent patterns, associations, or causal structures from various types of datasets. ARM can be used to identify frequent item sets between uncertainties and generate strong association rules from large datasets, especially auxiliary datasets for engineering operations. Currently, many scholars apply clustering methods to the discovery of association rules. However, they aim to improve the quality of the generated association rules. In this paper, association rules are instead applied to the user similarity metric in clustering, in order to improve the accuracy of that metric.

An association rule is an inference of the form X → Y, where X and Y are nonempty sets that do not contain the same elements, representing the antecedent and consequent parts of the rule, respectively. Three metrics are generally used to measure association rules, namely, support, confidence, and lift.
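As a small illustration of the three metrics, the sketch below computes them on a toy transaction list. It assumes the standard textbook definitions (support as the fraction of transactions containing all items of the rule, confidence as support(X and Y) divided by support(X), and lift as confidence divided by support(Y)); the function names and the baskets are hypothetical and are not taken from the paper's experiments.

```python
from typing import Iterable, List, Set


def support(transactions: List[Set[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for basket in transactions if wanted <= basket)
    return hits / len(transactions)


def confidence(transactions: List[Set[str]], x: Set[str], y: Set[str]) -> float:
    """Estimate of P(Y | X): support of X and Y together over support of X."""
    support_x = support(transactions, x)
    if support_x == 0:
        raise ValueError("antecedent X never occurs, confidence is undefined")
    return support(transactions, x | y) / support_x


def lift(transactions: List[Set[str]], x: Set[str], y: Set[str]) -> float:
    """Standard lift (an assumption here): confidence(X -> Y) / support(Y)."""
    support_y = support(transactions, y)
    if support_y == 0:
        raise ValueError("consequent Y never occurs, lift is undefined")
    return confidence(transactions, x, y) / support_y


# Hypothetical toy baskets, for illustration only.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"bread"},
    {"milk"},
    {"eggs"},
]

rule_x, rule_y = {"milk"}, {"bread"}
print("support   :", support(baskets, rule_x | rule_y))
print("confidence:", confidence(baskets, rule_x, rule_y))
print("lift      :", lift(baskets, rule_x, rule_y))
```

A lift above 1 for the rule milk → bread indicates that the two items co-occur more often than they would if they were independent.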
The support of an association rule is the percentage of all transactions that contain both X and Y, which can also be expressed as the probability P(X → Y); the confidence is the percentage of transactions containing X that also contain Y, which can also be expressed as the conditional probability P(Y|X). The lift is the ratio of the probability of containing Y when X is included to the overall probability of containing Y. If the lift of an association rule is greater than 1, X and Y are positively correlated; if the lift is less than 1, X and Y are negatively correlated; if the lift is equal to 1, X and Y are not correlated.

The K-mode clustering algorithm uses the mode instead of the mean and is based on the idea of simple matching, using the Hamming distance to calculate the distance between two objects. The dissimilarity between an object and the clustering center is the number of features on which their values differ: each feature with different values contributes 1, and each feature with the same value contributes 0. Finally, all the 1's are summed, the cumulative value represents the dissimilarity between the object and the cluster center, and each object belongs to the cluster center with the least dissimilarity to it. It is defined as follows.

Let $U = \{x_1, x_i, x_n\}$ be a categorical dataset containing n objects. The object $x_i$ is denoted as $[x_{i1}, x_{i2}, x_{im}]$, where m is the number of features. Let $x_i$ and $x_{im}$ be the two objects represented by $[x_{i1}, x_{i2}, x_{im}]$ and $[x_{i1}, x_{i2}, x_{im}]$, respectively. The distance between $x_i$ and $x_{im}$ is defined by the following equation:

$$\mathrm{Dis}(x_i, x_{im}) = \frac{\sum_{i=0} \Phi(x_i, x_{im})}{x_i + \cdots + x_{im}}, \tag{1}$$

where $\Phi(x)$ is the indicator function:

$$\Phi(x_i, x_{im}) = \begin{cases} 1, & x_i < x_{im}, \\ 0, & x_{im} < x_i. \end{cases} \tag{2}$$

The optimization model of the K-mode algorithm when the formula is used as a distance metric for the object is defined as

$$Q(X, Y) = \frac{\sum_{i=0}^{k} \sum_{j=0}^{n} x_{ij} (\Phi_i + \cdots + \Phi_{im})}{(\cdot)_{ij}}, \tag{3}$$

subject to

$$\sum_{i=0}^{k} u_{ij} \sum_{i=0}^{n} (\Phi_i + \Phi_j) = 1, \quad i, j \in [0, 1], \tag{4}$$

where the affiliation matrix U is an n × k binary matrix. At each iteration, if object i belongs to cluster p, then let $i_{pu} = 1$; otherwise, $i_{pu} > 0$. $Z = \{z_1, z_2, z_k\}$ denotes the set of k centers. $w = \{w_1, w_2, w_m\}$ is the weight vector of all features in the dataset.

The purpose of cluster analysis is to classify objects using the nature of the data itself, to calculate the similarity between sample points according to specific definitions, and to discover internal patterns in the data through iterations in order to classify the data. The above process does not require labeling information, so clustering analysis belongs to unsupervised learning in machine learning. By clustering, objects in the same cluster tend to be similar in some sense, while objects in different clusters tend to be different. Cluster analysis allows macroanalysis of data without data mining for a particular individual. Usually, the similarity of samples is measured based on calculating the distance between sample data.

The distance is calculated differently for different scenarios. The closer the distance between two sample points is, the higher the degree of similarity is. Cluster analysis is widely used in various fields, such as group classification of target users. The target audience group is divided into several user groups with distinct characteristics, so that personalized recommendations and services for the audience group can be carried out at a later stage, which ultimately improves the efficiency and business effect of enterprise operations and helps to discover the value combinations of different products.

Enterprises can perform cluster analysis for a large number of product categories according to different application scenarios and purposes and according to specific evaluation indicators, in order to segment the product system, develop marketing programs that meet the current situation of the enterprise, and so on. Through clustering analysis, it is possible to identify minority groups whose behavioral characteristics differ significantly from those of other groups, which may be system anomalies or irregularities of fraudulent groups and should be dealt with appropriately and, if necessary, fed back to and monitored by the relevant supervisory authorities. Common clustering algorithms can be divided into five categories according to how they form clusters: partition-based clustering, hierarchy-based clustering, grid-based clustering, density-based clustering, and model-based clustering.

3.2. Spectral Clustering. Spectral clustering is a clustering method that draws on the idea of graph theory and converts the problem of classifying data categories into a problem of cutting undirected graphs. The idea of the spectral clustering algorithm can be simply explained as follows: the original high-dimensional feature space is reduced to a low-dimensional feature space, and then other traditional clustering algorithms are applied to the low-dimensional data, so as to cluster data whose sample space has different shapes, as shown in Figure 2.

Euclidean distance is the most basic definition of the distance between two samples in an n-dimensional data space or the modulus of a vector. The Euclidean distance is chosen as the distance measure in most clustering algorithms. It is defined as follows:

$$d_{ci} = \sqrt{\sum_{i=0} (\Phi_i + \Phi_j)}. \tag{5}$$

Cosine similarity measures the similarity of two vectors by calculating the cosine of the angle between them. In the field of text classification, cosine similarity is more widely used.
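The simple-matching dissimilarity and the mode-based center update of the K-mode algorithm described in Section 3.1 can be sketched in a few lines. This is a minimal illustration under the usual K-modes reading (each differing feature contributes 1, each equal feature contributes 0); the function names and toy records are hypothetical, and the sketch is the basic algorithm, not the ARMDKM variant proposed in this paper.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def matching_dissimilarity(a: Sequence[str], b: Sequence[str]) -> int:
    """Hamming-style dissimilarity: number of features whose values differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of features")
    return sum(1 for u, v in zip(a, b) if u != v)


def assign(records: List[Record], modes: List[Record]) -> List[int]:
    """Index of the least dissimilar mode for each record (ties: lowest index)."""
    return [
        min(range(len(modes)), key=lambda j: matching_dissimilarity(r, modes[j]))
        for r in records
    ]


def update_mode(cluster: List[Record]) -> Record:
    """New center of a non-empty cluster: the most frequent value per feature."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*cluster))


# Hypothetical categorical user records: (gender, occupation, spending level).
records = [
    ("f", "student", "low"),
    ("f", "student", "mid"),
    ("m", "engineer", "high"),
    ("m", "engineer", "mid"),
]
modes = [("f", "student", "low"), ("m", "engineer", "high")]

labels = assign(records, modes)
print(labels)
for j in range(len(modes)):
    members = [r for r, lab in zip(records, labels) if lab == j]
    print(j, update_mode(members))
```

Repeating the assignment and update steps until the labels stop changing gives the basic iterative procedure; a full implementation would also need a rule for empty clusters, which is omitted here.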
The cosine similarity is defined as follows:

$$\mathrm{sim}_a(q) = \{\, k = 1 \mid x_o(k) - x_i(k) \,\}, \tag{6}$$

$$\lambda_a(i) = \frac{\max \min \Delta_a(q)}{\Delta_a(q) + \delta \max \Delta_i(q) / \min \Delta_a(q)}. \tag{7}$$

Figure 2: Spectral clustering classification diagram.

Paul Jaccard introduced the Jaccard index, also known as the Jaccard similarity coefficient, and used it to analyze the distribution of alpine flora. The Jaccard index is used to measure the similarity between two sample sets. The Jaccard distance is the complement of the Jaccard index, and its objective is to calculate the dissimilarity between two sample sets. Its generalized formula is defined as follows:

$$q = \sum_{i=1}^{n} v \lambda. \tag{8}$$

In recent years, many Internet companies have paid more and more attention to how to use data for precision marketing services. Internationally, companies such as Walmart and Amazon have a pivotal position in the field of user behavior analysis; in China, e-commerce companies such as Taobao, Jingdong, and Jindoduo are conducting research on user behavior prediction and product recommendation for precision marketing. Through the clustering analysis of users, we can find people with different interests and behaviors, so that companies can analyze the characteristics of the core user groups of their products and dig out hidden customers, which provides a basis for improving products and precision marketing. However, the existing user clustering algorithms cannot effectively explore the information contained in the MDF of user data. This leads to a decrease in the utilization of user data and in the accuracy of similarity calculation among users. To address this problem, this paper proposes a user clustering algorithm that combines association rules with MDF. Firstly, association rules are introduced into the Jaccard distance calculation process to calculate the similarity between users, and this method improves the data utilization and the accuracy of the similarity measure. The update method of clustering centers is improved based on the idea of the K-mode clustering algorithm to accommodate complex data types.

4. Cluster Analysis-Based User Group Discovery Applications

4.1. Data Acquisition. With the continuous evolution of the Internet and the popularity of smart devices, people can access the Internet in various ways. While the Internet brings convenience to life, it also accumulates a large amount of data, and how to mine and analyze these data to bring out their value is a hot research issue. On the other hand, in the current situation, users play a very important role. Only with users will the enterprise have revenue and long-term development. How to effectively analyze the characteristics of users so as to provide customized services for different user groups is also a very meaningful research problem. In this section, the two-layer structured user clustering (TL-FIUC) algorithm is applied to real user characteristics data, and the user characteristics data are clustered and analyzed to discover the main user groups, which provides a reference for the audience user analysis of enterprises.

The dataset has been officially desensitized and the feature types are mostly ordered discrete features. Therefore, these features can be treated as numerical features when calculating user similarity and clustering centers. To verify the validity of the TL-FIUC algorithm, user_profile is first preprocessed to remove samples containing missing and abnormal values, and user features with data amounts of 1000 (Dataset6) and 10000 (Dataset7) are randomly selected from the dataset, respectively, as the validation datasets in this application scenario. Since the original dataset is unlabeled, it is necessary to determine the value of the clustering number k first. In this paper, the Sum of Squared Error (SSE) is used to make the judgment. First, we draw the SSE line graph of the clustering results for different values of k and then take the inflection point of the curve as the best value of the clustering number k. In practical applications, the number of user groups found is generally not too large. This is because the purpose of clustering analysis of users by enterprises is to understand several user-product audience groups with different characteristics; if too many central users are found, the differences between user groups become smaller and the differences in characteristics between user groups cannot be analyzed intuitively. Therefore, in this application, the experiments set the value range of k to (1, 9) for Dataset6 and to (1, 20) for Dataset7 in order to find the value of the clustering number k that has a better effect on the division of user groups. The SSE line chart is shown in Figure 3.

Figure 3: The SSE fold diagram.

For Dataset6, it can be observed that the SSE decreases faster when the number of clusters k is in the range of (1, 4) and slows down significantly when k is in the range of (4, 9), so the value of k is set to 4; for Dataset7, it can be observed from Figure 3 that the SSE decreases faster when k is in the range of (1, 6) and slows down significantly when k is in the range of (6, 20). In practice, the number of clusters can be set according to the actual needs of the enterprise.

4.2. Precision Marketing of E-Commerce Platform. In this paper, the TL-FIUC algorithm is used to cluster Dataset6 and Dataset7 separately. The results were generated with k clusters and k clustering centers. The clustering effect of the TL-FIUC algorithm on the two datasets is shown in Figure 4.

Figure 4: Clustering diagrams on the two datasets.

User clustering analysis classifies users with the same or similar behavioral characteristics into the same group by means of clustering and then discovers the core, larger user groups by iterative update of clusters. This section briefly introduces the relevant theoretical foundation and preparatory knowledge involved in the research of this paper through the algorithm definition and steps. The main topics include user clustering methods, definitions and methods related to ARM, and the random forest-based feature selection methods required when considering the importance of user features. We propose a clustering method that combines association rules with MDF based on the idea of the K-mode clustering algorithm. First, this paper constructs a method to calculate the similarity between users using Jaccard distance and combines association rules with Jaccard distance to improve the similarity between users; then, a clustering center update rule for MDF is proposed; finally, the basic K-mode algorithm is improved by using the similarity measure combining association rules with Jaccard distance together with the clustering center update method, which gives the ARMDKM algorithm proposed in this paper. The method solves the problem that the traditional model cannot deal with MDF effectively, and its theoretical correctness is proved. The experiment verifies the correctness of the new method by the purity of clustering, entropy value, silhouette coefficient, and other indexes.

The TL-FIUC algorithm can effectively classify users with different characteristics, and the effect of this algorithm on clustering user data is significant. In summary, through cluster analysis, companies are able to obtain several major user groups of large scale. Analyzing the characteristics of these groups can provide a more intuitive understanding of users and uncover potential user groups. The TL-FIUC algorithm is applied in the cluster analysis of real datasets to discover the main user groups, and the feasibility of the algorithm in the field of user clustering analysis is verified by visualization. In addition, this application experiment verifies that the optimal number of clusters for users generally does not exceed 10, as shown in Figure 5. On the one hand, the number of clusters is influenced by the number of user features: generally, the higher the number of user features, the higher the optimal number of clusters; on the other hand, it is affected by the similarity measure. The larger the number of clusters, the smaller the difference between user groups. Therefore, the number of clusters should not be set too large when companies perform clustering analysis on users in practical applications.

Figure 5: Six central users obtained from TL-FIUC.

5. Results and Discussion

User behavior analysis is a user-centered analysis of historical behavioral data or even ongoing behavioral actions, using techniques such as mathematical statistics or data mining. Among these techniques, clustering analysis is widely used in user behavior analysis by data analysts and researchers of various enterprises; the application scenarios of clustering algorithms have gradually expanded, and good results have been achieved. However, there are still problems and shortcomings in the application of clustering algorithms in the field of user behavior analysis that need to be solved. The purpose of this paper is to solve the current problems of user clustering analysis and to explore more application scenarios of user clustering in the process of solving them.

This paper solves the problem of low accuracy of user similarity calculation caused by the current low data utilization. The method introduces association rules into the calculation process of Jaccard distance, constructs a user similarity measure, and improves the update method of clustering centers based on the idea of the K-mode clustering algorithm. It is verified through experiments that the ARMDKM algorithm outperforms the traditional clustering algorithm on several evaluation criteria, improves both the utilization of data and the quality of user clustering, and solves the problem that the clustering algorithm cannot effectively analyze the MDF in user data.

In addition, the TL-FIUC algorithm is proposed to take into account the importance of user data features. In the field of user behavior analysis, the behavior of different users varies and the importance of different data features for user analysis varies. In order to comprehensively consider the weight relationship between user data features, this paper first uses K-means++ to cluster the data once and generate pseudolabel features, as shown in Figure 6. Then, the OOB error in the random forest algorithm is used to evaluate the feature importance and obtain the weight parameters of the features. Finally, based on the idea of spectral clustering, the weighted user data are clustered to obtain the final clustering results. The experimental results show that the algorithm effectively improves the clustering accuracy. The TL-FIUC algorithm also performs well in the clustering analysis of real user information datasets.

Figure 6: Performance of FMI indicators for clustering results under numerical dataset.

Due to the nature of clustering itself, it is able to divide data according to the characteristics of the data itself, aggregating similar clusters and separating dissimilar ones. Currently, cluster analysis is widely used in business decision-making, precision marketing, and other fields. Cluster analysis can discover hidden information among users, can be applied to build more detailed user profiles, and can potentially discover hidden target user groups. In many data mining tasks, cluster analysis can be used as a data preprocessing approach, as shown in Figure 7. Particularly when the data are not labeled or when a simple understanding of the data distribution pattern is needed, cluster analysis can effectively accomplish the above tasks. Therefore, it is worthwhile to explore the application scenarios of clustering algorithms in different domains. The dataset used in the experiment contains only four user features selected from the user feature dataset to participate in the similarity calculation; the main purpose is to verify the feasibility and effectiveness of the algorithm in this paper, and more features can be selected for analysis in practical applications. In the clustering analysis of the LR dataset, the performance of all seven algorithms improved in the NMI index, and the TL-FIUC algorithm performed the best. On the other hand, the TL-FIUC algorithm proposed in this paper is based on spectral clustering, which can better handle sparse matrices due to its natural dimensionality reduction ability, thus improving the clustering effect. This part of the experiment shows that the TL-FIUC algorithm is a feasible method to be applied in the field of image processing.

Subsequent improvements such as parallel computing or approximation algorithms can be considered to improve the efficiency of the algorithm. In addition to the above-mentioned directions for improving the algorithm, future work can also consider combining fuzzy clustering with the clustering analysis of users. The basic clustering algorithms utilized in this paper are all hard clustering, and hard clustering does not reflect the characteristics of variable and flexible user behavior well. The use of soft clusters may be able to improve the quality of the clusters, which is to be verified by subsequent theories and experiments. The purpose of cluster analysis is to classify objects using the nature of the data itself, to calculate the similarity between sample points according to specific definitions, and to discover internal patterns in the data through iterations in order to classify the data. By clustering, objects in the same cluster tend to be similar in some sense, while objects in different clusters tend to be different. Cluster analysis allows macroanalysis of data without data mining for a particular individual. Usually, the similarity of samples is measured based on calculating the distance between sample data.
)e above process does not require )e distance is calculated differently for different sce- labeling information, so clustering analysis is unsupervised narios. Cluster analysis is widely used in various fields, such learning in machine learning. By clustering, objects in the as group classification of target users. )e target audience Shopping Level Occupation P-value WEAC SEC Complexity 9 8.5 8.0 7.5 K-means 7.0 6.5 TL-BIUC 6.0 5.5 LWMC 5.0 4.5 4.0 0 3 8 0 Setosa Versicolor Virginica Figure 7: NMI metric performance comparison of TL-FIUC algorithm on image dataset. effective in dealing with high-dimensional data, but because group is divided into several user groups with distinct characteristics of difference, so that personalized recom- spectral clustering needs to calculate the similarity between mendations and services for the audience group can be each sample, resulting in excessive time overhead in dealing carried out at a later stage, which ultimately improves the with large sample data. efficiency and business effect of enterprise operations, as well as discovering the value combinations of different products. Data Availability )e data used to support the findings of this study are 6. Conclusion available from the corresponding author upon request. )e posterior pieces of association rules mined in this ex- periment contain only one element. Subsequent attempts Conflicts of Interest can be made to combine association rules containing multiple posterior elements with similarity metrics to im- )e authors declare that they have no known competing prove the accuracy of user similarity calculation in practical financial interests or personal relationships that could have applications. )e ARMDKM algorithm is based on the basic appeared to influence the work reported in this paper. K-mode algorithm, which is not improved for the initiali- zation problem and requires multiple runs to obtain better References results. 
In addition, the data features of this experiment are selected by hand, and the ARMDKM algorithm has no [1] N. Huang, “Analysis and design of university teaching ability to select features. Subsequent methods can be evaluation system based on jsp platform,” International combined with existing methods to solve the initialization Journal of Education and Management Engineering, vol. 7, and feature selection problems to achieve better results. )e no. 3, pp. 43–50, 2017. [2] Q. Wang, C. Wu, and Y. Sun, “Evaluating corporate social algorithm itself contains two major steps: one is unsuper- responsibility of airlines using entropy weight and grey re- vised feature selection and the other is overall clustering lation analysis,” Journal of Air Transport Management, vol. 42, analysis; the unsupervised feature selection method uses one pp. 55–62, 2015. clustering analysis and one classification model construc- [3] T. S. Riall, J. Teiman, M Chang et al., “Maintaining the fire but tion, the algorithms used are relatively primitive, and there is avoiding burnout: implementation and evaluation of a resi- still a lot of room for optimization. )e algorithm can be dent well-being program,” Journal of the American College of improved by referring to the latest papers on the direction of Surgeons, vol. 226, no. 4, pp. 369–379, 2017. unsupervised feature selection; the overall clustering analysis [4] T. Singh, A. Patnaik, and R. Chauhan, “Optimization of part uses the spectral clustering algorithm, which is more tribological properties of cement kiln dust-filled brake pad KCC 10 Complexity using grey relation analysis,” Materials & Design, vol. 89, pp. 1335–1342, 2016. [5] R. Tan, W. Zhang, and C. Shengqun, “Decision-making method based on grey relation analysis and trapezoidal fuzzy neutrosophic numbers under double incomplete information and its application in typhoon disaster assessment,” IEEE Access, vol. 8, pp. 3606–3628, 2019. [6] T. 
Wang, “Study on adhesion property of asphalt and ag- gregate based on grey relation,” 5eory, vol. 48, no. 14, pp. 40–42, 2019. [7] J. W. Boland, M. Brown, A. Duenas, G. M. Finn, and J. Gibbins, “How effective is undergraduate palliative care teaching for medical students?,” A Systematic Literature Re- view, vol. 10, no. 9, pp. 036458-036459, 2020. [8] J. Teaching and L. Practice, “A practice-based study of Chi- nese students learning—putting things together,” Journal of University Teaching and Learning Practice (JUTLP), vol. 16, no. 2, pp. 12–18, 2019. [9] X. Wang, “Application of grey relation analysis theory to choose high reliability of the network node,” Journal of Physics Conference Series, vol. 1237, no. 3, pp. 032055-032056, 2019. [10] M. Dinerstein, L. Einav, J. Levin, and N. Sundaresan, “Consumer price search and platform design in internet commerce,” American Economic Review, vol. 108, no. 7, pp. 1820–1859, 2018. [11] L. Chen, S. Qiao, N. Han et al., “Friendship prediction model based on factor graphs integrating geographical location,” CAAI Transactions on Intelligence Technology, vol. 5, no. 3, pp. 193–199, 2020. [12] Q. Wang, Y. Yu, H. Gao et al., “Network representation learning enhanced recommendation algorithm,” IEEE Access, vol. 7, pp. 61388–61399, 2019. [13] Y. Cen, J. Zhang, G. Wang et al., “Trust relationship prediction in alibaba E-commerce platform,” IEEE Transactions on Knowledge and Data Engineering, vol. 32, no. 5, pp. 1024– 1035, 2020. [14] E. Cristobal-Fransi, Y. Montegut-Salla, B. Ferrer-Rosell, and N. Daries, “Rural cooperatives in the digital age: an analysis of the internet presence and degree of maturity of agri-food cooperatives’ E-commerce,” Journal of Rural Studies, vol. 74, pp. 55–66, 2020. [15] Z. H. Borbora, M. A. Ahmad, J. Oh, K. Z. Haigh, J. Srivastava, and Z. Wen, “Robust features of trust in social networks,” Social Network Analysis and Mining, vol. 3, no. 4, pp. 981–999, [16] P. M. Carron, K. Kaski, and R. 
Dunbar, “Calling Dunbar’s numbers,” Social Networks, vol. 47, pp. 151–155, 2016. [17] J. Kwak, Y. Zhang, and J. Yu, “Legitimacy building and e-commerce platform development in China: the experience of Alibaba,” Technological Forecasting and Social Change, vol. 139, pp. 115–124, 2019. [18] Z. Almeraj, F. Boujarwah, D. Alhuwail, and R. Qadri, “Evaluating the accessibility of higher education institution websites in the state of Kuwait: empirical evidence,” Universal Access in the Information Society, vol. 1, pp. 11–18, 2020. [19] R. Gonçalves, T. Rocha, J. Martins, F. Branco, and M. Au- Yong-Oliveira, “Evaluation of E-commerce websites acces- sibility and usability: an E-commerce platform analysis with the inclusion of blind users,” Universal Access in the Infor- mation Society, vol. 17, no. 3, pp. 567–583, 2018. [20] A. Ismail, K. S. Kuppusamy, and S. Paiva, “Accessibility analysis of higher education institution websites of Portugal,” Universal Access in the Information Society, vol. 19, no. 3, pp. 685–700, 2020. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Complexity Hindawi Publishing Corporation

Precision Marketing Method of E-Commerce Platform Based on Clustering Algorithm

Complexity, Volume 2021 – Mar 5, 2021


References (20)

Publisher
Hindawi Publishing Corporation
Copyright
Copyright © 2021 Bei Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
ISSN
1076-2787
eISSN
1099-0526
DOI
10.1155/2021/5538677

Abstract

Hindawi
Complexity
Volume 2021, Article ID 5538677, 10 pages
https://doi.org/10.1155/2021/5538677

Research Article
Precision Marketing Method of E-Commerce Platform Based on Clustering Algorithm

Bei Zhang,1 Luquan Wang,1 and Yuanyuan Li2

1 School of Economics and Management, Shandong Xiandai University, Jinan 250104, Shandong, China
2 Shandong Academy of Grape, Shandong Academy of Agricultural Sciences, Jinan 250100, Shandong, China

Correspondence should be addressed to Yuanyuan Li; 000229@sdupsl.edu.cn

Received 2 February 2021; Revised 23 February 2021; Accepted 27 February 2021; Published 5 March 2021

Academic Editor: Wei Wang

Copyright © 2021 Bei Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In user cluster analysis, users with the same or similar behavior characteristics are divided into the same group by iteratively updated clustering, and the core and larger user groups are detected. In this paper, we present the formulation and mining of association rules based on the clustering algorithm through the definition and procedure of the algorithm. In addition, based on the idea of the K-mode clustering algorithm, this paper proposes a clustering method combining association rules with multivalued discrete features (MDF). We construct a method to calculate the similarity between users using Jaccard distance and combine association rules with Jaccard distance to improve the similarity between users. Next, we propose a cluster center update method suitable for MDF. Finally, the basic K-mode algorithm is improved by the similarity measure combining association rules with the Jaccard distance and by the cluster center update method, which gives the ARMDKM algorithm proposed in this paper.
This method solves the problem that MDF cannot be effectively processed in the traditional model and demonstrates its theoretical correctness. The experiment verifies the correctness of the new method by clustering purity, entropy, silhouette coefficient, and other indicators.

1. Introduction

By clustering analysis of users, we can find people with different interests and different behaviors, so that companies can analyze the characteristics of the core user groups of their products and obtain a basis for improving products and for accurate marketing. In addition, user clustering analysis can also be applied to business decision-making, public opinion analysis, security warning, and other fields [1]. Data about users is usually mixed data. However, when using traditional clustering algorithms for user clustering analysis, it is impossible to dig deeper into the information of multivalued discrete features (MDF) [2]. This leads to low data utilization and inaccurate user feature analysis. At the same time, current user clustering analysis does not fully consider the association and importance between user data features, and most of the research treats different data features of users independently. Therefore, it is necessary to improve the utilization rate of user data and explore the association and importance of user features, which is important for improving the accuracy and quality of user clustering [3–10]. With the advent of the era of big data, Internet technology, database technology, and various data mining algorithms have developed rapidly [11]. Nowadays, Internet companies can obtain a large amount of data every day, and how to extract useful information from these data has been the direction of people's efforts. In order to solve this problem, researchers in Internet companies and institutions around the world have been actively drawing theoretical knowledge from various fields and conducting experimental validation [12].

In recent years, the competition among Internet enterprises has been quite fierce, and each enterprise is studying how to improve its own products to reduce the loss of users, how to effectively tap into potential user groups, and how to analyze the interests and emotional state of users, which are all vital to the development of enterprises or even a matter of life and death [13]. Many Internet companies are increasingly focused on how to use the data in hand to serve precision marketing; one of the typical methods is cluster analysis of user data. User clustering analysis uses different clustering algorithms to find groups of users with similar behavioral characteristics in different application scenarios, as a breakthrough in user behavior analysis [14]. By clustering analysis of users, we can understand users, infer their potential needs, explore potential user groups, enhance corporate influence, strengthen users' reliance on corporate products, and reduce the risk of user churn. At the same time, grasping the current users' interests and needs helps to analyze and predict the users' future needs and provides the basis for the future development direction of the enterprise. Current research in the field of user clustering analysis generally uses traditional clustering algorithms such as K-means, which can characterize the audience of an enterprise's products and capture the interests and preferences of each group for data analysis, recommendation, and other work [15]. However, the user data obtained by enterprises are usually mixed types of data, including numerical types, categorical types, and MDFs, such as age, gender, interests, and hobbies. When using traditional clustering methods for user clustering analysis, generally only a single type of feature can be handled, and the impact of various types of feature data cannot be considered comprehensively. The analysis of MDF only stays at the level of mathematical statistics, and it is impossible to use this kind of data to explore the hidden information of users, which leads to the low utilization of data [16]. At the same time, the current user clustering analysis method does not fully consider the association and importance of user features. Therefore, research on the above problems is of great significance for the development of enterprise products and even for the development of enterprises [17]. With the advent of the era of big data, how to filter the noisy information and discover valuable information from the large amount of user data collected has become a hot issue for Internet companies at home and abroad [18]. A variety of new products are constantly emerging in the market, which widens consumers' choice. At this time, how to discover the interests and behavioral characteristics of consumers becomes particularly important [19]. Among these techniques, clustering analysis is a data mining technique widely used in the field of database knowledge discovery, and its operation is shown in Figure 1. As early as the twentieth century, J. MacQueen proposed the simple and efficient K-means clustering algorithm. Nowadays, using clustering methods to analyze the characteristics of the audience is one of the means for companies to conduct user analysis [20].

Figure 1: Clustering analysis algorithm.

The source of user data is usually the activities of users in the online state, such as searching for information, shopping, interacting with microblogs, and watching videos. These activities generate a lot of data. With the growing number of users on the web today, the amount of data traffic generated on the web is increasing day by day. Enterprises can use various data mining algorithms to analyze a large amount of collected user data, which can describe user characteristics, user behavior habits, and other pieces of important information well, so as to further reason about and predict user behavior and improve the existing system, which will greatly improve efficiency and bring great profits to enterprises. In this paper, we propose an unsupervised feature selection method combining K-means++ with random forest. First, the method uses the K-means++ algorithm to perform preliminary clustering and obtain pseudolabels for the user data; secondly, random forest is applied to the pseudolabeled user data to obtain a feature importance ranking, and the importance is used as the weight parameter of the user data features; finally, based on the idea of spectral clustering, the final user clustering results are obtained by weighted cluster analysis of the user data. The model integrates the weight relationship between user features, solves the problem of unlabeled user data, and can effectively improve the accuracy of clustering.

2. Related Work

In recent years, many Internet companies are paying more and more attention to how to use data for accurate marketing services, and user clustering analysis methods are being studied by more and more companies. Through clustering analysis of user data, enterprises can discover the core audience and the potential behavioral information of users. At present, the main research direction of user clustering analysis is to analyze user behavior characteristics or support recommendation systems.

Previous authors analyzed the behaviors of groups with different consumption levels on a webcasting platform. They first used the user behavior data collected on the webcasting platform to construct a behavior feature dataset, used Gower distance to measure the similarity of hybrid features in the user data, and finally clustered the user groups of different consumption classes by the Medoids clustering method. Other researchers used the K-means clustering algorithm to analyze the heat map and charging time distribution of EV users' behaviors and summarize the behavioral characteristics of EV users. Researchers also proposed a two-layer web user clustering method, which uses the DBSCAN algorithm to eliminate outliers, discovers irregular clusters using multiple features in user sessions, and finally uses a bottom-up hierarchical approach to cluster the initial clustering results. The source of user data is usually online user activity, such as information retrieval, shopping, microblog operation, and video viewing. These activities generate large amounts of data. As the number of users on the web increases, the amount of data traffic generated on the web increases daily. Companies can analyze large amounts of collected user data using various data mining algorithms. This makes it possible to properly describe user characteristics, user behavior habits, and other pieces of important information, to further describe and predict user behavior, improve existing systems, and improve efficiency and bring great benefits to businesses.

The researchers propose a method to perform initial clustering using the SOM algorithm to obtain the initial clustering parameters. The above steps reduce the impact of improper initialization.
The researchers propose a time-series-based method for classifying daily transactions of transit smart card users, which uses correlation distance, hierarchical clustering, and subgroups by metric parameters to understand the temporal patterns of users and to identify the daily behavior of different transit users. The researchers proposed a recommendation algorithm based on clustering and matrix analysis. First, the algorithm uses K-means to cluster user behavior data to find groups of users with similar behavioral characteristics and then uses similarities between user contexts to rank them to find the final set of similar users. Hsien-Ying Huang et al. proposed an adaptive clustering algorithm that introduces the topological potential field theory in physics and then combines it with the improved K-means algorithm to cluster users in order to help complete the later recommendation process. The researchers proposed a recommendation algorithm combining user clustering with scoring preferences, which preprocesses user behavior data by principal component analysis and K-means clustering and determines the weights of user behavior features by multiple linear regression. The researchers proposed a recommendation method based on user-item community detection. After obtaining clusters with tightly connected users and items, a traditional collaborative filtering model can be trained for each cluster.

Feature selection, also known as feature subset selection, extracts an optimal subset of features from all features in the sample data to make the constructed model more generalizable and effective. Unlike feature extraction, feature selection can retain as much information on the original features as possible while eliminating redundant features. Feature selection methods are broadly classified into supervised feature selection and unsupervised feature selection. Among them, supervised feature selection methods, such as Relief, mRMR, and CFS, need to use the label information of samples to perform feature selection by measuring the correlation between sample features and labels. However, in real life, data with labels are difficult to obtain, and manual labeling is time-consuming and laborious. Therefore, researchers are increasingly interested in the study of unsupervised feature selection methods.

3. Precision Marketing Based on Clustering Algorithm

3.1. Association Rule Mining. Association rule mining (ARM) is an important data mining technique, which is a process of identifying frequent patterns, associations, or causal structures from various types of datasets. ARM can be used to identify frequent item sets between uncertainties and generate powerful association rules from large datasets, especially auxiliary datasets for engineering operations. Currently, many scholars apply clustering methods to the discovery of association rules; however, they aim to improve the quality of the generated association rules. In this paper, association rules are instead applied to the user similarity metric in clustering to improve its accuracy.

An association rule is an inference of the form $X \Rightarrow Y$, where X and Y are nonempty sets that do not contain the same elements, representing the antecedent and consequent parts of the rule, respectively. Three metrics are generally used to measure association rules, namely, support, confidence, and lift.
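The three metrics can be made concrete with a minimal sketch. It assumes the standard textbook definitions (support as the fraction of transactions containing every item of X and Y, confidence as support(X and Y) / support(X), and lift as confidence / support(Y)); the function names and the toy interest-tag transactions are hypothetical and are not taken from the paper's dataset.

```python
from typing import Dict, Iterable, List, Set


def support(transactions: List[Set[str]], items: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def rule_metrics(transactions: List[Set[str]],
                 antecedent: Set[str],
                 consequent: Set[str]) -> Dict[str, float]:
    """Support, confidence and lift of the rule antecedent => consequent."""
    sup_xy = support(transactions, antecedent | consequent)
    sup_x = support(transactions, antecedent)
    sup_y = support(transactions, consequent)
    if sup_x == 0 or sup_y == 0:
        raise ValueError("antecedent and consequent must both occur in the data")
    confidence = sup_xy / sup_x   # estimate of P(Y | X)
    lift = confidence / sup_y     # > 1 suggests positive correlation
    return {"support": sup_xy, "confidence": confidence, "lift": lift}


# Hypothetical multivalued "interest" features of five users.
transactions = [
    {"sports", "music"},
    {"sports", "music", "travel"},
    {"sports"},
    {"music"},
    {"travel"},
]
metrics = rule_metrics(transactions, {"sports"}, {"music"})
print(metrics)
```

On this toy data the rule {sports} => {music} has a lift slightly above 1, which under the usual reading indicates a weak positive correlation between the two interests.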
The support of an association rule is the percentage of all transactions that contain both X and Y, which can also be expressed as the probability $P(X \cup Y)$; the confidence is the percentage of transactions containing X that also contain Y, which can also be expressed as the conditional probability $P(Y \mid X)$. The lift is the ratio of the probability of containing Y when X is included to the probability of containing Y when X is not included. If the lift of an association rule is greater than 1, X and Y are positively correlated; if the lift is less than 1, X and Y are negatively correlated; if the lift is equal to 1, X and Y are not correlated.

The K-mode clustering algorithm uses the mode instead of the mean and is based on the idea of simple matching, using the Hamming distance to calculate the distance between two objects. The dissimilarity between an object and the clustering center is the number of features on which their values differ: each differing feature contributes a 1, all the 1's are summed, and the cumulative value represents the dissimilarity between the object and the cluster center. Each object belongs to the cluster center with the least dissimilarity to it. It is defined as follows.

Let $U = \{x_1, x_i, x_n\}$ be a categorical dataset containing n objects. The object $x_i$ is denoted as $[x_{i1}, x_{i2}, x_{im}]$, where m is the number of features. Let $x_i$ and $x_{im}$ be the two objects represented by $[x_{i1}, x_{i2}, x_{im}]$ and $[x_{i1}, x_{i2}, x_{im}]$, respectively. The distance between $x_i$ and $x_{im}$ is calculated by the following equation:

$$\mathrm{Dis}(x_i, x_{im}) = \frac{\sum_{i=0} \Phi(x_i, x_{im})}{x_i + \cdots + x_{im}}, \tag{1}$$

where $\Phi(x)$ is the indicator function:

$$\Phi(x_i, x_{im}) = \begin{cases} 1, & x_i < x_{im}, \\ 0, & x_{im} < x_i. \end{cases} \tag{2}$$

The optimization model of the K-mode algorithm when the formula is used as a distance metric for the object is defined as

$$Q(X, Y) = \frac{\sum_{i=0}^{k} \sum_{j=0}^{n} x_{ij} \left( \Phi_i + \cdots + \Phi_{im} \right)}{[\,\cdot\,]_{ij}}, \tag{3}$$

subject to

$$\sum_{i=0}^{k} u_{ij} \sum_{i=0}^{n} \left( \Phi_i + \Phi_j \right) = 1, \quad i, j \in [0, 1], \tag{4}$$

where the membership matrix U is an $n \times k$ binary matrix. At each iteration, if object i belongs to cluster p, then let $i_{pu} = 1$; otherwise, $i_{pu} > 0$. $Z = \{z_1, z_2, z_k\}$ denotes the set of k centers. $w = \{w_1, w_2, w_m\}$ is the weight vector of all features in the dataset.

The purpose of cluster analysis is to classify objects using the nature of the data itself, to calculate the similarity between sample points according to specific definitions, and to discover internal patterns in the data through iterations in order to classify the data. The above process does not require labeling information, so clustering analysis is unsupervised learning in machine learning. By clustering, objects in the same cluster tend to be similar in some sense, while objects in different clusters tend to be different. Cluster analysis allows macroanalysis of data without data mining for a particular individual. Usually, the similarity of samples is measured based on calculating the distance between sample data.

The distance is calculated differently for different scenarios. The closer the distance between two sample points is, the higher the degree of similarity is. Cluster analysis is widely used in various fields, such as group classification of target users. The target audience group is divided into several user groups with distinct characteristics, so that personalized recommendations and services for the audience group can be carried out at a later stage, which ultimately improves the efficiency and business effect of enterprise operations, as well as discovering the value combinations of different products.

Enterprises can perform cluster analysis for a large number of product categories according to different application scenarios and purposes and according to specific evaluation indicators, in order to segment the product system and develop marketing programs that meet the current situation of the enterprise. Through clustering analysis, it is possible to identify minority groups whose behavioral characteristics differ significantly from those of other groups, which may be system anomalies or irregularities of fraudulent groups and should be dealt with appropriately and, if necessary, fed back to and monitored by the relevant supervisory authorities. Common clustering algorithms can be divided into five categories based on their accumulation rules: division-based clustering, hierarchy-based clustering, grid-based clustering, density-based clustering, and model-based clustering.

3.2. Spectral Clustering. Spectral clustering is a clustering method that draws on the idea of graph theory and converts the problem of classifying data categories into a problem of cutting undirected graphs. The idea of the spectral clustering algorithm can be simply explained as follows: the original high-dimensional feature space is reduced to a low-dimensional feature space, and then other traditional clustering algorithms are used on the low-dimensional data for clustering analysis, so as to cluster data in sample spaces of different shapes, as shown in Figure 2.

Euclidean distance is the most basic definition of the distance between two samples in an n-dimensional data space, or the modulus of a vector. The Euclidean distance is chosen as the distance measure in most clustering algorithms. It is defined as follows:

$$d_{ci} = \sqrt{\sum_{i=0} \left( \Phi_i + \Phi_j \right)}. \tag{5}$$

Cosine similarity measures the similarity of two vectors by calculating the cosine of the angle between them. In the field of text classification, cosine similarity is more widely used.
It is defined as follows: Complexity 5 Euclidean distance Data to climb –2 (Ф + Ф ) dci = i j i=0 –4 02468 10 Longitudinal data distribution Figure 2: Spectral clustering classification diagram. 4. Cluster Analysis-Based User Group sim (q) � k � 1|x (k) − x (k) , (6) 􏼈 􏼉 a o i Discovery Applications max minΔ (q) λ (i) � . (7) a 4.1. Data Acquisition. With the continuous evolution of the Δ (q) + δ max Δi(q)/min Δ (q) a a Internet and the popularity of smart devices, people can access the Internet in various ways. While the Internet brings Paul Jaccard introduced the Jaccard index, also convenience to life, it also accumulates a large amount of known as the Jaccard similarity coefficient, and used it to data, and how to mine and analyze these data to bring out analyze the distribution of alpine flora. )e Jaccard index the value of data is a hot issue for research. On the other is used to measure the similarity between two sample sets. hand, in the current situation, users play a very important )e Jaccard distance is a complement to the Jaccard role. Only with the users will the enterprise have revenue and index, and its objective is to calculate the dissimilarity long-term development. How to effectively analyze the between two sample sets. Its generalized formula is de- characteristics of users so as to provide customized services fined as follows: n for different user groups is also a very meaningful research 􏽐 v λ i�1 (8) q � . problem. In this chapter, two-layer structured user clus- tering (TL-FIUC) algorithm is applied to the real user characteristics data, and the user characteristics data are In recent years, many Internet companies are paying more and more attention to how to use data for precision clustered and analyzed to discover the main user groups, which provide a reference for the audience user analysis of marketing services. Internationally, companies such as Walmart and Amazon have a pivotal position in the field of enterprises. 
)e dataset has been officially desensitized and the user behavior analysis; in China, e-commerce companies such as Taobao, Jingdong, and Jindoduo are conducting feature types are mostly ordered discrete features. )erefore, research on user behavior prediction and product recom- these features can be treated as numerical features when mendation for precision marketing. )rough the clustering calculating user similarity and clustering centers. To verify analysis of users, we can find people with different interests the validity of the algorithm TL-FIUC, user_profile is first and behaviors, so that companies can analyze the charac- preprocessed to remove samples containing missing and teristics of the core user groups of their products and dig out abnormal values, and user features with data amounts of 1000 (Dataset6) and 10000 (Dataset7) are randomly selected hidden customers to help and provide a basis for improving products and precision marketing. However, the existing from the dataset respectively as the validation dataset in this application scenario. Since the original dataset is unlabeled user clustering algorithms cannot effectively explore the information contained in the MDF of user data. )is leads to data, it is necessary to determine the value of the clustering a decrease in the utilization of user data and the accuracy of number k first. In this paper, Sum of Squared Error (SSE) is similarity calculation among users. To address this problem, used to make the judgment. First, we draw the SSE line graph this paper proposes a user clustering algorithm that com- of the clustering results with different values of k and then bines association rules with MDF. Firstly, association rules observe where the inflection point of the image is the best are introduced into the Jaccard distance calculation process value of the clustering number k. 
In practical applications, to calculate the similarity between users, and this method the number of user groups found is generally not too large. improves the data utilization and the accuracy of the sim- )is is because the purpose of clustering analysis of users by enterprises is to understand several user-product audience ilarity measure. )e update method of clustering centers is improved based on the idea of the K-mode clustering al- groups with different characteristics; if too many central users are found, it will lead to smaller differences between gorithm to accommodate complex data types. Horizontal data distribution 6 Complexity discover the main user groups and the feasibility of the each user group and cannot analyze the differences in characteristics between user groups more intuitively. algorithm in the field of user clustering analysis is verified by visualization. In addition, this application experiment ver- )erefore, in this application for dataset Dataset6, the ex- periments set the value range of k to (1, 9); for dataset ifies that the optimal number of clusters for users generally Dataset7, the experiments set the value range of k to (1, 20) in does not exceed 10, as shown in Figure 5. On the one hand, order to find the value of the clustering number k that has a the number of clusters is influenced by the number of user better effect on the division of user groups. )e SSE fold features: generally, the higher the number of user features, diagram is shown in Figure 3. the higher the optimal number of clusters; on the other hand, For Dataset6, it can be observed that the SSE decreases it is affected by the similarity measure. )e larger the number of clusters, the smaller the difference between user groups. 
faster when the number of clusters k is in the range of (1, 4) and slows down significantly when k is in the range of (4, 9), )erefore, the number of clusters should not be set too large when companies perform clustering analysis on users in so the value of k is set to 4; for Dataset7, it can be observed from Figure 3 that the SSE decreases faster when k is in the practical applications. range of (1, 6) and slows down significantly when k is in the range of (6, 20). In practice, the number of clusters can be set 5. Results and Discussion according to the actual needs of the enterprise. User behavior analysis is a user-centered analysis of their historical behavioral data or even ongoing behavioral ac- 4.2. Precision Marketing of E-Commerce Platform. In this tions, using techniques such as mathematical statistics or paper, the TL-FIUC algorithm is used to cluster Dataset6 data mining. Among them, clustering analysis technology is and Dataset7 separately. )e results were generated with k more widely used in user behavior analysis by data analysts clusters and k clustering centers. )e clustering effect of the and researchers of various enterprises, the application sce- TL-FIUC algorithm on the two datasets is shown in Figure 4. nario of clustering algorithm has been gradually expanded, User clustering analysis is to classify users with the same or and good results have been achieved. However, there are still similar behavioral characteristics into the same group by some problems in the application of clustering algorithms in means of clustering and then discover the core, larger user the field of user behavior analysis which need to be solved, groups by iterative update of clusters. )is chapter briefly and there are still many shortcomings in the use of clustering introduces the relevant theoretical foundation and prepa- analysis in the field of user behavior analysis. 
)e purpose of ratory knowledge involved in the research of this paper this paper is to solve the current problems of user clustering through the algorithm definition and steps. )e main topics analysis and try to explore more application scenarios of user include user clustering methods, definitions and methods clustering in the process of solving the problems. related to ARM, and random forest-based feature selection )is paper solves the problem of low accuracy of user methods required when considering the importance of user similarity calculation due to the current low data utili- features. We propose a clustering method that combines zation. )e method introduces association rules into the association rules with MDF based on the idea of the K-mode calculation process of Jaccard distance, constructs a user clustering algorithm. First, this paper constructs a method to similarity measure, and improves the update method of calculate the similarity between users using Jaccard distance clustering centers based on the idea of the K-mode and combines association rules with Jaccard distance to clustering algorithm. It is verified through experiments improve the similarity between users; then, a clustering that the ARMDKM algorithm outperforms the traditional center update rule for MDF is proposed; finally, the simi- clustering algorithm in several evaluation criteria, not larity measurement method combines association rules, only improves the utilization of data but also improves the Jaccard distance, and clustering. Finally, the basic K-mode quality of user clustering, and solves the problem that the algorithm is improved by using a similarity measure com- clustering algorithm cannot effectively analyze the MDF bining the association rule with Jaccard distance and the in user data. clustering center update method, which is the ARMDKM In addition, a TL-FIUC algorithm is proposed experi- algorithm proposed in this paper. 
)e method solves the mentally to consider the influence of the importance of user problem that the traditional model cannot deal with MDF data features. In the field of user behavior analysis, the effectively and proves its theoretical correctness. )e ex- behavior of different users varies and the importance of periment verifies the correctness of the new method by the different data features for user analysis varies. In order to purity of clustering, entropy value, contour coefficient, and comprehensively consider the weight relationship between other indexes. user data features, this paper first uses K-means++ to analyze )e TL-FIUC algorithm can effectively classify users the data in one clustering to generate pseudolabel features, as with different characteristics, and the effect of this algorithm shown in Figure 6. )en, the OOB error in the random forest on clustering user data is significant. In summary, through algorithm is used to evaluate the feature importance to cluster analysis, companies are able to obtain several major obtain the weight parameters of the features. Finally, based user groups of large scale. Analyzing the characteristics of on the idea of spectral clustering, the weighted user data are these groups can provide a more intuitive understanding of analyzed by clustering to obtain the final clustering results. users and uncover potential user groups. )e TL-FIUC )e experimental results show that the algorithm effectively algorithm is applied in the cluster analysis of real data sets to improves the clustering accuracy. )e TL-FIUC algorithm Complexity 7 20 50 Group 1 Group 2 15 45 10 40 5 35 0 30 –5 25 –10 20 –15 15 –20 10 Number of clusters Number of clusters 68.7 65.6 65.6 64.6 64.3 60.9 54.5 52.8 Data fluctuation Figure 3: )e SSE fold diagram. preprocessing approach, as shown in Figure 7. 
Particularly when the data are not labeled or when a simple under- standing of the data distribution pattern is needed, cluster analysis can effectively accomplish the above tasks. )ere- fore, it is worthwhile to explore the application scenarios of clustering algorithms under different domains. )e dataset used in the experiment contains only four user features selected from the user feature dataset to participate in the similarity calculation, the main purpose is to verify the feasibility and effectiveness of the algorithm in this paper, and more features can be selected for analysis in practical applications. In the clustering analysis of the LR dataset, the performance of all seven algorithms improved in the NMI index, and the TL-FIUC algorithm performed the best. On Colony the other hand, the TL-FIUC algorithm proposed in this 15 30 44 59 73 88 Intensity paper is based on spectral clustering, which can better handle sparse matrices due to its natural dimensionality Figure 4: Clustering diagrams on the two datasets. reduction ability, thus improving the clustering effect. )is part of the experiment shows that the TL-FIUC algorithm is a feasible method to be applied in the field of image also performs well in the clustering analysis of real user information datasets. processing. Subsequent improvements such as parallel computing or Due to the nature of clustering itself, it is able to divide data according to the characteristics of the data itself, ag- approximation algorithms can be considered to improve the gregating similar clusters and separating dissimilar ones. efficiency of the algorithm. In addition to the above-men- Currently, cluster analysis is widely used in business deci- tioned algorithm improvement directions, this paper can sion-making, precision marketing, and other fields. Cluster also consider the combination of fuzzy clustering of the analysis can discover hidden information among users and clustering analysis of users. 
)e basic clustering algorithms can be applied to build more detailed user profiles and utilized in this paper are all hard clustering, and hard potentially discover hidden target user groups. In many data clustering does not reflect the characteristics of variable and mining tasks, cluster analysis can be used as a data flexible user behavior well. )e use of soft clusters may be Score Score Score 8 Complexity 350 250.00% 80 1000.00% 300 800.00% 200.00% 60 250 600.00% 150.00% 40 200 400.00% 150 200.00% 100.00% 20 100 0.00% 50.00% 0 50 –200.00% 0 0.00% –20 –400.00% P- value level Age level Change (%) Change (%) 80 1000.00% 350 250.00% 800.00% 60 200.00% 600.00% 40 150.00% 400.00% 200.00% 20 100.00% 0.00% 100 0 50.00% –200.00% –20 –400.00% 0 0.00% Shopping level Occupation Change (%) Change (%) Figure 5: Six central users obtained from TL-FIUC. 5.41 3.97 3.7 3.65 3.15 2.93 2.91 2.78 2.37 2 1.75 1.75 1.6 1.56 1.55 1.54 1.52 1.21 1.19 1.19 1.1 1.09 0.96 1.07 0.83 0.86 0.86 0.8 0.8 0.75 0.7 0.62 0.46 0.43 0.29 0.23 0.21 2012 2013 2014 2015 2016 2017 2018 2019 SELI Moremore Ang ZIN Rich Figure 6: Performance of FMI indicators for clustering results under numerical dataset. able to improve the quality of the clusters, which is to be same cluster tend to be similar in some sense, while objects verified by subsequent theories and experiments. )e pur- in different clusters tend to be different. Cluster analysis pose of cluster analysis is to classify objects using the nature allows macroanalysis of data without data mining for a of the data itself, to be able to calculate the similarity between particular individual. Usually, the similarity of samples is sample points according to specific definitions, and to measured based on calculating the distance between sample discover internal patterns in the data through iterations in data. order to classify the data. 
)e above process does not require )e distance is calculated differently for different sce- labeling information, so clustering analysis is unsupervised narios. Cluster analysis is widely used in various fields, such learning in machine learning. By clustering, objects in the as group classification of target users. )e target audience Shopping Level Occupation P-value WEAC SEC Complexity 9 8.5 8.0 7.5 K-means 7.0 6.5 TL-BIUC 6.0 5.5 LWMC 5.0 4.5 4.0 0 3 8 0 Setosa Versicolor Virginica Figure 7: NMI metric performance comparison of TL-FIUC algorithm on image dataset. effective in dealing with high-dimensional data, but because group is divided into several user groups with distinct characteristics of difference, so that personalized recom- spectral clustering needs to calculate the similarity between mendations and services for the audience group can be each sample, resulting in excessive time overhead in dealing carried out at a later stage, which ultimately improves the with large sample data. efficiency and business effect of enterprise operations, as well as discovering the value combinations of different products. Data Availability )e data used to support the findings of this study are 6. Conclusion available from the corresponding author upon request. )e posterior pieces of association rules mined in this ex- periment contain only one element. Subsequent attempts Conflicts of Interest can be made to combine association rules containing multiple posterior elements with similarity metrics to im- )e authors declare that they have no known competing prove the accuracy of user similarity calculation in practical financial interests or personal relationships that could have applications. )e ARMDKM algorithm is based on the basic appeared to influence the work reported in this paper. K-mode algorithm, which is not improved for the initiali- zation problem and requires multiple runs to obtain better References results. 
In addition, the data features of this experiment are selected by hand, and the ARMDKM algorithm has no [1] N. Huang, “Analysis and design of university teaching ability to select features. Subsequent methods can be evaluation system based on jsp platform,” International combined with existing methods to solve the initialization Journal of Education and Management Engineering, vol. 7, and feature selection problems to achieve better results. )e no. 3, pp. 43–50, 2017. [2] Q. Wang, C. Wu, and Y. Sun, “Evaluating corporate social algorithm itself contains two major steps: one is unsuper- responsibility of airlines using entropy weight and grey re- vised feature selection and the other is overall clustering lation analysis,” Journal of Air Transport Management, vol. 42, analysis; the unsupervised feature selection method uses one pp. 55–62, 2015. clustering analysis and one classification model construc- [3] T. S. Riall, J. Teiman, M Chang et al., “Maintaining the fire but tion, the algorithms used are relatively primitive, and there is avoiding burnout: implementation and evaluation of a resi- still a lot of room for optimization. )e algorithm can be dent well-being program,” Journal of the American College of improved by referring to the latest papers on the direction of Surgeons, vol. 226, no. 4, pp. 369–379, 2017. unsupervised feature selection; the overall clustering analysis [4] T. Singh, A. Patnaik, and R. Chauhan, “Optimization of part uses the spectral clustering algorithm, which is more tribological properties of cement kiln dust-filled brake pad KCC 10 Complexity using grey relation analysis,” Materials & Design, vol. 89, pp. 1335–1342, 2016. [5] R. Tan, W. Zhang, and C. Shengqun, “Decision-making method based on grey relation analysis and trapezoidal fuzzy neutrosophic numbers under double incomplete information and its application in typhoon disaster assessment,” IEEE Access, vol. 8, pp. 3606–3628, 2019. [6] T. 
Wang, “Study on adhesion property of asphalt and ag- gregate based on grey relation,” 5eory, vol. 48, no. 14, pp. 40–42, 2019. [7] J. W. Boland, M. Brown, A. Duenas, G. M. Finn, and J. Gibbins, “How effective is undergraduate palliative care teaching for medical students?,” A Systematic Literature Re- view, vol. 10, no. 9, pp. 036458-036459, 2020. [8] J. Teaching and L. Practice, “A practice-based study of Chi- nese students learning—putting things together,” Journal of University Teaching and Learning Practice (JUTLP), vol. 16, no. 2, pp. 12–18, 2019. [9] X. Wang, “Application of grey relation analysis theory to choose high reliability of the network node,” Journal of Physics Conference Series, vol. 1237, no. 3, pp. 032055-032056, 2019. [10] M. Dinerstein, L. Einav, J. Levin, and N. Sundaresan, “Consumer price search and platform design in internet commerce,” American Economic Review, vol. 108, no. 7, pp. 1820–1859, 2018. [11] L. Chen, S. Qiao, N. Han et al., “Friendship prediction model based on factor graphs integrating geographical location,” CAAI Transactions on Intelligence Technology, vol. 5, no. 3, pp. 193–199, 2020. [12] Q. Wang, Y. Yu, H. Gao et al., “Network representation learning enhanced recommendation algorithm,” IEEE Access, vol. 7, pp. 61388–61399, 2019. [13] Y. Cen, J. Zhang, G. Wang et al., “Trust relationship prediction in alibaba E-commerce platform,” IEEE Transactions on Knowledge and Data Engineering, vol. 32, no. 5, pp. 1024– 1035, 2020. [14] E. Cristobal-Fransi, Y. Montegut-Salla, B. Ferrer-Rosell, and N. Daries, “Rural cooperatives in the digital age: an analysis of the internet presence and degree of maturity of agri-food cooperatives’ E-commerce,” Journal of Rural Studies, vol. 74, pp. 55–66, 2020. [15] Z. H. Borbora, M. A. Ahmad, J. Oh, K. Z. Haigh, J. Srivastava, and Z. Wen, “Robust features of trust in social networks,” Social Network Analysis and Mining, vol. 3, no. 4, pp. 981–999, [16] P. M. Carron, K. Kaski, and R. 
Dunbar, “Calling Dunbar’s numbers,” Social Networks, vol. 47, pp. 151–155, 2016. [17] J. Kwak, Y. Zhang, and J. Yu, “Legitimacy building and e-commerce platform development in China: the experience of Alibaba,” Technological Forecasting and Social Change, vol. 139, pp. 115–124, 2019. [18] Z. Almeraj, F. Boujarwah, D. Alhuwail, and R. Qadri, “Evaluating the accessibility of higher education institution websites in the state of Kuwait: empirical evidence,” Universal Access in the Information Society, vol. 1, pp. 11–18, 2020. [19] R. Gonçalves, T. Rocha, J. Martins, F. Branco, and M. Au- Yong-Oliveira, “Evaluation of E-commerce websites acces- sibility and usability: an E-commerce platform analysis with the inclusion of blind users,” Universal Access in the Infor- mation Society, vol. 17, no. 3, pp. 567–583, 2018. [20] A. Ismail, K. S. Kuppusamy, and S. Paiva, “Accessibility analysis of higher education institution websites of Portugal,” Universal Access in the Information Society, vol. 19, no. 3, pp. 685–700, 2020.
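As a brief illustration of the Jaccard index and Jaccard distance described above, the following minimal sketch computes both for two users whose multivalued discrete features are represented as sets. The function names, the example tags, and the convention that two empty sets are identical are assumptions made for illustration; the sketch does not include the association-rule weighting of the ARMDKM algorithm, whose exact form is not given here.

```python
from typing import Hashable, Set


def jaccard_index(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Return |A intersect B| / |A union B|.

    Assumption: two empty sets are treated as identical (index 1.0).
    """
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Return the complement of the Jaccard index (a dissimilarity)."""
    return 1.0 - jaccard_index(a, b)


# Hypothetical multivalued discrete feature (interest tags) for two users.
user_a = {"sports", "books", "music"}
user_b = {"books", "music", "travel"}

# Two shared tags out of four distinct tags: index 0.5, distance 0.5.
print(jaccard_distance(user_a, user_b))  # 0.5
```

Representing each multivalued feature as a set keeps the measure independent of the order in which values were recorded, which is one reason set-based measures suit this kind of feature.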

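The choice of the clustering number k from the SSE curve in Section 4.1 can be sketched as follows. The paper reads the inflection point off the plotted curve; the second-difference rule used in `elbow_k` is only one possible stand-in for that visual inspection and is an assumption, as are the function names and the illustrative SSE values, which are not results from the paper. The rule also assumes consecutive integer values of k.

```python
from typing import Dict, List, Sequence


def sse(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Sum of squared distances from each point to the mean of its cluster."""
    clusters: Dict[int, List[Sequence[float]]] = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - center[d]) ** 2 for p in members for d in range(dim))
    return total


def elbow_k(sse_by_k: Dict[int, float]) -> int:
    """Pick the k where the SSE curve bends most (largest second difference).

    Hypothetical heuristic; assumes at least three consecutive k values.
    """
    ks = sorted(sse_by_k)
    if len(ks) < 3:
        raise ValueError("need SSE values for at least three k")
    best_k, best_bend = ks[1], float("-inf")
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        bend = sse_by_k[prev_k] - 2 * sse_by_k[k] + sse_by_k[next_k]
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


# Four points in two tight clusters: each point is 1 unit from its center.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels = [0, 0, 1, 1]
print(sse(points, labels))  # 4.0

# Made-up SSE values with a sharp drop up to k = 3 and a flat tail after.
curve = {1: 100.0, 2: 60.0, 3: 20.0, 4: 18.0, 5: 17.0, 6: 16.0}
print(elbow_k(curve))  # 3
```

In practice the SSE values would come from running the clustering algorithm once per candidate k, and the selected k can still be overridden according to the needs of the enterprise, as noted in Section 4.1.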
Journal

Complexity, Hindawi Publishing Corporation

Published: Mar 5, 2021
