Profiling Local Optima in K-Means Clustering: Developing a Diagnostic Technique

Psychological Methods, Volume 11 (2): 15 – Jun 1, 2006


Publisher: American Psychological Association
Copyright: Copyright © 2006 American Psychological Association
ISSN: 1082-989X
eISSN: 1939-1463
DOI: 10.1037/1082-989X.11.2.178
PMID: 16784337

Abstract

Using the cluster generation procedure proposed by D. Steinley and R. Henson (2005), the author investigated the performance of K-means clustering under the following scenarios: (a) different probabilities of cluster overlap; (b) different types of cluster overlap; (c) varying sample sizes, clusters, and dimensions; (d) different multivariate distributions of clusters; and (e) various multidimensional data structures. The results are evaluated in terms of the Hubert-Arabie adjusted Rand index, and several observations concerning the performance of K-means clustering are made. Finally, the article concludes with the proposal of a diagnostic technique indicating when the partitioning given by a K-means cluster analysis can be trusted. By combining the information from several observable characteristics of the data (number of clusters, number of variables, sample size, etc.) with the prevalence of unique local optima in several thousand implementations of the K-means algorithm, the author provides a method capable of guiding key data-analysis decisions.
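
The two ingredients named in the abstract, the Hubert-Arabie adjusted Rand index and the count of unique local optima across many K-means runs, can be illustrated with a minimal sketch. This is not the article's diagnostic technique, whose details are not given here. It only shows the general idea under stated assumptions: plain Lloyd's algorithm started from randomly chosen data points, 200 restarts rather than several thousand, and toy two-dimensional Gaussian data. All function names (`adjusted_rand_index`, `kmeans`, `profile_local_optima`) are hypothetical.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Hubert-Arabie adjusted Rand index between two label sequences."""
    n = len(a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm from k random data points; returns (labels, SSE)."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = tuple(sum(x) / len(members) for x in zip(*members))
    sse = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, sse


def canonical(labels):
    """Relabel clusters by order of first appearance so that two runs
    yielding the same partition compare equal."""
    seen = {}
    return tuple(seen.setdefault(l, len(seen)) for l in labels)


def profile_local_optima(points, k, restarts=200, seed=0):
    """Map each distinct converged partition to its SSE over many restarts."""
    rng = random.Random(seed)
    found = {}
    for _ in range(restarts):
        labels, sse = kmeans(points, k, rng)
        found.setdefault(canonical(labels), sse)
    return found


# Toy data (an assumption for illustration): two well-separated clusters.
data_rng = random.Random(1)
truth = [0] * 30 + [1] * 30
points = [(data_rng.gauss(10 * t, 1), data_rng.gauss(10 * t, 1)) for t in truth]

optima = profile_local_optima(points, k=2)
best = min(optima, key=optima.get)  # partition with the lowest SSE
print("unique local optima:", len(optima))
print("ARI of best partition vs. truth:", adjusted_rand_index(truth, best))
```

In this sketch, a small number of distinct partitions across restarts suggests that the K-means solution is stable, while a large number suggests the lowest-SSE partition deserves less trust. How such a count should be combined with the number of clusters, number of variables, and sample size is the subject of the article and is not reproduced here.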

Journal

Psychological Methods, American Psychological Association

Published: Jun 1, 2006
