On the Added Value of Bootstrap Analysis for K-Means Clustering

References (34)

Publisher
Springer Journals
Copyright
Copyright © 2015 by Classification Society of North America
Subject
Statistics; Statistical Theory and Methods; Pattern Recognition; Bioinformatics; Signal, Image and Speech Processing; Psychometrics; Marketing
ISSN
0176-4268
eISSN
1432-1343
DOI
10.1007/s00357-015-9178-y

Abstract

Because of its deterministic nature, K-means does not yield confidence information about centroids and estimated cluster memberships, although this could be useful for inferential purposes. In this paper we propose to arrive at such information by means of a non-parametric bootstrap procedure, the performance of which is tested in an extensive simulation study. Results show that the coverage of hyper-ellipsoid bootstrap confidence regions for the centroids is in general close to the nominal coverage probability. For the cluster memberships, we found that probabilistic membership information derived from the bootstrap analysis can be used to improve the cluster assignment of individual objects, albeit only in the case of a very large number of clusters. However, in the case of smaller numbers of clusters, the probabilistic membership information still appeared to be useful as it indicates for which objects the cluster assignment resulting from the analysis of the original data is likely to be correct; hence, this information can be used to construct a partial clustering in which the latter objects only are assigned to clusters.
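The abstract does not spell out the procedure, so the following is only a minimal sketch of how a non-parametric bootstrap around K-means might be set up, not the authors' implementation. Several choices are assumptions made for illustration: the helper names (`kmeans`, `align`, `bootstrap_memberships`, `partial_clustering`), matching bootstrap centroids to the original ones by the best permutation, estimating an object's membership probability as the share of bootstrap solutions whose nearest centroid carries a given label, and the 0.9 threshold for the partial clustering. The hyper-ellipsoid confidence regions for the centroids are not covered here.

```python
import itertools
import math
import random


def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, rng, iters=100):
    """Plain Lloyd's algorithm started from k randomly chosen points."""
    centroids = rng.sample(points, k)
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to become empty.
            updated.append(tuple(sum(c) / len(members) for c in zip(*members))
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return centroids


def best_kmeans(points, k, rng, restarts=5):
    """Run several random starts and keep the lowest within-cluster sum of squares."""
    def sse(cs):
        return sum(math.dist(p, cs[nearest(p, cs)]) ** 2 for p in points)
    return min((kmeans(points, k, rng) for _ in range(restarts)), key=sse)


def align(centroids, reference):
    """Reorder centroids to match reference (assumed fix for label switching).

    Tries all k! permutations, so this is only practical for small k.
    """
    k = len(reference)
    best = min(itertools.permutations(range(k)),
               key=lambda perm: sum(math.dist(centroids[perm[j]], reference[j]) ** 2
                                    for j in range(k)))
    return [centroids[j] for j in best]


def bootstrap_memberships(points, k, n_boot=200, seed=0, restarts=5):
    """Return original labels and bootstrap membership proportions per object."""
    rng = random.Random(seed)
    reference = best_kmeans(points, k, rng, restarts)
    counts = [[0] * k for _ in points]
    for _ in range(n_boot):
        # Non-parametric bootstrap: draw n objects with replacement.
        resample = rng.choices(points, k=len(points))
        boot = align(best_kmeans(resample, k, rng, restarts), reference)
        # Assumption: every original object votes for its nearest bootstrap centroid.
        for i, p in enumerate(points):
            counts[i][nearest(p, boot)] += 1
    probs = [[c / n_boot for c in row] for row in counts]
    labels = [nearest(p, reference) for p in points]
    return labels, probs


def partial_clustering(labels, probs, threshold=0.9):
    """Keep an object's original label only if the bootstrap supports it, else None."""
    return [lab if row[lab] >= threshold else None for lab, row in zip(labels, probs)]


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
    data += [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(30)]
    labels, probs = bootstrap_memberships(data, k=2, n_boot=100)
    print(partial_clustering(labels, probs))
```

Objects that come back as `None` are those whose original assignment the bootstrap does not support strongly enough under the assumed threshold, which corresponds to the partial clustering idea described above. The permutation-based alignment is chosen for clarity; for larger numbers of clusters an assignment solver would be a more practical substitute.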

Journal

Journal of Classification, Springer Journals

Published: Jul 8, 2015