Simulating Data to Study Performance of Finite Mixture Modeling and Clustering Algorithms

23 pages

References (46)

Publisher
Taylor & Francis
Copyright
© 2010 American Statistical Association, Institute of Mathematical Statistics, and Interface Foundation of North America
ISSN
1061-8600
eISSN
1537-2715
DOI
10.1198/jcgs.2009.08054

Abstract

A new method is proposed to generate sample Gaussian mixture distributions according to prespecified overlap characteristics. Such methodology is useful for evaluating the performance of clustering algorithms. Our suggested approach involves deriving and calculating the exact overlap between every pair of clusters, measured in terms of their total probability of misclassification, followed by guided simulation of Gaussian components that satisfy the prespecified overlap characteristics. The algorithm is illustrated in two dimensions using contour plots and in five dimensions using parallel distribution plots, which we introduce and develop to display mixture distributions in higher dimensions. We also study properties of the algorithm and variability in the simulated mixtures. The utility of the suggested algorithm is demonstrated via a study of initialization strategies in Gaussian clustering. This article has supplementary material online.
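
To make the overlap measure in the abstract concrete, the following is a minimal sketch and not the article's algorithm: the article computes the overlap exactly, whereas this example estimates it by Monte Carlo for two univariate Gaussian components. The overlap of a pair is taken as the sum of the two misclassification probabilities (a point drawn from one component being assigned to the other). All function names and the `(weight, mean, sd)` tuple layout are hypothetical choices for illustration and are not part of MixSim or Mclust.

```python
import math
import random


def log_weighted_density(x, weight, mean, sd):
    """Log of weight * N(x; mean, sd^2) for one univariate Gaussian component."""
    z = (x - mean) / sd
    return math.log(weight) - math.log(sd) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z


def misclassification_prob(comp_i, comp_j, n_draws, rng):
    """Monte Carlo estimate of P(a draw from component i is assigned to component j)."""
    w_i, mu_i, sd_i = comp_i
    w_j, mu_j, sd_j = comp_j
    wrong = 0
    for _ in range(n_draws):
        x = rng.gauss(mu_i, sd_i)
        # The draw is misclassified when component j has the larger weighted density.
        if log_weighted_density(x, w_j, mu_j, sd_j) > log_weighted_density(x, w_i, mu_i, sd_i):
            wrong += 1
    return wrong / n_draws


def pairwise_overlap(comp_i, comp_j, n_draws=100_000, seed=0):
    """Estimated overlap: sum of the two misclassification probabilities."""
    rng = random.Random(seed)
    return (misclassification_prob(comp_i, comp_j, n_draws, rng)
            + misclassification_prob(comp_j, comp_i, n_draws, rng))


def exact_overlap_equal_case(mu_i, mu_j, sd):
    """Closed form for equal weights and equal sd: 2 * Phi(-|mu_i - mu_j| / (2 * sd))."""
    d = abs(mu_i - mu_j) / (2 * sd)
    return math.erfc(d / math.sqrt(2))  # identity: 2 * Phi(-d) == erfc(d / sqrt(2))


if __name__ == "__main__":
    a = (0.5, 0.0, 1.0)  # (weight, mean, sd), assumed layout
    b = (0.5, 2.0, 1.0)
    print("Monte Carlo overlap:", pairwise_overlap(a, b))
    print("Closed-form overlap:", exact_overlap_equal_case(0.0, 2.0, 1.0))
```

In the equal-weight, equal-variance case the closed form gives a check on the estimate; for unequal variances or higher dimensions no such simple formula applies, which is where an exact calculation such as the one the article derives becomes valuable.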

Journal

Journal of Computational and Graphical Statistics, Taylor & Francis

Published: Jan 1, 2010

Keywords: Cluster overlap; Eccentricity of ellipsoid; Mclust; MixSim; Mixture distribution; Parallel distribution plots
