Get 20M+ Full-Text Papers For Less Than $1.50/day. Start a 14-Day Trial for You or Your Team.

Learn More →

Note: t for Two (Clusters)

Note: t for Two (Clusters) The computation for cluster analysis is done by iterative algorithms. But here, a straightforward, non-iterative procedure is presented for clustering in the special case of one variable and two groups. The method is univariate but may reasonably be applied to multivariate datasets when the first principal component or a single factor explains much of the variation in the data. The t method is motivated by the fact that minimizing the within-groups sum of squares is equivalent to maximizing the between-groups sum of squares, and that Student’s t statistic measures the between-groups difference in means relative to within-groups variation. That is, the t statistic is the ratio of the difference in sample means, divided by the standard error of this difference. So, maximizing the t statistic is developed as a method for clustering univariate data into two clusters. In this situation, the t method gives the same results as the K-means algorithm. K-means tacitly assumes equality of variances; here, however, with t, equality of variances need not be assumed because separate variances may be used in computing t. The t method is applied to some datasets; the results are compared with those obtained by fitting mixtures of distributions. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Journal of Classification Springer Journals

Note: t for Two (Clusters)

Journal of Classification , Volume 36 (3) – Jul 11, 2019

Loading next page...
 
/lp/springer-journals/note-t-for-two-clusters-EPqBU48w9S

References (5)

Publisher
Springer Journals
Copyright
Copyright © 2019 by The Classification Society
Subject
Statistics; Statistical Theory and Methods; Pattern Recognition; Bioinformatics; Signal,Image and Speech Processing; Psychometrics; Marketing
ISSN
0176-4268
eISSN
1432-1343
DOI
10.1007/s00357-019-09335-3
Publisher site
See Article on Publisher Site

Abstract

The computation for cluster analysis is done by iterative algorithms. But here, a straightforward, non-iterative procedure is presented for clustering in the special case of one variable and two groups. The method is univariate but may reasonably be applied to multivariate datasets when the first principal component or a single factor explains much of the variation in the data. The t method is motivated by the fact that minimizing the within-groups sum of squares is equivalent to maximizing the between-groups sum of squares, and that Student’s t statistic measures the between-groups difference in means relative to within-groups variation. That is, the t statistic is the ratio of the difference in sample means, divided by the standard error of this difference. So, maximizing the t statistic is developed as a method for clustering univariate data into two clusters. In this situation, the t method gives the same results as the K-means algorithm. K-means tacitly assumes equality of variances; here, however, with t, equality of variances need not be assumed because separate variances may be used in computing t. The t method is applied to some datasets; the results are compared with those obtained by fitting mixtures of distributions.

Journal

Journal of ClassificationSpringer Journals

Published: Jul 11, 2019

There are no references for this article.