Clustering Binary Data in the Presence of Masking Variables

Psychological Methods, Volume 9 (4): 14 – Dec 1, 2004

Publisher: American Psychological Association
Copyright: Copyright © 2004 American Psychological Association
ISSN: 1082-989X
eISSN: 1939-1463
DOI: 10.1037/1082-989X.9.4.510
PMID: 15598102

Abstract

A number of important applications require the clustering of binary data sets. Traditional nonhierarchical cluster analysis techniques, such as the popular K-means algorithm, can often be successfully applied to these data sets. However, the presence of masking variables in a data set can impede the ability of the K-means algorithm to recover the true cluster structure. The author presents a heuristic procedure that selects an appropriate subset from among the set of all candidate clustering variables. Specifically, this procedure attempts to select only those variables that contribute to the definition of true cluster structure while eliminating variables that can hide (or mask) that true structure. Experimental testing of the proposed variable-selection procedure reveals that it is extremely successful at accomplishing this goal.
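
To make the masking problem concrete, the following is a minimal Python sketch, not the author's variable-selection procedure (which the abstract does not describe in detail). It assumes two clusters defined by a handful of "signal" variables, appends a configurable number of coin-flip masking variables, runs a plain Lloyd-style K-means, and scores agreement with the true partition using the Rand index. The helper names (`make_data`, `kmeans`, `rand_index`), the farthest-point initialization, and the choice of the Rand index are all illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance; on 0/1 vectors this equals the Hamming distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, n_iter=100):
    """Plain Lloyd-style K-means with a deterministic farthest-point start (an assumption)."""
    centers = [list(rows[0])]
    while len(centers) < k:
        # Next center: the row farthest from its nearest existing center.
        far = max(rows, key=lambda r: min(sq_dist(r, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # leave an empty cluster's center where it was
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(a, b):
    """Fraction of object pairs on which two partitions agree (same/different cluster)."""
    n = len(a)
    agree = sum(
        (a[i] == a[j]) == (b[i] == b[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return agree / (n * (n - 1) / 2)


def make_data(n_per_cluster=20, n_signal=6, n_masking=0, seed=0):
    """Two clusters with opposite signal patterns, plus random masking variables."""
    rng = random.Random(seed)
    half = n_signal // 2
    patterns = {
        0: [1] * half + [0] * (n_signal - half),
        1: [0] * half + [1] * (n_signal - half),
    }
    rows, truth = [], []
    for cluster in (0, 1):
        for _ in range(n_per_cluster):
            masking = [rng.randint(0, 1) for _ in range(n_masking)]
            rows.append(patterns[cluster] + masking)
            truth.append(cluster)
    return rows, truth


if __name__ == "__main__":
    for n_masking in (0, 10, 30):
        rows, truth = make_data(n_masking=n_masking)
        score = rand_index(kmeans(rows, k=2), truth)
        print(f"masking variables: {n_masking:2d}  Rand index: {score:.3f}")
```

With no masking variables the two signal patterns are perfectly separated, so recovery is exact. As masking variables are added, the random coordinates contribute to every distance and can dominate the signal, which is the situation a variable-selection step is meant to address.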

Journal

Psychological Methods, American Psychological Association

Published: Dec 1, 2004
