
Model-Based Clustering for Conditionally Correlated Categorical Data

Publisher: Springer Journals
Copyright: Copyright © 2015 by Classification Society of North America
Subject: Statistics; Statistical Theory and Methods; Pattern Recognition; Bioinformatics; Signal, Image and Speech Processing; Psychometrics; Marketing
ISSN: 0176-4268
eISSN: 1432-1343
DOI: 10.1007/s00357-015-9180-4

Abstract

An extension of the latent class model is presented for clustering categorical data by relaxing the classical “class conditional independence assumption” of the variables. The model groups the variables into blocks that are independent of one another but internally dependent, so that the main intra-class correlations are taken into account. Within a class, the dependency between variables grouped in the same block is modeled by mixing two extreme distributions: the independence distribution and the maximum-dependency distribution. When the variables are dependent given the class, this approach is expected to reduce the biases of the latent class model, and it yields a meaningful dependency model with only a few additional parameters. The parameters are estimated by maximum likelihood using an EM algorithm. In addition, a Gibbs sampler is used for model selection to overcome the computational intractability of the combinatorial search over block structures. Two applications, to medical and biological data sets, show the relevance of this new model. The results support the view that the model is meaningful and that it reduces the biases induced by the conditional independence assumption of the latent class model.
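The block-level mixture described in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation. The parameter names (`rho`, `alphas`, `links`), the choice of `rho` as the weight on the independence component, and the form of the maximum-dependency distribution (the first variable drawn from its marginal, every other variable a fixed function of it) are all assumptions made for illustration. The helper `bell_number` counts the ways to partition d variables into blocks for a single class, which hints at why an exhaustive block structure search quickly becomes intractable.

```python
from itertools import product
from math import prod


def independence_pmf(x, alphas):
    """Independence extreme: product of per-variable marginals alphas[j][x[j]]."""
    return prod(alpha[xj] for xj, alpha in zip(x, alphas))


def max_dependency_pmf(x, alphas, links):
    """Hypothetical maximum-dependency extreme (an assumption, not the paper's exact form):
    the first variable follows its marginal alphas[0], and every other variable j is
    a deterministic function of it, x[j] == links[j - 1][x[0]]. Other configurations get 0."""
    x0 = x[0]
    if all(xj == link[x0] for xj, link in zip(x[1:], links)):
        return alphas[0][x0]
    return 0.0


def block_pmf(x, rho, alphas, links):
    """Mixture of the two extremes for one block; rho is the assumed weight on independence."""
    return rho * independence_pmf(x, alphas) + (1.0 - rho) * max_dependency_pmf(x, alphas, links)


def bell_number(d):
    """Number of ways to partition d variables into blocks, via the Bell triangle."""
    row = [1]
    for _ in range(d - 1):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]


if __name__ == "__main__":
    # Two binary variables in one block, linked by the identity map (hypothetical values).
    alphas = [[0.3, 0.7], [0.6, 0.4]]
    links = [[0, 1]]
    print(round(block_pmf((0, 0), 0.5, alphas, links), 2))  # → 0.24
    total = sum(block_pmf(x, 0.5, alphas, links) for x in product(range(2), repeat=2))
    print(round(total, 6))  # → 1.0
    print(bell_number(10))  # → 115975
```

Because each extreme sums to one over all configurations of the block, any `rho` in [0, 1] gives a valid distribution, and `rho = 1` recovers the conditional independence of the classical latent class model for that block.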

Journal: Journal of Classification (Springer Journals)

Published: Jul 9, 2015
