Outlier Identification in Model-Based Cluster Analysis

Publisher: Springer Journals
Copyright: © 2015 by Classification Society of North America
Subject: Statistics; Statistical Theory and Methods; Pattern Recognition; Bioinformatics; Signal, Image and Speech Processing; Psychometrics; Marketing
ISSN: 0176-4268
eISSN: 1432-1343
DOI: 10.1007/s00357-015-9171-5

Abstract

In model-based clustering based on normal-mixture models, a few outlying observations can influence the cluster structure and the number of clusters. This paper develops a method to identify such observations; however, it does not attempt to identify clusters amidst a large field of noisy observations. We identify outliers as those observations in a cluster with minimal membership proportion, or those for which the cluster-specific variance with and without the observation is very different. Results from a simulation study demonstrate that our method detects true outliers without falsely identifying many non-outliers, and that it performs better than other approaches under most scenarios. We use the contributed R package MCLUST for model-based clustering, but propose a modified prior for the cluster-specific variance that avoids degeneracies in estimation procedures. We also compare results from our outlier method to published results on National Hockey League data.
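
The two criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's procedure: it assumes univariate data, hard cluster labels that have already been obtained (for example from a fitted mixture model), and two hypothetical thresholds, `min_proportion` and `ratio_threshold`, chosen only for illustration. The function name `flag_outliers` is likewise an assumption.

```python
from collections import defaultdict
from statistics import pvariance


def flag_outliers(values, labels, min_proportion=0.05, ratio_threshold=3.0):
    """Return sorted indices flagged by either illustrative criterion.

    values: list of floats (univariate observations).
    labels: list of hard cluster labels, same length as values.
    min_proportion, ratio_threshold: hypothetical cut-offs, not from the paper.
    """
    n = len(values)
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    flagged = set()
    for members in clusters.values():
        # Criterion 1: the cluster holds only a tiny share of the data.
        if len(members) / n < min_proportion:
            flagged.update(members)
            continue
        if len(members) < 3:
            continue  # too few points for a leave-one-out variance comparison

        full_var = pvariance([values[i] for i in members])
        for i in members:
            # Criterion 2: the cluster variance drops sharply without point i.
            rest_var = pvariance([values[j] for j in members if j != i])
            if rest_var == 0:
                if full_var > 0:
                    flagged.add(i)
            elif full_var / rest_var > ratio_threshold:
                flagged.add(i)
    return sorted(flagged)
```

Calling `flag_outliers(values, labels)` returns the positions of the flagged observations. A full treatment would work with the fitted mixture's membership proportions and covariance matrices rather than hard labels and scalar variances.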

Journal

Journal of Classification, Springer Journals

Published: Mar 11, 2015
