Get 20M+ Full-Text Papers For Less Than $1.50/day. Subscribe now for You or Your Team.

Learn More →

Cautionary Remarks on the Use of Clusterwise Regression

Cautionary Remarks on the Use of Clusterwise Regression Clusterwise linear regression is a multivariate statistical procedure that attempts to cluster objects with the objective of minimizing the sum of the error sums of squares for the within-cluster regression models. In this article, we show that the minimization of this criterion makes no effort to distinguish the error explained by the within-cluster regression models from the error explained by the clustering process. In some cases, most of the variation in the response variable is explained by clustering the objects, with little additional benefit provided by the within-cluster regression models. Accordingly, there is tremendous potential for overfitting with clusterwise regression, which is demonstrated with numerical examples and simulation experiments. To guard against the misuse of clusterwise regression, we recommend a benchmarking procedure that compares the results for the observed empirical data with those obtained across a set of random permutations of the response measures. We also demonstrate the potential for overfitting via an empirical application related to the prediction of reflective judgment using high school and college performance measures. http://www.deepdyve.com/assets/images/DeepDyve-Logo-lg.png Multivariate Behavioral Research Taylor & Francis

Cautionary Remarks on the Use of Clusterwise Regression

21 pages

Loading next page...
 
/lp/taylor-francis/cautionary-remarks-on-the-use-of-clusterwise-regression-j42SUdiETs

References (47)

Publisher
Taylor & Francis
Copyright
Copyright Taylor & Francis Group, LLC
ISSN
1532-7906
eISSN
0027-3171
DOI
10.1080/00273170701836653
pmid
26788971
Publisher site
See Article on Publisher Site

Abstract

Clusterwise linear regression is a multivariate statistical procedure that attempts to cluster objects with the objective of minimizing the sum of the error sums of squares for the within-cluster regression models. In this article, we show that the minimization of this criterion makes no effort to distinguish the error explained by the within-cluster regression models from the error explained by the clustering process. In some cases, most of the variation in the response variable is explained by clustering the objects, with little additional benefit provided by the within-cluster regression models. Accordingly, there is tremendous potential for overfitting with clusterwise regression, which is demonstrated with numerical examples and simulation experiments. To guard against the misuse of clusterwise regression, we recommend a benchmarking procedure that compares the results for the observed empirical data with those obtained across a set of random permutations of the response measures. We also demonstrate the potential for overfitting via an empirical application related to the prediction of reflective judgment using high school and college performance measures.

Journal

Multivariate Behavioral ResearchTaylor & Francis

Published: Mar 28, 2008

There are no references for this article.