Finding and analyzing Simpson's paradox, a well-known statistical phenomenon, has many applications. While the existing literature focuses only on analyzing the causes of identified Simpson's paradox instances, there is no systematic analysis of Simpson's paradox in multidimensional spaces. In this paper, we develop a simple yet practical approach to automatically identify all Simpson's paradox instances formed by various sub-populations and separator attributes in a multidimensional data set. Moreover, we analyze the distribution of the multidimensional Simpson's paradox instances on three real data sets with respect to dimensionality, size of sub-populations, participation of individual records, redundancy, and more. We obtain a series of interesting observations about a few questions that have never been asked before. The results open doors to a few interesting directions for future study. This paper is also the outcome of a high-school student summer research internship. It reflects our ongoing effort to promote data science research among youth and high school students.
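To make the notions of sub-population and separator attribute concrete, the following is a minimal sketch of a check for a single Simpson's paradox instance. It is not the paper's algorithm, which is not described in the abstract. The function name `is_simpsons_paradox`, the record layout, and the strict-reversal criterion are assumptions made for illustration, and the counts are the widely cited kidney stone treatment figures used in textbooks, not one of the paper's data sets.

```python
from collections import defaultdict


def is_simpsons_paradox(records, group_a, group_b):
    """Return True if the comparison between two sub-populations reverses
    in every stratum of the separator attribute.

    records: iterable of (group, separator_value, successes, trials).
    Assumption: a strict reversal is required in every stratum; other
    definitions of the paradox may allow ties.
    """
    totals = defaultdict(lambda: [0, 0])                        # group -> [successes, trials]
    strata = defaultdict(lambda: defaultdict(lambda: [0, 0]))   # separator value -> group -> counts
    for group, sep, successes, trials in records:
        totals[group][0] += successes
        totals[group][1] += trials
        strata[sep][group][0] += successes
        strata[sep][group][1] += trials

    # Both sub-populations must be present with at least one trial.
    for g in (group_a, group_b):
        if g not in totals or totals[g][1] == 0:
            return False

    agg_diff = totals[group_a][0] / totals[group_a][1] \
        - totals[group_b][0] / totals[group_b][1]
    if agg_diff == 0:
        return False  # no aggregate trend to reverse

    for groups in strata.values():
        if group_a not in groups or group_b not in groups:
            return False  # stratum cannot be compared
        sa, na = groups[group_a]
        sb, nb = groups[group_b]
        if na == 0 or nb == 0:
            return False
        stratum_diff = sa / na - sb / nb
        # The stratum trend must have the opposite sign of the aggregate trend.
        if stratum_diff * agg_diff >= 0:
            return False
    return True


# Textbook kidney stone example: (treatment, stone size, successes, trials).
KIDNEY_STONES = [
    ("A", "small", 81, 87),
    ("A", "large", 192, 263),
    ("B", "small", 234, 270),
    ("B", "large", 55, 80),
]

paradox_found = is_simpsons_paradox(KIDNEY_STONES, "A", "B")
```

Here treatment B has the higher overall success rate (289/350 versus 273/350), while treatment A has the higher rate within both the small-stone and large-stone strata, so the check reports a paradox. A multidimensional search of the kind the abstract describes would presumably apply a test like this across many choices of sub-populations and separator attributes.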
ACM SIGKDD Explorations Newsletter – Association for Computing Machinery
Published: Dec 5, 2022