It’s All a Matter of Perspective
When we analyze data in high dimensions, it is standard practice to project the data onto lower dimensions so that patterns are more easily recognized. In this piece, we present the 3D scatterplot of the Iris dataset (Anderson, 1936; Fisher, 1936), one of the most widely used datasets for classification, to illustrate the challenge of projecting high-dimensional data onto lower-dimensional spaces: patterns might be apparent only in certain projections. The dataset contains three classes, which represent three species of Iris flowers: Iris setosa, Iris versicolor, and Iris virginica. The three axes represent sepal length (x axis), sepal width (y axis), and petal length (z axis). The clusters are clearly defined when we view the scatterplot from a certain perspective (which coincides with the projection onto the plane of the principal components). If we change perspective, however, the clusters are no longer as well separated, clouding our interpretation.
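The effect of perspective can be sketched numerically. The short Python example below is only an illustration under stated assumptions, not the code behind the piece: it uses synthetic 3D points as a stand-in for the Iris measurements, and the helper names (pca_project, separation) and the scatter-ratio score are our own choices for this sketch. It compares the view onto the principal components plane with a view that looks straight down the direction along which the clusters differ.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project the rows of X onto their top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def separation(P, labels):
    """Between-class scatter divided by within-class scatter (hypothetical score).

    Larger values mean the classes look more separated in the projection P.
    """
    overall = P.mean(axis=0)
    between = 0.0
    within = 0.0
    for c in np.unique(labels):
        pts = P[labels == c]
        center = pts.mean(axis=0)
        between += len(pts) * np.sum((center - overall) ** 2)
        within += np.sum((pts - center) ** 2)
    return between / within


# Synthetic stand-in data (NOT the Iris measurements): three clusters of
# 50 points each whose centers differ only along the first axis.
rng = np.random.default_rng(0)
centers = np.array([[-5.0, 0.0, 0.0], [0.0, 0.0, 0.0], [5.0, 0.0, 0.0]])
labels = np.repeat(np.arange(3), 50)
X = centers[labels] + rng.normal(size=(150, 3))

# Perspective 1: the plane spanned by the first two principal components.
P_pca = pca_project(X, n_components=2)

# Perspective 2: drop the first axis, i.e. look along the direction in
# which the clusters actually differ.
P_side = X[:, [1, 2]] - X[:, [1, 2]].mean(axis=0)

score_pca = separation(P_pca, labels)
score_side = separation(P_side, labels)
print(f"separation in the principal components view: {score_pca:.2f}")
print(f"separation in the side view:                 {score_side:.2f}")
```

Note that principal components maximize variance rather than class separation, so the agreement between the two in this sketch is a property of how the synthetic data were built, much as it happens to hold for the view described above.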
About the artists
Zhaoxing Wu grew up in a beautiful city called Yangzhou in China. She was a senior undergraduate student majoring in Mathematics and Statistics and graduated this May. She is interested in data visualization, especially for high-dimensional data and networks.
Hailey Louw has always lived in the Midwest and is loving her time in Madison. She is currently a Master's student in the Department of Statistics and is interested in applying statistical methods and analysis to real-world biological data.
Shichen Qiao is a recently graduated undergraduate student who majored in Computer Engineering and Computer Science. His research interests include computer architecture, GPU modeling, and digital control systems, as well as efficient genome filtering and data visualization through bioinformatics web apps.
Claudia Solis-Lemus was born and raised in Mexico City. She did her PhD in Statistics at the University of Wisconsin-Madison, and is currently an assistant professor in the Department of Plant Pathology and the Wisconsin Institute for Discovery at the University of Wisconsin-Madison. She is passionate about data visualization as well as statistical analysis of biological big data.