CPSC/AMTH/CBB 745 - Advanced Topics in Machine Learning & Data Mining - Spring 2018 Yale
Yale University CPSC745 - S2018

Unsupervised Data Visualization for Big Data Exploratory Analysis

Kevin Moon

Feb 28th, 2018


We live in an era of big data in which researchers in nearly every field are generating thousands or even millions of samples in high dimensions. Most methods in data science focus on prediction or impose restrictive assumptions that require established knowledge and understanding of the data; that is, these methods require some level of expert supervision. In many cases, however, this knowledge is unavailable, and the goal of data analysis is scientific discovery and a better understanding of the data. There is an especially strong need for methods that perform unsupervised data visualization, which is crucial for developing intuition about the data. In this talk, I present PHATE: an unsupervised data visualization tool based on a new information distance that excels at denoising the data while preserving both global and local structure. I demonstrate PHATE on a variety of datasets, including facial images, mass cytometry data, and new single-cell RNA-sequencing data. On the latter, I show how PHATE can be used to discover novel surface markers for sorting cell populations.