Introduction to Data Science and Machine Learning

Geometry and topology in data science and machine learning
Mikael Vejdemo-Johansson, College of Staten Island CUNY

Taking a geometric perspective on data analysis tasks and techniques has proven to be a fruitful approach. In this talk, we will encounter four different research fields that emerge from studying either data sets or data analysis techniques with specific geometric or topological toolkits:

1. Topological Data Analysis - the algebraic topological invariants associated to data sets can inform us of the structure of the data sets themselves.
2. Geometric Data Analysis - using manifold learning and studying possible coverings or atlases of data sets gives us a finer-grained, geometric toolkit for describing them.
3. Information Geometry - the techniques in statistical analysis and in information theory sometimes end up very similar to constructions in differential geometry and in algebraic topology. The analogies can be leveraged to further understanding of the original statistical methods themselves.
4. Algebraic Statistics - many statistical techniques are, in one way or another, described by or related to a system of polynomial functions or equations. Leveraging algebraic geometry, these can be better understood - starting with magic squares for study designs and progressing through the ages.
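
As a small illustration of the first item, the simplest topological invariant of a point cloud is its number of connected components (the zeroth Betti number) at a chosen distance scale. The sketch below is not taken from the talk: the function name `betti0`, the scale parameter `eps`, and the sample coordinates are all assumptions made for illustration. It links any two points at distance at most `eps` and counts the resulting components with a union-find structure.

```python
import math
from itertools import combinations


def betti0(points, eps):
    """Count connected components of the graph that links every pair of
    points at Euclidean distance <= eps (hypothetical helper)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= eps:
            parent[find(i)] = find(j)  # merge the two components

    return len({find(i) for i in range(len(points))})


# Made-up sample: two well-separated clusters in the plane.
cloud = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]

for eps in (0.01, 0.5, 10.0):
    print(eps, betti0(cloud, eps))
```

Watching how this count changes as `eps` grows is, roughly, the degree-zero case of what persistent homology tracks; higher-dimensional invariants (loops, voids) need a dedicated library and are not attempted here.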