A Short Introduction to Optimal Transport and Wasserstein Distance

09 Oct 2020 Some introductory notes on a recently popular topic in machine learning research.

How to cross-validate PCA, clustering, and matrix decomposition models

26 Feb 2018 Cross-validation is a somewhat tricky problem for PCA, clustering, and other matrix factorization models. This post provides some Python code snippets for fitting these models with held-out data.

Solving Least-Squares Regression with Missing Data

26 Feb 2018 We show how to fit least-squares regression when data are missing at random.

Everything you did and didn't know about PCA

27 Mar 2016 This post provides a short introduction to principal components analysis (PCA).

Clustering is hard, except when it's not

18 Nov 2015 We review some more optimistic results that characterize when clustering is not so hard.

Is clustering mathematically impossible?

01 Oct 2015 A review of a result proved by Kleinberg (2002).

Why is clustering hard?

11 Sep 2015 A brief look into why clustering is a hard problem.