An occasionally updated research blog by Alex Williams.

A Short Introduction to Optimal Transport and Wasserstein Distance

Some introductory notes on a recently popular topic in machine learning research.

How to cross-validate PCA, clustering, and matrix decomposition models

Cross-validation is a somewhat tricky problem for PCA, clustering, and other matrix factorization models. This post provides some Python code snippets for fitting these models with held-out data.

Solving Least-Squares Regression with Missing Data

We show how to fit least-squares regression when data are missing at random.

Everything you did and didn't know about PCA

This post provides a short introduction to principal components analysis (PCA).

Clustering is hard, except when it's not

We review some more optimistic results characterizing when clustering is not so hard to accomplish.

Is clustering mathematically impossible?

A review of a result proved by Kleinberg (2002).

Why is clustering hard?

A brief look into why clustering is a hard problem.