Friday, June 5, 2015

Spark MLlib Review

I wrote up a little review of Spark MLlib - it can be found here (PDF).
Iterative methods are at the core of Spark MLlib. Given a problem, we guess an answer, then iteratively improve the guess until some stopping condition is met; Krylov subspace methods are one example of this pattern. Improving the guess typically involves passing through all of the distributed data and aggregating partial results on the driver node. The aggregated result is the current model, for instance, an array of numbers. The stopping condition can be some form of convergence of the sequence of guesses, or reaching the maximum number of allowed iterations.
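To make the pattern concrete, here is a minimal sketch in plain Python, with no Spark involved. It assumes the "distributed data" is just a list of in-memory partitions and uses gradient descent on a tiny least-squares problem as the iterative method. The names `partial_gradient` and `fit`, the step size, and the tolerance are all made up for illustration; this is not how MLlib itself is implemented.

```python
from functools import reduce


def partial_gradient(partition, w):
    """Partial result for one partition: the summed least-squares gradient
    for the model y ~ w[0] + w[1] * x over that partition's points."""
    g0 = g1 = 0.0
    for x, y in partition:
        err = (w[0] + w[1] * x) - y
        g0 += err
        g1 += err * x
    return (g0, g1)


def fit(partitions, step=0.1, tol=1e-9, max_iter=10000):
    """Guess, improve, repeat. Returns (model, iterations, converged)."""
    n = sum(len(p) for p in partitions)
    w = (0.0, 0.0)  # initial guess for the model
    for i in range(1, max_iter + 1):
        # One pass over all the data: each partition computes a partial result.
        partials = [partial_gradient(p, w) for p in partitions]
        # The "driver" aggregates the partial results into one gradient.
        g = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), partials)
        new_w = (w[0] - step * g[0] / n, w[1] - step * g[1] / n)
        change = max(abs(new_w[0] - w[0]), abs(new_w[1] - w[1]))
        w = new_w
        # Stopping condition 1: the sequence of guesses has converged.
        if change < tol:
            return w, i, True
    # Stopping condition 2: the maximum number of iterations was reached.
    return w, max_iter, False


# Points on the line y = 1 + 2x, split across two hypothetical partitions.
partitions = [[(0.0, 1.0), (1.0, 3.0)], [(2.0, 5.0), (3.0, 7.0)]]
model, iterations, converged = fit(partitions)
print(model, iterations, converged)
```

In a real Spark job the list comprehension and `reduce` would roughly correspond to a map over the partitions of an RDD followed by an aggregation back to the driver, and the model `w` would be shipped to the workers on every iteration.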

