I wrote up a little review of Spark MLlib - it can be found here (PDF).
Iterative methods are at the core of Spark MLlib. Given a problem, we guess an
answer, then iteratively improve the guess until some stopping condition is met
(Krylov subspace methods are a classic example of this pattern).
Improving the answer typically involves a full pass through the distributed data,
aggregating some partial result on the driver node. This partial result describes
the model, for instance an array of numbers such as a weight vector. The stopping
condition can be some form of convergence of the sequence of guesses, or reaching
the maximum number of allowed iterations.
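
To make the pattern concrete, here is a minimal sketch in Scala using the core RDD API: batch gradient descent for least-squares regression. All the names and values here (Point, stepSize, tolerance, the toy data) are my own illustrative assumptions, not MLlib internals; MLlib's actual optimizers are considerably more elaborate. But the shape of the loop is the one described above: guess, pass over the data, aggregate on the driver, test the condition.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.rdd.RDD

    object IterativeSketch {
      // One labeled example: features x and target y (illustrative type).
      case class Point(x: Array[Double], y: Double)

      def dot(a: Array[Double], b: Array[Double]): Double =
        a.zip(b).map { case (u, v) => u * v }.sum

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("iterative-sketch").setMaster("local[*]"))

        // Distributed data, cached because every iteration passes over it.
        val data: RDD[Point] = sc.parallelize(Seq(
          Point(Array(1.0, 2.0), 5.0),
          Point(Array(2.0, 1.0), 4.0),
          Point(Array(3.0, 3.0), 9.0)
        )).cache()

        val n = data.count()
        var weights = Array(0.0, 0.0)   // the guess: "an array of numbers"
        val stepSize = 0.1              // assumed hyperparameters
        val tolerance = 1e-6
        val maxIterations = 100
        var converged = false
        var iter = 0

        while (!converged && iter < maxIterations) {
          // One pass through the data: each partition computes partial
          // gradients, which are summed into one array on the driver.
          val gradient = data
            .map { p =>
              val err = dot(weights, p.x) - p.y
              p.x.map(_ * err)
            }
            .reduce((g1, g2) => g1.zip(g2).map { case (a, b) => a + b })
            .map(_ / n)

          // Improve the guess on the driver, then test the stopping
          // condition: convergence of the guesses or too many iterations.
          val newWeights = weights.zip(gradient)
            .map { case (w, g) => w - stepSize * g }
          val delta = math.sqrt(weights.zip(newWeights)
            .map { case (a, b) => (a - b) * (a - b) }.sum)
          weights = newWeights
          converged = delta < tolerance
          iter += 1
        }

        println(s"weights = ${weights.mkString(", ")} after $iter iterations")
        sc.stop()
      }
    }

Note the cache() call: since each iteration reads the full dataset, keeping it in memory across iterations is what makes this loop practical on a cluster.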