Showing posts with label ML. Show all posts

Tuesday, March 22, 2016

Home Depot Kaggle competition started

I have started working on the Home Depot Kaggle competition. It requires a lot of text cleaning before any significant improvement over the benchmark is possible.
I am running some cleaning, spell-checking, and initial feature generation on my 33-node Spark cluster on AWS.
I might not be able to put a lot of effort into it, but I will make sure I make at least one submission with basic features.

Friday, June 5, 2015

Spark MLlib Review

I wrote up a little review of Spark MLlib - it can be found here (PDF).
Iterative methods are at the core of Spark MLlib. Given a problem, we guess an answer, then iteratively improve the guess until some stopping condition is met; Krylov subspace methods are one example. Improving the guess typically involves a pass over all of the distributed data, with the partial results aggregated on the driver node. The aggregated result is some model, for instance an array of numbers. The stopping condition can be convergence of the sequence of guesses, or reaching the maximum number of allowed iterations.
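
To make the pattern concrete, here is a minimal sketch in plain Python, not actual MLlib code. It assumes a toy problem (fitting y ≈ w·x by gradient descent) and uses a list of lists to stand in for data partitions spread across workers. The names `partial_gradient`, `combine`, and `fit`, as well as the step size and tolerance, are made up for illustration.

```python
from functools import reduce

def partial_gradient(partition, w):
    """Work done per partition: sum of squared-error gradients for y ~ w * x,
    plus the number of points seen (hypothetical helper)."""
    g = sum(2.0 * (w * x - y) * x for x, y in partition)
    return g, len(partition)

def combine(a, b):
    """Merge two partial results; this is the aggregation step on the driver."""
    return a[0] + b[0], a[1] + b[1]

def fit(partitions, w0=0.0, step=0.05, tol=1e-9, max_iter=1000):
    """Guess, then improve the guess until convergence or max_iter is reached."""
    w = w0
    for iteration in range(1, max_iter + 1):
        # One pass over all of the data: each partition yields a partial result.
        partials = [partial_gradient(p, w) for p in partitions]
        # Aggregate the partial results into a single gradient.
        grad_sum, n = reduce(combine, partials)
        w_new = w - step * grad_sum / n
        # Stopping condition: the sequence of guesses has converged.
        if abs(w_new - w) < tol:
            return w_new, iteration
        w = w_new
    # Stopping condition: maximum number of allowed iterations reached.
    return w, max_iter

# Toy data generated from y = 2x, split into three "partitions".
partitions = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)], [(4.0, 8.0), (5.0, 10.0)]]
w, iterations = fit(partitions)
print(w, iterations)
```

Here the model is a single number `w`; in a real job it would be an array of numbers, and the per-partition work would run on the cluster rather than in a local list comprehension.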
