Started working on the Home Depot Kaggle competition.
This competition requires a lot of text cleaning before any significant improvement over the benchmark can be made.
I am running some cleaning, spell-checking, and initial feature generation on my 33-node AWS Spark cluster.
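To give a rough idea of what "cleaning" and "basic features" can mean here, below is a minimal sketch in plain Python. It is not my actual Spark pipeline: the function names (`clean_text`, `basic_features`), the single unit-normalization rule, and the word-overlap features are all assumptions made for illustration, and spell-checking is left out entirely.

```python
import re


def clean_text(text):
    """Lowercase, normalize inch spellings, and strip punctuation."""
    text = text.lower()
    # Illustrative rule only: "4-inch", "4 inches", "4in." all become "4 in".
    text = re.sub(r"(\d+)[\s-]*(?:inches|inch|in)\b\.?", r"\1 in ", text)
    # Replace anything that is not a lowercase letter, digit, or space.
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    # Collapse repeated whitespace.
    return " ".join(text.split())


def basic_features(query, title):
    """Simple word-overlap features between a search query and a product title."""
    query_words = clean_text(query).split()
    title_words = set(clean_text(title).split())
    n_common = sum(1 for word in query_words if word in title_words)
    return {
        "query_len": len(query_words),
        "common_words": n_common,
        "common_ratio": n_common / len(query_words) if query_words else 0.0,
    }


if __name__ == "__main__":
    # Made-up example strings, not taken from the competition data.
    print(clean_text("4-Inch Deck Screws!"))
    print(basic_features("4 inch deck screws", "Deck Screws 4-Inch, 100 Pack"))
```

The same functions could in principle be applied row by row on a cluster, but how that is wired up depends on the setup and is not shown here.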
I might not be able to put a lot of effort into it, but I will make sure I make at least one submission with basic features.