Node partner calculation for Batcher's odd-even merge sorting networks.
I couldn't find a closed-form formula for computing a node's partner in an odd-even merge sorting network. The only implementations I could find, including the sample code on Wikipedia, were recursive and not very elegant.
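For reference, the recursive construction looks roughly like the sketch below (my own Scala rendering of the well-known divide-and-merge recursion, with illustrative names; it is not the exact snippet from Wikipedia). Each call returns the list of compare-exchange pairs, and the number of inputs must be a power of two.

// Comparators of Batcher's odd-even merge sort network for indices lo..hi (inclusive).
def oddEvenMerge(lo: Int, hi: Int, r: Int): Seq[(Int, Int)] = {
  val step = r * 2
  if (step < hi - lo)
    oddEvenMerge(lo, hi, step) ++ oddEvenMerge(lo + r, hi, step) ++
      (lo + r until hi - r by step).map(i => (i, i + r))
  else Seq((lo, lo + r))
}

def oddEvenMergeSort(lo: Int, hi: Int): Seq[(Int, Int)] =
  if (hi - lo >= 1) {
    val mid = lo + (hi - lo) / 2
    oddEvenMergeSort(lo, mid) ++ oddEvenMergeSort(mid + 1, hi) ++ oddEvenMerge(lo, hi, 1)
  } else Seq.empty

// oddEvenMergeSort(0, 7) lists the 19 compare-exchange pairs of the 8-input network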
So I decided to work out a simpler and more intuitive way to compute comparator partners in an odd-even merge sorting network, and here it is:
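The idea can be expressed without recursion. For a merge phase of size p and comparison distance k, index m is partnered with m + k exactly when both indices fall into the same block of 2p consecutive elements, which is a single integer-division check. A minimal sketch of this loop-based enumeration (illustrative, and not necessarily identical to the code in the original gist):

// Non-recursive enumeration of the comparators of Batcher's odd-even merge
// sort network for n inputs (n must be a power of two).
def oddEvenMergeSortNetwork(n: Int): Seq[(Int, Int)] = {
  require(n >= 2 && (n & (n - 1)) == 0, "n must be a power of two")
  for {
    p <- Iterator.iterate(1)(_ * 2).takeWhile(_ < n).toSeq   // merge phase size: 1, 2, 4, ...
    k <- Iterator.iterate(p)(_ / 2).takeWhile(_ >= 1).toSeq  // comparison distance: p, p/2, ..., 1
    j <- (k % p) to (n - 1 - k) by (2 * k)
    i <- 0 until math.min(k, n - j - k)
    // i+j and i+j+k are partners only if they lie in the same 2p-sized block
    if (i + j) / (p * 2) == (i + j + k) / (p * 2)
  } yield (i + j, i + j + k)
}

// oddEvenMergeSortNetwork(8) produces an equivalent 19-comparator network for 8 inputs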
The current version of Spark MLlib doesn't have multi-class classification with SVM, but it is possible to build a multi-class classifier out of binary classifiers. One easy way of doing this is the one-vs-all scheme: train one binary SVM per class and predict the class whose classifier gives the highest score. It is not as accurate as more sophisticated schemes, but it is relatively easy to implement and gives decent results. Here is my implementation.
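A minimal sketch of what such a one-vs-all wrapper can look like (illustrative class names and structure, not necessarily the exact implementation), consistent with how SVMMultiClassWithSGD.train and model.predict are used below:

import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.mllib.regression.LabeledPoint

// One binary SVM per class; predict the class whose model gives the largest raw margin.
// Class labels are assumed to be 0.0, 1.0, ..., numClasses - 1.
class SVMMultiClassModel(val models: Array[SVMModel]) extends Serializable {
  def predict(features: Vector): Double =
    models.map(_.predict(features)).zipWithIndex.maxBy(_._1)._2.toDouble
}

object SVMMultiClassWithSGD {
  def train(input: RDD[LabeledPoint], numIterations: Int): SVMMultiClassModel = {
    val numClasses = input.map(_.label).max().toInt + 1
    val models = (0 until numClasses).map { c =>
      // relabel: the current class becomes 1.0, all other classes 0.0
      val binary = input.map(p => LabeledPoint(if (p.label.toInt == c) 1.0 else 0.0, p.features))
      val m = SVMWithSGD.train(binary, numIterations)
      m.clearThreshold() // keep raw margins so scores are comparable across the per-class models
      m
    }.toArray
    new SVMMultiClassModel(models)
  }
}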
To test this multi-class classifier, we can try it on the handwritten digit recognition problem.
Get the hand-written digits data (the pendigits dataset loaded below) from here.
import breeze.linalg.DenseVector
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
// each row holds 16 input features followed by the digit label in the last column
val digits_train = sc.textFile("/data/pendigits.tra").map(line => DenseVector(line.split(",").map(_.trim().toDouble))).map(v => LabeledPoint(v(-1), Vectors.dense(v(0 to 15).toArray))).cache()
val digits_test = sc.textFile("/data/pendigits.tes").map(line => DenseVector(line.split(",").map(_.trim().toDouble))).map(v => LabeledPoint(v(-1), Vectors.dense(v(0 to 15).toArray)))
val model = SVMMultiClassWithSGD.train(digits_train, 100)
val predictionAndLabel = digits_test.map(p => (model.predict(p.features), p.label))
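Accuracy can then be measured as the fraction of test points whose predicted class matches the true label, for example:

val accuracy = predictionAndLabel.filter { case (pred, label) => pred == label }.count().toDouble / digits_test.count()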
Accuracy is only 74% with 100 iterations, and it may not get much better with this construction. A different way of constructing multi-class classifiers from binary SVMs is to use pairwise (one-vs-one) schemes with some adjustments, as described here, and there is also another method described here.
Scikit-learn's SVM classifier performs better out of the box (with an RBF kernel, accuracy is in the high 90s), but the sklearn implementation is not scalable. Hopefully Spark MLlib will be able to beat this in the future, when more sophisticated, higher-level ML pipeline API features come online.
For comparison, here are some results with tree classifiers. With RandomForest (30 trees, Gini impurity, depth 7) accuracy goes up to 93%. Adding second-order feature interactions (Spark doesn't support kernels in classification yet, but a simple feature transformation, sketched below, adds the second-order interactions explicitly) and increasing the allowed tree depth to 15 brings accuracy to 97%. So there is a lot of room for improvement in the multi-class to binary classifier reduction.
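A minimal sketch of such an interaction transform together with the RandomForest setup, using MLlib's RandomForest.trainClassifier (the helper name addSecondOrderInteractions is illustrative, and the imports from the earlier snippet are assumed):

import org.apache.spark.mllib.tree.RandomForest

// append pairwise products x_i * x_j (i <= j) to the original features
def addSecondOrderInteractions(p: LabeledPoint): LabeledPoint = {
  val x = p.features.toArray
  val interactions = for (i <- x.indices; j <- i until x.length) yield x(i) * x(j)
  LabeledPoint(p.label, Vectors.dense(x ++ interactions))
}

val trainExpanded = digits_train.map(addSecondOrderInteractions).cache()
val testExpanded = digits_test.map(addSecondOrderInteractions)

// 30 trees, Gini impurity, deeper trees to exploit the extra features
val rfModel = RandomForest.trainClassifier(trainExpanded, numClasses = 10,
  categoricalFeaturesInfo = Map[Int, Int](), numTrees = 30,
  featureSubsetStrategy = "auto", impurity = "gini", maxDepth = 15, maxBins = 32)

val rfAccuracy = testExpanded.map(p => if (rfModel.predict(p.features) == p.label) 1.0 else 0.0).mean()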