Wednesday, July 27, 2016

Awesome Akka Streams

Awesome Akka Streams:
 
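import akka.actor.ActorSystem
import akka.stream._, akka.stream.scaladsl._
implicit val system = ActorSystem()
implicit val materializer = ActorMaterializer()
val X = Source.actorRef[Int](0, OverflowStrategy.dropNew) // bufferSize 0: no buffering
val Y = X.to(Sink.foreach(println))
val Z = Y.run() // materializes the stream; Z is the ActorRef that feeds the source
Z ! "pretty cool" // `!` takes Any, so the Int type parameter does not reject a String here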

Saturday, June 25, 2016

Weird Spark bug?

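Spark 1.5.0-cdh5.5.0. Filtering on both columns, in one filter or in two chained filters, returns no rows, but putting an orderBy between the two filters returns the expected row: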

scala> df.filter("ad_market_id = 4 and event_date = '2016-05-23'").show
+----------+------------+
|event_date|ad_market_id|
+----------+------------+
+----------+------------+


scala> df.filter("ad_market_id = 4").filter("event_date = '2016-05-23'").show
+----------+------------+
|event_date|ad_market_id|
+----------+------------+
+----------+------------+


scala> df.filter("ad_market_id = 4").orderBy("event_date").filter("event_date = '2016-05-23'").show
+----------+------------+
|event_date|ad_market_id|
+----------+------------+
|2016-05-23|           4|
+----------+------------+

Tuesday, March 22, 2016

Home Depot Kaggle competition started

Started working on the Home Depot Kaggle competition. This competition requires a lot of text cleaning before any significant improvement over the benchmark is possible.
I am running some cleaning, spell-checking, and initial feature generation on my 33-node AWS Spark cluster.
I might not be able to put a lot of effort into it, but I will make sure I make at least one submission with basic features.

Thursday, January 7, 2016

Merger trait - common functionality of merging networks

Merger trait along with three implementations:
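trait Merger {
  // Merge two sorted sequences into one sorted sequence
  def merge(xs: Seq[Int], ys: Seq[Int]): Seq[Int]
}

// Mixin that turns any Merger into a merge sort: split in half, sort each half, merge
trait MergeSorting {
  self: Merger =>
  def sort(xs: Seq[Int]): Seq[Int] = {
    if (xs.size < 2) {
      xs
    } else {
      val (lefts, rights) = xs.splitAt(xs.size / 2)
      merge(sort(lefts), sort(rights))
    }
  }
}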
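As an illustration of one possible Merger implementation, here is a minimal sketch based on Batcher's odd-even merge, assuming both inputs are sorted, non-empty, and of the same power-of-two length. The `OddEvenMerger` object is hypothetical (it is not one of the post's three implementations), and the two traits are repeated so the block compiles on its own.

```scala
trait Merger {
  def merge(xs: Seq[Int], ys: Seq[Int]): Seq[Int]
}

trait MergeSorting { self: Merger =>
  def sort(xs: Seq[Int]): Seq[Int] =
    if (xs.size < 2) xs
    else {
      val (lefts, rights) = xs.splitAt(xs.size / 2)
      merge(sort(lefts), sort(rights))
    }
}

// Hypothetical Merger using Batcher's odd-even merge.
// Assumes xs and ys are sorted, non-empty, and of equal power-of-two length.
object OddEvenMerger extends Merger with MergeSorting {
  def merge(xs: Seq[Int], ys: Seq[Int]): Seq[Int] =
    if (xs.size == 1) Seq(xs.head min ys.head, xs.head max ys.head) // a single comparator
    else {
      // Recursively merge the even-indexed and the odd-indexed subsequences
      val evens = merge(everyOther(xs, 0), everyOther(ys, 0)).toIndexedSeq
      val odds = merge(everyOther(xs, 1), everyOther(ys, 1)).toIndexedSeq
      val n = evens.size
      // Final comparator layer: compare odds(i - 1) with evens(i) for i = 1 .. n - 1
      val middle = (1 until n).flatMap { i =>
        Seq(odds(i - 1) min evens(i), odds(i - 1) max evens(i))
      }
      // First even element and last odd element are already in place
      (evens.head +: middle) :+ odds.last
    }

  // Elements at indices start, start + 2, start + 4, ...
  private def everyOther(s: Seq[Int], start: Int): Seq[Int] =
    s.indices.filter(i => i % 2 == start).map(i => s(i))
}
```

Because MergeSorting splits at `xs.size / 2`, an input whose length is a power of two always reaches `merge` as two halves of equal power-of-two length, which is what this merger needs.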

Saturday, December 5, 2015

Thursday, November 5, 2015

Monday, August 31, 2015

Clean tmux cheat sheet

By resources

sessions

list-sessions        ls         -- List sessions managed by server
new-session          new        -- Create a new session
kill-session                    -- Destroy a given session
rename-session       rename     -- Rename a session
attach-session       attach     -- Attach or switch to a session
has-session          has        -- Check and report if a session is on server
set-option           set        -- Set a session option
lock-session                    -- Lock all clients attached to a session

windows

list-windows         lsw        -- List windows of a session
new-window           neww       -- Create a new window
kill-window          killw      -- Destroy a given window
rename-window        renamew    -- Rename a window
choose-session                  -- Put a window into session choice mode
choose-window                   -- Put a window into window choice mode
select-window        selectw    -- Select a window
find-window          findw      -- Search for a pattern in windows
last-window          last       -- Select the previously selected window
move-window          movew      -- Move a window to another
next-window          next       -- Move to the next window in a session
previous-window      prev       -- Move to the previous window in a session
refresh-client       refresh    -- Refresh a client
respawn-window       respawnw   -- Reuse a window in which a command has exited
rotate-window        rotatew    -- Rotate positions of panes in a window
swap-window          swapw      -- Swap two windows
unlink-window        unlinkw    -- Unlink a window
select-layout        selectl    -- Choose a layout for a window
select-prompt                   -- Open a prompt to enter a window index
set-window-option    setw       -- Set a window option

panes

list-panes           lsp        -- List panes of a window
kill-pane            killp      -- Destroy a given pane
pipe-pane            pipep      -- Pipe output from a pane to a shell command
resize-pane          resizep    -- Resize a pane
display-panes        displayp   -- Display an indicator for each visible pane
select-pane          selectp    -- Make a pane the active one in the window
up-pane              upp        -- Move up a pane
down-pane            downp      -- Move down a pane
swap-pane            swapp      -- Swap two panes
split-window         splitw     -- Split a pane into two
join-pane            joinp      -- Split a pane and move an existing one into the new space
break-pane           breakp     -- Break a pane out of its window into a new window
capture-pane         capturep   -- Capture the contents of a pane to a buffer

clients

switch-client        switchc    -- Switch the client to another session
suspend-client       suspendc   -- Suspend a client
refresh-client       refresh    -- Refresh a client
lock-client                     -- Lock a client
list-clients         lsc        -- List clients attached to server
detach-client        detach     -- Detach a client from the server
choose-client                   -- Put a window into client choice mode
show-messages        showmsgs   -- Show client's message log
display-message      display    -- Display a message in the status line

server

server-info          info       -- Show server information
start-server         start      -- Start a tmux server
kill-server                     -- Kill clients, sessions and server
lock-server          lock       -- Lock all clients attached to the server

shell

run-shell            run        -- Execute a command without creating a new window
if-shell             if         -- Execute a tmux command if a shell-command succeeded
pipe-pane            pipep      -- Pipe output from a pane to a shell command

keys

bind-key             bind       -- Bind a key to a command
unbind-key           unbind     -- Unbind a key
list-keys            lsk        -- List all key-bindings

paste buffers

capture-pane         capturep   -- Capture the contents of a pane to a buffer
copy-buffer          copyb      -- Copy session paste buffers
delete-buffer        deleteb    -- Delete a paste buffer
list-buffers         lsb        -- List paste buffers of a session
load-buffer          loadb      -- Load a file into a paste buffer
paste-buffer         pasteb     -- Insert a paste buffer into the window
save-buffer          saveb      -- Save a paste buffer to a file
set-buffer           setb       -- Set contents of a paste buffer
show-buffer          showb      -- Display the contents of a paste buffer

By functionality

options

set-option           set        -- Set a session option
set-window-option    setw       -- Set a window option

list stuff

list-sessions        ls         -- List sessions managed by server
list-windows         lsw        -- List windows of a session
list-panes           lsp        -- List panes of a window
list-buffers         lsb        -- List paste buffers of a session
list-clients         lsc        -- List clients attached to server
list-commands        lscm       -- List supported sub-commands
list-keys            lsk        -- List all key-bindings

Typical usage

flags

e.g.   
session-like: [-AdDP]   
window-like: [-adkP]

arguments

[-c start-directory]
[-F format]

[-n window-name]
[-s session-name]
[-b buffer-name]

[-t target-client, target-session, target-window, or target-pane]

[-x width]
[-y height]
[command]

examples

tmux new-session -s <session-name>
tmux attach-session -t <target-session>
tmux load-buffer -b <buffer-name> <path>
tmux save-buffer -b <buffer-name> <path>
tmux show-buffer -b <buffer-name>
tmux paste-buffer -b <buffer-name> -t <target-pane>

Working with sessions

$ Rename the current session.

Working with clients

d Detach the current client.
( Switch the attached client to the previous session.
) Switch the attached client to the next session.
D Choose a client to detach.
L Switch the attached client back to the last session.
r Force redraw of the attached client.
s Select a new session for the attached client interactively.

Working with windows

c Create a new window.
, Rename the current window.
0 to 9 Select windows 0 to 9.
l Move to the previously selected window.
n Change to the next window.
p Change to the previous window.
w Choose the current window interactively.
. Prompt for an index to move the current window.
& Kill the current window.
' Prompt for a window index to select.
f Prompt to search for text in open windows.
i Display some information about the current window.

Working with panes

" Split the current pane into two, top and bottom.
% Split the current pane into two, left and right.
z Toggle zoom state of the current pane.
o Select the next pane in the current window.
Up, Down, Left, Right Change to the pane above, below, to the left of, or to the right of the current pane.
; Move to the previously active pane.
q Briefly display pane indexes.
! Break the current pane out of the window.
m Mark the current pane (see select-pane -m).
M Clear the marked pane.
C-o Rotate the panes in the current window forwards.
{ Swap the current pane with the previous pane.
} Swap the current pane with the next pane.
x Kill the current pane.
Space Arrange the current window in the next preset layout.
M-1 to M-5 Arrange panes in one of the five preset layouts.

Copy mode

Set vi (or emacs) mode with:

set-window-option -g mode-keys vi   <or emacs>
       Function                vi             emacs
       Back to indentation     ^              M-m
       Clear selection         Escape         C-g
       Copy selection          Enter          M-w
       Cursor down             j              Down
       Cursor left             h              Left
       Cursor right            l              Right
       Cursor to bottom line   L
       Cursor to middle line   M              M-r
       Cursor to top line      H              M-R
       Cursor up               k              Up
       Delete entire line      d              C-u
       Delete to end of line   D              C-k
       End of line             $              C-e
       Goto line               :              g
       Half page down          C-d            M-Down
       Half page up            C-u            M-Up
       Next page               C-f            Page down
       Next word               w              M-f
       Paste buffer            p              C-y
       Previous page           C-b            Page up
       Previous word           b              M-b
       Quit mode               q              Escape
       Scroll down             C-Down or J    C-Down
       Scroll up               C-Up or K      C-Up
       Search again            n              n
       Search backward         ?              C-r
       Search forward          /              C-s
       Start of line           0              C-a
       Start selection         Space          C-Space
       Transpose chars                        C-t

additional useful key bindings

bind-key -t vi-copy v begin-selection
bind-key -t vi-copy y copy-pipe "reattach-to-user-namespace pbcopy"

unbind -t vi-copy Enter
bind-key -t vi-copy Enter copy-pipe "reattach-to-user-namespace pbcopy"

bind-key -n C-S-Left swap-window -t -1
bind-key -n C-S-Right swap-window -t +1

bind-key -n S-Right next-window
bind-key -n S-Left previous-window


Thursday, August 20, 2015

Google Deep Dream Generator

You have all heard of Google Deep Dream by now, so try this online image generator: http://deepdreamgenerator.com/

I thought this image might work well:


Friday, June 5, 2015

Spark MLlib Review

I wrote up a short review of Spark MLlib; it can be found here (PDF).
Iterative methods (e.g. Krylov subspace methods) are at the core of Spark MLlib. Given a problem, we guess an answer, then iteratively improve the guess until some stopping condition is met. Improving an answer typically involves passing through all of the distributed data and aggregating a partial result on the driver node. This partial result is some model, for instance, an array of numbers. The stopping condition can be convergence of the sequence of guesses or reaching the maximum number of allowed iterations.
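The guess, pass over the data, aggregate, and stop pattern described above can be sketched locally without Spark. In this minimal sketch, a `Seq` of inner `Seq`s stands in for a partitioned dataset, each "partition" computes a partial gradient sum, and the driver adds the partial sums and updates the guess. The `fitSlope` function and its parameters are hypothetical, and the model (fitting y = w * x by gradient descent) is chosen only for illustration.

```scala
// Fit y ≈ w * x by gradient descent on mean((w * x - y)^2 / 2).
// Each inner Seq plays the role of one partition of a distributed dataset.
// Returns the final guess and the number of iterations used.
def fitSlope(data: Seq[Seq[(Double, Double)]], stepSize: Double, maxIter: Int, tol: Double): (Double, Int) = {
  val count = data.map(_.size).sum
  var w = 0.0 // initial guess
  var iter = 0
  var converged = false
  while (!converged && iter < maxIter) {
    // One pass over all the data: every partition computes a partial gradient sum
    // for the current guess, and the "driver" adds the partial results together.
    val partialSums = data.map(part => part.map { case (x, y) => (w * x - y) * x }.sum)
    val gradient = partialSums.sum / count
    val next = w - stepSize * gradient // improved guess
    converged = math.abs(next - w) < tol // stop when the guesses stop changing...
    w = next
    iter += 1 // ...or when the iteration budget runs out
  }
  (w, iter)
}
```

In MLlib the per-partition sums and the driver-side combination would be done by an aggregation over the RDD; here they are plain collection operations so the control flow is easy to follow.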


Thursday, May 7, 2015

Batcher's odd-even merging network

Batcher's odd-even merge based sorting network node partner calculation.

I couldn't find a closed-form formula for odd-even network node partner calculation. The only available implementations, such as the code provided on Wikipedia, were recursive and not very elegant.

So I decided to work out a simpler and more intuitive solution to odd-even merge-based sorting network partner calculation, and here it is:

object PartnerOddEven {
  /**
   * Calculates partner in Batcher Odd-Even network.
   *
   * @param n node index: 0, 1, 2, 3, ..., 2^d - 1
   * @param l merge stage: 1, 2, 3, ..., d
   * @param p stage step: 1, 2, 3, ..., l
   * @return partner node, or self (n) if there is no partner for this step
   */
  def partner(n: Int, l: Int, p: Int): Int = {
    require(p <= l, "p should be at most l")
    require(p > 0, "p should be at least 1")
    require(l > 0, "l should be at least 1")
    if (p == 1) {
      // First step of merge stage l: compare nodes 2^(l-1) apart
      n ^ (1 << (l - 1))
    } else {
      val (scale, box) = (1 << (l - p), 1 << p)
      // Position of n inside its box, in units of scale: (n / scale) mod box
      val sn = n / scale - (n / scale / box) * box
      // The first and last positions of a box have no partner at this step
      if (sn == 0 || sn == box - 1) n
      else if (sn % 2 == 0) n - scale else n + scale
    }
  }
}
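To see the partner function at work, here is a minimal sketch of a full odd-even merge sort driven by it: for every merge stage l and step p, each node is compared with its partner, and the smaller value is kept at the lower index. The `oddEvenSort` name is hypothetical, the input length is assumed to be a power of two, and `partner` is copied from the object above so the block compiles on its own.

```scala
// Copy of the partner function above (assert/require checks condensed)
def partner(n: Int, l: Int, p: Int): Int = {
  require(p <= l && p > 0 && l > 0)
  if (p == 1) n ^ (1 << (l - 1))
  else {
    val (scale, box) = (1 << (l - p), 1 << p)
    val sn = n / scale - (n / scale / box) * box
    if (sn == 0 || sn == box - 1) n
    else if (sn % 2 == 0) n - scale else n + scale
  }
}

// Hypothetical driver: sorts a copy of the input; length must be 2^d.
def oddEvenSort(input: Array[Int]): Array[Int] = {
  val size = input.length
  require(size > 0 && (size & (size - 1)) == 0, "length must be a power of two")
  val a = input.clone()
  val d = Integer.numberOfTrailingZeros(size) // size == 2^d
  for (l <- 1 to d; p <- 1 to l; n <- 0 until size) {
    val m = partner(n, l, p)
    // Visit each comparator once, from its lower node; smaller value goes to the lower index
    if (m > n && a(n) > a(m)) {
      val tmp = a(n); a(n) = a(m); a(m) = tmp
    }
  }
  a
}
```

Within a single step the partner relation pairs nodes up disjointly (partner of partner is the node itself), so the order in which n is visited inside a step does not matter.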

Also, here I put up a little interactive sorting network generator. Of course, I also updated that Wikipedia article to make it easier for learners :)

Here is the best performance analysis of this network that I could find.

Tuesday, May 5, 2015

Digit recognition with Multiclass SVM on Spark MLlib

The current version of Spark MLlib doesn't have multiclass classification with SVM, but it is possible to build multiclass classifiers out of binary classifiers. One easy way of doing it is the one-vs-all scheme: train one binary classifier per class (that class against all the others) and predict the class whose classifier gives the highest score. It is not as accurate as more sophisticated schemes, but it is relatively easy to implement and gives decent results. Here is my implementation.
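The one-vs-all reduction itself does not depend on Spark. Here is a minimal local sketch of it, assuming dense feature arrays and integer labels 0 .. numClasses - 1. To keep it self-contained, a plain perceptron stands in for the linear SVM as the binary learner (a swapped-in technique, not the post's implementation), and the names `Example`, `trainOneVsAll`, and `predict` are hypothetical.

```scala
// A labeled example: integer class label and a dense feature vector.
final case class Example(label: Int, features: Array[Double])

// Raw score of a linear model; the last weight is the bias term.
def score(w: Array[Double], x: Array[Double]): Double =
  x.indices.map(i => w(i) * x(i)).sum + w(x.length)

// Binary learner (perceptron, standing in for a linear SVM):
// true marks the positive class, false marks everything else.
def trainBinary(data: Seq[(Array[Double], Boolean)], epochs: Int): Array[Double] = {
  val dim = data.head._1.length
  val w = Array.fill(dim + 1)(0.0)
  for (_ <- 1 to epochs; (x, positive) <- data) {
    val y = if (positive) 1.0 else -1.0
    if (y * score(w, x) <= 0) { // misclassified or on the boundary: update
      for (i <- 0 until dim) w(i) += y * x(i)
      w(dim) += y
    }
  }
  w
}

// One-vs-all: one binary model per class, trained as "class c" vs "all other classes".
def trainOneVsAll(data: Seq[Example], numClasses: Int, epochs: Int): IndexedSeq[Array[Double]] =
  (0 until numClasses).map { c =>
    trainBinary(data.map(e => (e.features, e.label == c)), epochs)
  }

// Predict the class whose binary model gives the highest raw score.
def predict(models: IndexedSeq[Array[Double]], x: Array[Double]): Int =
  models.indices.maxBy(c => score(models(c), x))
```

Comparing raw scores across independently trained models is what makes this scheme simple, and it is also one reason it can lag behind pairwise schemes: the per-class scores are not calibrated against each other.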

To test this multiclass classifier, we can try it on a handwritten digit recognition problem. Get the handwritten digits data from here.

// using https://github.com/Bekbolatov/spark/commit/463d73323d5f08669d5ae85dc9791b036637c966
// (SVMMultiClassWithSGD comes from this fork, not from stock Spark MLlib)
import org.apache.spark.mllib.classification.SVMMultiClassWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors

// Each pendigits line has 16 comma-separated features followed by the digit label
val parse = (line: String) => {
  val values = line.split(",").map(_.trim.toDouble)
  LabeledPoint(values.last, Vectors.dense(values.init))
}
val digits_train = sc.textFile("/data/pendigits.tra").map(parse).cache()
val digits_test = sc.textFile("/data/pendigits.tes").map(parse)

// Train the multiclass SVM with 100 iterations
val model = SVMMultiClassWithSGD.train(digits_train, 100)

// Pair each prediction with the true label and compute accuracy on the test set
val predictionAndLabel = digits_test.map(p => (model.predict(p.features), p.label))
val accuracy = 1.0 * predictionAndLabel.filter(x => x._1 == x._2).count() / digits_test.count()
predictionAndLabel.take(5)
Accuracy is only 74% with 100 iterations; maybe it can't get much better with this construction. A different way of constructing multiclass classifiers from binary SVMs is to use pairwise (one-vs-one) schemes with some adjustments, as described here, and also another method described here. The scikit-learn SVM classifier performs better out of the box (with an RBF kernel, accuracy is in the high 90s), but the sklearn implementation is not scalable. Hopefully Spark MLlib will be able to beat this in the future, when more sophisticated (high-level abstraction) ML pipeline API features come online.

For comparison, here are some results with tree classifiers. With RandomForest (30 trees, Gini, depth 7), accuracy goes up to 93%. Adding second-order interactions (Spark doesn't support kernels in classification yet, but here is a simple feature transformation that adds second-order feature interactions) and increasing the allowed tree depth to 15 brings accuracy to 97%. So there is a lot of room for improvement in the multiclass-to-binary classifier reduction.
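A second-order feature transformation like the one mentioned above can be sketched in a few lines. This is a minimal version under one assumption about what "second-order interactions" means here: it appends every product x(i) * x(j) with i <= j (squares included) to the original features. The `addSecondOrder` name is hypothetical.

```scala
// Hypothetical helper: append all pairwise products x(i) * x(j), i <= j,
// (including the squares x(i) * x(i)) after the original features.
def addSecondOrder(x: Array[Double]): Array[Double] = {
  val products = for {
    i <- x.indices
    j <- i until x.length
  } yield x(i) * x(j)
  x ++ products
}
```

For the 16 pendigits features this yields 16 + 16 * 17 / 2 = 152 features. In the Spark code above it could be applied to each LabeledPoint with something like `p => LabeledPoint(p.label, Vectors.dense(addSecondOrder(p.features.toArray)))`.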

Saturday, March 23, 2013

specialized memory

Who is smarter: a person or an ape? Well, it depends on the task. Consider Ayumu, a young male chimpanzee at Kyoto University who, in a 2007 study, put human memory to shame. Trained on a touch screen, Ayumu could recall a random series of nine numbers, from 1 to 9, and tap them in the right order, even though the numbers had been displayed for just a fraction of a second and then replaced with white squares.
I tried the task myself and could not keep track of more than five numbers—and I was given much more time than the brainy ape. In the study, Ayumu outperformed a group of university students by a wide margin. The next year, he took on the British memory champion Ben Pridmore and emerged the "chimpion."
The Brains of the Animal Kingdom http://online.wsj.com/article/SB10001424127887323869604578370574285382756.html

Sunday, December 3, 2006

Our galaxy: 1 out of ~125,000,000,000.

Milky Way Galaxy probably looks like this.
There are hundreds of billions of stars in a galaxy and there are hundreds of billions of galaxies out there.
Some estimate that there are ~40,000,000,000,000,000,000,000 stars. I don't even know how to comprehend such a huge number.
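As a rough back-of-the-envelope check, these figures hang together: dividing the estimated total number of stars by the number of galaxies in the title gives the average number of stars per galaxy.

$$\frac{4 \times 10^{22}\ \text{stars}}{1.25 \times 10^{11}\ \text{galaxies}} = 3.2 \times 10^{11}\ \text{stars per galaxy}$$

That is about 320 billion stars per galaxy, in line with "hundreds of billions of stars in a galaxy".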
Others have even estimated that there are ~10 stars for every grain of sand on all of Earth's beaches.
Can I just say that the Universe is mindbogglingly huge and we are so insignificant on that scale?