Preposterous! Egregious!: 2012

Sunday 18 November 2012

Maven: Now with Colour™ !

Maven.

I'm a command-line kinda guy. GUIs didn't even exist when I was a kid. When I use Maven it's often from the command-line. A curious thing happens when you run Maven from the CLI more than 3 times in a row. You slowly lose the ability to read those useful little symbols that we affectionately refer to as "the alphabet", your eyes begin to ache and you start to wonder who's really in control: you or your eyelids. The list oozes on. The reason for this is that you get a lot of information as the build runs, and more importantly it's all the same colour, usually bright white on black depending on your terminal config. Things get really hard to see and it takes a lot of precious concentration and brain-juice to decipher the results. Usually this is because you're searching for 1 particular line out of 100 or so with no real visual hints. I don't like it. My brain may well just be getting old or maybe I'm just working too much lately but I find that I don't really have much brainpower to spare, especially when I'm in the middle of a problem with a large context. Don't like it.

So I wrote a script.
Get it here: https://gist.github.com/4104053

The great thing about Linux is that you can treat nearly everything as either a file (like /dev and /proc, awesome!) or a pipeline. By simply piping Maven output through a processor (while being careful not to block) you can apply colour all over the place by using some smart regex to insert strings that the terminal parses in order to allow user control over various terminal attributes, in this case: colour. (See ANSI escape codes on Wikipedia for more info.) Now I'm not the first person to take this approach, but the great thing is that because these scripts are so easy to write, anyone can create or customise one for their own personal needs.
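To give a feel for how little is involved, here's a minimal Ruby sketch of the idea. It is not the script from the gist: the patterns and colours below are my guesses at typical Maven output, and the names colourise and colourise_stream are made up for illustration.

```ruby
# ANSI escape sequences: "\e[" + attributes + "m" switches colour on,
# "\e[0m" resets the terminal back to its defaults.
RESET = "\e[0m"

# Hypothetical rules (pattern => colour); assumed shapes of Maven output lines.
RULES = [
  [/^\[ERROR\].*/,   "\e[1;31m"], # bold red
  [/^\[WARNING\].*/, "\e[33m"],   # yellow
  [/BUILD SUCCESS/,  "\e[1;32m"], # bold green
  [/BUILD FAILURE/,  "\e[1;31m"], # bold red
]

# Wraps the first matching part of the line in colour codes.
# Lines that match no rule are returned untouched.
def colourise(line)
  RULES.each do |pattern, colour|
    return line.sub(pattern) { |m| "#{colour}#{m}#{RESET}" } if line =~ pattern
  end
  line
end

# Processes the input one line at a time and writes each line out
# immediately, so the pipeline never sits on buffered output.
def colourise_stream(input, output)
  output.sync = true if output.respond_to?(:sync=)
  input.each_line { |line| output.print(colourise(line)) }
end
```

Saved as, say, mvn-colour.rb with a final line of colourise_stream($stdin, $stdout), it would be used as: mvn test | ruby mvn-colour.rb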

My script uses said approach to do the following:

Clear the screen before running.
Highlight the name of each Maven phase and the plugin & goal responsible.
Colour the number of test passes, failures, errors.
Colour the test classes & methods of test failures and errors.
Colour Maven warnings & errors.
Highlight total build success / failure.
Remove that bunch of shit you get at the end of failed Maven builds. When Maven fails (or tests fail) it dumps a bunch of info that basically amounts to "Try using -e or -X else here's our website." Ok for beginners maybe but I don't need to see it all the time. I have "mvn -help" and Google if I need them.

Now that feature-set might not sound like much but it makes a world of difference. It turns my frown upside down. Twice! (Hey...but that means-

Here are some screenshots:

Coloured Maven when things are happy

Coloured Maven when things go wrong

Thursday 1 November 2012

Rant: Stupid Maven

I'm back in the Java world again, and already I'm annoyed. The Ruby world isn't without its problems but Ruby problems are generally much easier to solve when they do appear. You generally don't end up wanting to punch the screen and scream "give my life baaaack!" (Thor's internals get me close though.)

Today's problem: Maven.
Now, I've been a big fan and advocate of Maven for years. Therefore, upon returning to the Java world I happily sought out my good friend Maven and started using it immediately with my new project. ...Only to have problem... after problem... after problem... So now I'm going to vent.

TO DEPLOY to a local instance of Nexus you need to plonk a big blob of shit (i.e. XML) in every single project POM file. Adding <distributionManagement> to the user profile's settings.xml doesn't work. WTF! In Maven-land, any artifact can be deployed to any repository so long as the repo configuration allows it. A beauty of Maven is the interoperability of artifacts & repositories; that is, a project that can deploy to Maven Central can also deploy to an intranet Nexus or Artifactory; no code needs to change; the protocol is standard. Forcing the storage of repo settings alongside the project's build instructions doesn't seem very bright to me. It's nice... but how to build a project shouldn't depend on where a project's gonna go after it's built (especially when there's no impact on the build anyway). In reality they are separate concerns but the system doesn't allow that separation. If repo info could be stored in either project or user-profile settings so that people could decide based on the project circumstances then fine, but as it stands currently, that is infuriatingly not the case and not supported.

SO, NOW to make things work, I need to have the same parent pom for all of my projects that I plan to deploy to my local Nexus. The alternative would be to copy & paste the same 20 lines or so of XML into each project: umm no. What's annoying about this is a) it's something else to maintain (as opposed to settings.xml which wouldn't have to be deployed or available as an artifact); b) it introduces a new dependency (literally, not runtime) to all my projects now, great; and c) Maven poms use single-inheritance -- no mixins; to inherit from a different parent I have more stuffing around to do. Bad architecture. Annoying.

Parent POMs (multi-module or standalone) allow you to configure plugins, however those plugin settings aren't automatically inherited in child projects. For settings I think you need to redeclare the plugin in the child project (not 100%, I don't even care anymore). What really bugs me is that version specifications are not inherited. WTF GUYS! So if I have a multi-module project (and I tried this) wherein I declare the version of a plugin in the parent pom, the child does not use the specified version of the plugin. OMG. I tried with <inherited>true</inherited>, I tried in both <plugins> and <pluginManagement>, it doesn't matter. Now I guess I'm supposed to declare the versions as properties and specify plugin versions in every single child module instead? Not only am I averse to duplication for the obvious reasons, but now my pom files are all massive! Now I googled this and apparently this is deliberate because "you should always specify specific versions, inheriting versions is evil and disallowed", yada yada. Yep, well it's one thing in the context of isolated projects but in the context of multiple modules comprising a single project, it's the opposite. The modules are components that are meant to be built together to create a larger system. Declaring certain plugin versions at project-scope is entirely reasonable and has many advantages. Secondly, when you don't specify a plugin version, where does it come from? The build still works so it's coming from somewhere, right? It comes from the Maven super POM which is like a template that all poms inherit. In it, default plugin versions are specified left-right-and-centre, and they are inherited and used without explicit specification or "evil" consequences (although there are some unless the release plugin has been updated to hardcode all plugin vers in child poms during release - haven't used it in a few years). Double standards.

Finally XML. Come on. It's 2012. XML has always been a tedious, ugly, unacceptably verbose format. The reasons that made it appealing in 1998 are no longer valid in my opinion. Especially not in the context of pom.xml. One of the best things about Maven is that you don't have to write a bunch of crap for every project in order to get it to build in an automated fashion; Maven allows you to simply say "I'm blah v1.0 and I need v2 of X and v3.4 of Y to build. See ya!" and it takes care of the rest so long as you follow its standards. Add a few dependencies though and add a single command (argLine) to surefire (the Maven testing plugin) and you're looking at over 100 lines of XML. Writing everything manually in Ant would only come to about 80 or so and it uses XML too! There was a project called Maven Polyglot a while back that was supposed to allow the pom to work in different formats but it seems dead or at least progressing slowly enough that it is moot anyway. Progress is generally pretty slow in Maven-land. Point: In this day and age I resent having to use XML almost anywhere. Have you ever filled in a form by hand, a tax return, a car rego, and seen:
<Phone Number>________________</Phone Number>
I haven't. I really dislike XML and I'm glad it's finally starting to trend away. Look at how concise things could be:

pom.yml -- https://github.com/mrdon/maven-yamlpom-plugin/wiki
Gradle -- http://www.gradle.org/docs/current/userguide/artifact_dependencies_tutorial.html
Buildr -- http://stackoverflow.com/questions/1015525/why-use-buildr-instead-of-ant-or-maven and scroll down a little.

Rant.stop.

Wednesday 12 September 2012

Problems with at_exit{}, exit(), and RSpec

I had an interesting problem the other day, working on a Ruby project of mine. I ran my tests: (note: I have "rak" aliased to "bundle exec rake")

rak test

which internally expands to the equivalent of:

rak test:spec test:int

which runs my specs and integration tests in that order.

Then an odd thing happened. My specs failed but then the integration tests ran anyway and scrolled the spec failure off the screen to report happy success. After some digging around I discovered the following:

I'd had this test failure for a few commits without noticing.
My CI builds on Travis CI were all reporting success (although the failure message was there in the build logs should one manually check.)
Rake itself and RSpec's rake task were both fine.
Running my tests directly with the rspec CLI, I was getting an exit code of 0 on both success and failure.

The Problem

RSpec was returning an error/exit/status code of 0 despite test failure. It should be non-zero so that external processes like Travis CI and Rake can determine that something's gone wrong and react accordingly.

I'm going to cut a tedious story down to the result here. After investigation I learned that rspec worked as expected again when I avoided one particular piece of code in my tests. What that piece of code does is:

create a temporary directory the first time it's called
reuse that temporary directory on subsequent calls
remove the directory at the end of the process's lifecycle
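Those three behaviours can be sketched roughly like this. It isn't the original snippet; the module name TestTmpDir and the 'specs-' prefix are made up for illustration.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for the helper described above.
module TestTmpDir
  # Creates the directory on the first call, returns the same one afterwards.
  def self.path
    @path ||= begin
      dir = Dir.mktmpdir('specs-')
      # Registered once, runs while the process is shutting down.
      at_exit { FileUtils.remove_entry_secure(dir) }
      dir
    end
  end
end
```

Innocent-looking, which is exactly why it took a while to find.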

Why does that affect RSpec returning non-zero on failure? Because RSpec itself doesn't run immediately; it wants to wait until all of your specs have been loaded first. The way it accomplishes that is it registers itself to run via an at_exit block and then calls exit when it's finished with your specs. Still sounds like there's no problem right? Well this is what happens from that point on...

RSpec finishes running tests and calls exit with 0 for success or 1 for failure.
Accordingly, Ruby creates an instance of SystemExit and plonks it into the $! global variable.
Now that the process is shutting down, my at_exit in the snippet above starts running.
My cleanup code (correctly, validly and legally) runs Ruby's FileUtils.remove_entry_secure to remove the temporary directory created during tests. This isn't a problem in itself.
Here's the gotcha: FileUtils.remove_entry_secure removes the directory and sets $! to nil to indicate that no exception occurred.
Ruby ends the process and sets the exit code to the result of $!.status which was lost in the previous step.

The Solution

That was the problem. Now what's the solution?
Simple. Just take care to preserve the exit status in your at_exit so that it ends with what it started with. I created a helper method for this. (You can also view this file directly on Github in a utility library of mine.)

Then the solution is simply to use at_exit_preserving_exit_status instead of at_exit in the first snippet, and everything works again! Happy days!
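A helper with that behaviour might look something like the sketch below. Only the name at_exit_preserving_exit_status comes from this post; the body and the pending_exit_status helper are my assumptions about one reasonable way to do it, not a copy of the library code.

```ruby
# Returns the status the process is about to exit with if it is ending
# via exit() (i.e. a SystemExit is pending), otherwise nil.
def pending_exit_status(exception = $!)
  exception.is_a?(SystemExit) ? exception.status : nil
end

# Like at_exit, but remembers the exit status before running the block
# and re-applies it afterwards in case the block clobbered $!.
def at_exit_preserving_exit_status(&block)
  at_exit do
    status = pending_exit_status
    block.call
    exit status if status
  end
end

# Usage (assumed), mirroring the cleanup described earlier:
#   at_exit_preserving_exit_status { FileUtils.remove_entry_secure(dir) }
```

Calling exit from inside an at_exit block is fine; it just sets the status again and the remaining handlers still run.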

TL;DR: Conclusion

Be careful that you don't corrupt the value of $! when using at_exit. If you're not careful (or don't use a handy, safe function like at_exit_preserving_exit_status), then you can corrupt the exit status of RSpec in particular, and other libraries that work in a similar fashion.

I Have Returned

Over the past 4 months I've spent a lot of time overseas on holiday. I did a bunch of Asia with some mates, I visited India with my girlfriend. It was great fun and good to get away and have some new experiences, be in situations that you normally wouldn't (or even want to in some cases). I enjoyed myself and I'm glad I did it.

And now, I want no more of it for at least 6 months! I've been on 13 planes over the last 4 months. I'm tired of travelling. Now that I'm back home for good, I'm looking forward to finally being able to focus on work again.

Thus, I'll start giving this blog attention again. I have returned.

Monday 23 April 2012

Ruby Mutex Reentrancy

This morning I was making some Ruby code of mine thread-safe which is always fun. (I'm serious btw. I frikking love multithreaded programming!) In doing so I came across something that I found a bit surprising.

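Consider a snippet that takes a Mutex, calls synchronize on it, and then calls synchronize on the same Mutex again from inside that block. Think it will work? Let's try...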

<internal:prelude>:8:in `lock': deadlock; recursive locking (ThreadError)
 from <internal:prelude>:8:in `synchronize'
 from reentrancy.rb:5:in `block in <main>'
 from <internal:prelude>:10:in `synchronize'
 from reentrancy.rb:4:in `<main>'

Shocking!
Mutex is not reentrant. Wow. Ok. Let's try something else...

Let's change that Mutex into a Monitor and try again. Alrighty, let's put on fresh underwear and give it a whirl...

Monitor is reentrant.
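For reference, both experiments fit in a few lines. This is a minimal sketch rather than the exact reentrancy.rb that produced the trace above; nested_lock is a name I made up.

```ruby
require 'monitor'

# Takes the same lock twice from the same thread.
def nested_lock(lock)
  lock.synchronize do
    lock.synchronize { :ok }
  end
end

# Mutex: the inner synchronize raises ThreadError instead of deadlocking.
begin
  nested_lock(Mutex.new)
rescue ThreadError => e
  puts "Mutex: #{e.class} (#{e.message})"
end

# Monitor: the owning thread may re-enter, so the inner block runs.
puts "Monitor: #{nested_lock(Monitor.new).inspect}"
```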

Ah, the world makes sense again. If I had to code my own reentrancy I would've cried and hated Ruby a little bit. My love and faith in Ruby remains, yay!

Is There A Cost?

Nothing is free. Is there a performance penalty? Time for some benchmarks.

I wrote a little benchmarking script that acquires and releases both a mutex and a monitor 1 million times each. Benchmarking results:

                 user     system      total        real
Mutex        0.400000   0.000000   0.400000 (  0.406259)
Monitor      0.870000   0.010000   0.880000 (  0.864888)

Ouch, Monitor takes over double the time that Mutex does. That's the trade-off.
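A benchmark along those lines can be sketched with Ruby's stdlib Benchmark module. The method name bench_locks and the iteration parameter are my additions so it can be tried with smaller numbers; your timings will differ from the table above.

```ruby
require 'benchmark'
require 'monitor'

# Acquires and releases each lock `iterations` times and prints a
# user/system/total/real table like the one above.
def bench_locks(iterations = 1_000_000)
  mutex   = Mutex.new
  monitor = Monitor.new
  Benchmark.bm(10) do |x|
    x.report('Mutex')   { iterations.times { mutex.synchronize {} } }
    x.report('Monitor') { iterations.times { monitor.synchronize {} } }
  end
end
```

Swapping Benchmark.bm for Benchmark.bmbm adds a rehearsal pass before the measured one.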

What About JRuby

I'm curious, let's try JRuby too. We'll change bm to bmbm and fire it up.

Rehearsal ---------------------------------------------
Mutex       0.571000   0.000000   0.571000 (  0.539000)
Monitor     2.012000   0.000000   2.012000 (  2.012000)
------------------------------------ total: 2.583000sec

                user     system      total        real
Mutex       0.321000   0.000000   0.321000 (  0.321000)
Monitor     1.696000   0.000000   1.696000 (  1.696000)

Wow, Monitor is 5.3x slower when using JRuby!!! Hmmm, I suspect the JIT just needs more time to warm up. So I wrote a new benchmarking script with a big warmup. The results:

> jruby --1.9 --fast reentrancy-benchmark-jruby.rb
Warmup #1/20
...
Warmup #20/20
                user     system      total        real
Mutex       0.357000   0.000000   0.357000 (  0.357000)
Monitor     0.768000   0.000000   0.768000 (  0.768000)

Ok, that's on par with the MRI results. Mutex is fast off-the-bat with JRuby whereas Monitor will be a lot slower at first, then settle down to a little over double the time of Mutex.
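The warmup idea is nothing fancy: run the same work a bunch of times untimed so the JIT has a chance to compile it, then measure. A rough sketch, assuming 20 warmup rounds to match the output above (the method names are mine, not from the original script):

```ruby
require 'benchmark'
require 'monitor'

# One unit of work: take and release the lock `times` times.
def lock_unlock(lock, times)
  times.times { lock.synchronize {} }
end

# Runs `warmups` untimed rounds first, then the timed benchmark.
def bench_with_warmup(iterations = 1_000_000, warmups = 20)
  mutex   = Mutex.new
  monitor = Monitor.new
  warmups.times do |i|
    puts "Warmup ##{i + 1}/#{warmups}"
    lock_unlock(mutex, iterations)
    lock_unlock(monitor, iterations)
  end
  Benchmark.bm(10) do |x|
    x.report('Mutex')   { lock_unlock(mutex, iterations) }
    x.report('Monitor') { lock_unlock(monitor, iterations) }
  end
end
```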

Conclusion

Mutex: No reentrancy. Fast, takes less than half the time of Monitor.
Monitor: Reentrancy. Slow, a little over twice as slow as Mutex.

Wednesday 18 April 2012

Ruby JSON Libraries

Over the last 5 years or so I'd been away from the world of Ruby. I still used Ruby at work and home for various little things [cos it's awesome!] but that's quite different to living in it, especially seeing its community is one of the most fast-paced I've seen. So when I came back recently and I needed a JSON library, I went searching and found (what felt like) 100 different JSON libraries...

Long story short, I benchmarked them. Feel free to skip to the end of this post to just get the conclusion and be on your way.

What Was Used

NAME	VERSION	BUILD
MRI Ruby	1.9.3p125	ruby 1.9.3p125 (2012-02-16 revision 34643) [x86_64-linux]
JRuby	1.6.7	jruby 1.6.7 (ruby-1.9.2-p312) (2012-02-22 3e82bc8) (OpenJDK 64-Bit Server VM 1.7.0_03-icedtea) [linux-amd64-java]
"	1.7.0.dev	jruby 1.7.0.dev (ruby-1.9.3-p139) (2012-04-15 b4b38d4) (Java HotSpot(TM) 64-Bit Server VM 1.7.0_03) [linux-amd64-java]
OpenJDK	1.7.0_03	OpenJDK Runtime Environment (IcedTea7 2.1) (ArchLinux build 7.b147_2.1-3-x86_64) OpenJDK 64-Bit Server VM (build 22.0-b10, mixed mode)
Oracle Java	1.7.0_03	java version "1.7.0_03" Java(TM) SE Runtime Environment (build 1.7.0_03-b04) Java HotSpot(TM) 64-Bit Server VM (build 22.1-b02, mixed mode)
MultiJson	1.3.2	c73bc389fa1b0b1c0b8225ea77ff3e2dee312304

The following JSON libraries were tested:

NAME	GEM NAME	VERSION
Optimized JSON (Oj)	oj	1.2.4
YAJL	yajl-ruby	1.1.0
JSON JRuby	json-jruby	1.5.0
JSON Pure	json_pure	1.6.6
JSON gem	json	1.6.6
OkJson	okjson	Version packaged with MultiJson 1.3.2

I created a little app to benchmark each library that performs two functions 100,000 times and records the time taken. Said two functions are:

[Writing] Generates JSON for this Ruby data structure:

{
  a: 2,
  b: (1..50).to_a, # i.e. an array of 1,2,3,4,5,6, ... ,49,50
  c: %w[asf xcvb sdfg sdf gfsd],
  d: {
    omg: 'hedfasgdsfg',
    wewr: 34,
    sfgjbsdf: %w[sdfg sdfgsdfgnj klj kj hkuih ui hu kjb bkj b sdfg],
  },
}

[Reading] Parses this JSON:

{"a":2, "b":[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50], "c":["asf","xcvb","sdfg","sdf","gfsd"], "d":{"omg":"hedfasgdsfg", "wewr":34, "sfgjbsdf":["sdfg","sdfgsdfgnj","klj","kj","hkuih","ui","hu","kjb","bkj","b","sdfg"]}}

If you're interested you can grab the code here: https://github.com/japgolly/WebServerBenchmark/tree/json_libraries
You're free to read it, play with it, hack it, print it and eat it, make love to it; It's all good.
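If you just want the gist of it without cloning anything, here's a stripped-down sketch of the two operations. It only exercises the json library that ships with Ruby, not the six libraries above or MultiJson, and the names SAMPLE, SAMPLE_JSON and bench_json are made up. The JSON text is generated from the structure rather than copied from above, so its whitespace differs slightly.

```ruby
require 'benchmark'
require 'json'

# The Ruby data structure used for the [Writing] test.
SAMPLE = {
  a: 2,
  b: (1..50).to_a,
  c: %w[asf xcvb sdfg sdf gfsd],
  d: {
    omg: 'hedfasgdsfg',
    wewr: 34,
    sfgjbsdf: %w[sdfg sdfgsdfgnj klj kj hkuih ui hu kjb bkj b sdfg],
  },
}

# Input for the [Reading] test, derived from SAMPLE.
SAMPLE_JSON = JSON.generate(SAMPLE)

# Times `iterations` writes and `iterations` reads.
def bench_json(iterations = 100_000)
  Benchmark.bm(8) do |x|
    x.report('write') { iterations.times { JSON.generate(SAMPLE) } }
    x.report('read')  { iterations.times { JSON.parse(SAMPLE_JSON) } }
  end
end
```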

Finally, these tests were performed on a Q9550 with 8GB RAM running Arch Linux 64-bit.

➤ uname -a
Linux golly-desktop 3.3.2-1-ARCH #1 SMP PREEMPT Sat Apr 14 09:48:37 CEST 2012 x86_64 Intel(R) Core(TM)2 Quad CPU Q9550 @ 2.83GHz GenuineIntel GNU/Linux

Results: First Cut

The numeric axis represents number of seconds taken to perform 100,000 operations.
Less is better.

[Graphs: linear and logarithmic charts for MRI and for JRuby (on Oracle Java)]

Wow. Two things are immediately obvious:

OkJson is extremely slow on both Ruby implementations. Whatever its design goals, performance isn't one of them.
JRuby runs YAJL like my grandmother runs marathons.

I really thought JRuby would perform better. Maybe 1.7 will be better. I know the JRuby team expect significant performance gains; I wonder if they've implemented them yet... Let's try using the latest dev version of JRuby 1.7.0!

Also I think I'll try using JRuby 1.6.7 with the OpenJDK implementation of Java and see if that gives better results.

Results: Lots of JRuby Love

[Update 2012-04-23: New JRuby library discovered, see conclusion.]

Alrighty, done. Here are the results: (note: using a logarithmic scale here again)

And the same thing expressed differently:

Jeez, it's not getting much better for JRuby...

Ok, enough of this. Let's get rid of the council-working options [hey, Aussies get that!] and just look at the feasible ones.

Results: The Finalists

The numeric axis represents number of seconds taken to perform 100,000 operations.
Less is better.

Conclusion

If you use MRI, use Oj unless you generate more JSON than you parse, in which case use YAJL.

If you use JRuby, you've only really got one choice: json-jruby. You can expect roughly the same performance with OpenJDK, Oracle Java and the latest dev build of JRuby.

Update 2012-04-23: jrjackson for JRuby is approx 4x faster than json-jruby and faster than MRI. If you use JRuby, you want this!!

Monday 2 April 2012

I Have Blog

And I decided I would create a new blog, the first in nearly 6 years.

And I decided I would overcome the ennui of asynchronous communication with a strategy (!), seeing things with new perspectives won through experience now that I'm the ripe, hoary age of 31.

And I decided I would bless this new blog with an irrelevant excerpt I love from a brilliant saga called Malazan Book of the Fallen by Steven Erikson.

Her finger provided the drama, ploughing a traumatic furrow across the well-worn path. The ants scurried in confusion, and Samar Dev watched them scrabbling fierce with the insult, the soldiers with their heads lifted and mandibles opened wide as if they would challenge the gods.

I don't know why I love that so much, but I do. Hey! I did say it was irrelevant.