Wednesday, September 2, 2009

New Home

After a long period away from this blog while I took a three-month vacation from programming to travel through Guatemala, I'm back. However, I will no longer be maintaining this blog on Blogger; if you are interested in reading more content by me about Ruby, Rails, or startups, please drop by

Tuesday, April 28, 2009

Using inject for improved performance

I found myself wanting to know if an array had any truth values in it, and wrote a method that I was pretty proud of:
module Enumerable
  def any_true?
    inject(false) do |truth, item|
      truth || (yield item)
    end
  end
end
So now I can write
arr.any_true? {|item| some_method(item)}
Then I realized that a simpler way of expressing this is just arr.map {|item| some_method(item)}.any?

The latter seems a lot better: it's easier to understand, and it uses just Ruby primitives. But the first is actually still better for most cases. Why? Performance. Imagine some_method includes a database query or two. In the simple map.any? case we have to evaluate it for every item, because map builds the full array of results before any? ever looks at them. In the any_true? case we only evaluate until we find one true result, at which point we're done; the || short-circuits every remaining check and we never hit the method again.

The difference can be enormous. If I emulate this database requirement using a method that looks like:
def random_truth(likelihood)
  sleep 0.1
  rand < likelihood
end
I can now run some tests that demonstrate the extreme difference in runtime:
>> Benchmark.realtime {10.times { (1..100).map {|i| random_truth(0.5)}.any? }}
=> 100.000391960144
>> Benchmark.realtime {10.times { (1..100).to_a.any_true? {random_truth(0.5)} }}
=> 2.00004100799561
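
The same effect can be checked without timing noise by counting calls instead of sleeping. The sketch below is only an illustration: the `check` lambda and its threshold are made up for the example, standing in for an expensive method. It also includes Ruby's built-in `any?` called with a block, which stops at the first truthy result as well and so needs no custom method.

```ruby
# Same any_true? as in the post, repeated so the sketch is self-contained.
module Enumerable
  def any_true?
    inject(false) { |truth, item| truth || (yield item) }
  end
end

calls = 0
# Hypothetical stand-in for an expensive check; it counts its own invocations.
check = lambda { |item| calls += 1; item >= 3 }

items = (1..100).to_a

calls = 0
items.map { |i| check.call(i) }.any?    # map evaluates every item first
map_calls = calls

calls = 0
items.any_true? { |i| check.call(i) }   # stops yielding after the first true
inject_calls = calls

calls = 0
items.any? { |i| check.call(i) }        # built-in any? with a block
builtin_calls = calls

puts "map.any?: #{map_calls}, any_true?: #{inject_calls}, any? with block: #{builtin_calls}"
```

Note that any_true? still walks the whole collection through inject; it only skips the block call once a true value has been seen, whereas the built-in block form stops iterating altogether.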

Friday, April 17, 2009

Distributed computing and Ruby

At work, I've been spending a lot of time porting our log management and analysis system from a dying MySQL implementation to a new system built upon an open source distributed database (Hypertable), using distributed Map/Reduce jobs for a variety of summarization and analysis tasks.

I've been amazed and gratified to be able to do the vast majority of this work within the comfort of Ruby, my favorite language. While Hypertable is written in C++, it uses Thrift to provide access from Ruby and other languages, and since the primary developers of Hypertable are at Zvents (a Rails shop), they've created a Rails plugin called HyperRecord that allows us to access Hypertable almost identically to how you would access MySQL with ActiveRecord.

This has allowed us to make the front end application for our stats & logging infrastructure a standard Rails app. Access restrictions are somewhat different in Hypertable than in a full relational database (it's easiest to think of it as an ordered hash: lookups by key or for a range of keys are supported, but conditions are expensive), but for most of our developers it's just another Rails app to work with.
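
To make the "ordered hash" idea concrete, here is a minimal in-memory sketch in plain Ruby. It is not Hypertable or HyperRecord code; the row keys, values, and method names are all invented for illustration. The point is only that a range of keys can be located with a binary search on the sorted keys, while an arbitrary condition on the values has to scan every row.

```ruby
# Hypothetical rows, kept sorted by key (date plus host).
ROWS = [
  ["2009-04-15 host1", "GET /a"],
  ["2009-04-16 host1", "GET /b"],
  ["2009-04-16 host2", "GET /c"],
  ["2009-04-17 host1", "GET /d"],
].freeze
KEYS = ROWS.map(&:first).freeze

# Range lookup: binary search for the first key >= from, then read
# forward until a key reaches the exclusive upper bound.
def key_range(from, to)
  start = KEYS.bsearch_index { |k| k >= from } || KEYS.size
  ROWS[start..-1].take_while { |key, _| key < to }
end

# Condition on values: nothing is ordered by value, so every row is visited.
def matching(&condition)
  ROWS.select { |_, value| condition.call(value) }
end

april_16 = key_range("2009-04-16", "2009-04-17")
slash_c  = matching { |value| value.end_with?("/c") }
```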

The second place where I've been amazed by how much I've been able to stick to ruby is in designing and running our batched Map/Reduce jobs. We're using a framework called Cascading for designing our scalable batch workflows, built in Java and sitting on top of Hadoop. For those who aren't familiar, Hadoop is an open source implementation of Map/Reduce, written in Java, and Cascading allows for a higher-level conceptual model for parsing, analyzing, and modifying your data using Hadoop.
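
For readers new to the model, the Map/Reduce idea itself fits in a few lines of single-process Ruby. This is only a conceptual sketch with made-up log lines and method names; it uses neither Hadoop nor Cascading, and a real job would distribute each phase across machines.

```ruby
# Hypothetical log records: host, method, path, status.
LOG_LINES = [
  "host1 GET /a 200",
  "host2 GET /b 404",
  "host1 GET /c 200",
  "host1 GET /d 500",
].freeze

# Map: emit [key, value] pairs for one input record.
def map_line(line)
  _host, _method, _path, status = line.split
  [[status, 1]]
end

# Reduce: combine all values collected for one key.
def reduce_counts(_key, values)
  values.sum
end

pairs   = LOG_LINES.flat_map { |line| map_line(line) }
grouped = pairs.group_by(&:first)   # the "shuffle" step: group pairs by key
counts  = grouped.map { |key, kvs| [key, reduce_counts(key, kvs.map(&:last))] }.to_h
```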

Cascading provides a number of built-in filtering, text processing, and arithmetic map/reduce operations, and thanks to the wonder that is JRuby, we're able to arrange our workflows entirely within Ruby using Cascading.jruby. Only when we need a special operation that can't be constructed from the built-ins do we have to dip into Java.

So if you've been itching to dip your toes in the open source distributed computing revolution, but have been reluctant due to the Java-heavy nature of Hadoop, take a look at Hypertable and Cascading!

Sunday, March 15, 2009

A new appreciation for convention over configuration

At work, I'm revamping our logging infrastructure to use Hypertable (an open source Bigtable clone) for storage, and Hadoop (an open source Map/Reduce framework) for log processing.

I'm extremely excited about the potential these tools have for allowing us to do all sorts of ridiculously cool analyses on our logs, and at some point I'll write about some of the things we are doing with them. However, in the meantime I just wanted to share my renewed appreciation for the simplicity of convention over configuration that I've gotten used to with Rails.

You see, Hadoop is written in Java, and uses Apache Ant for builds. We're talking 60-100 line XML files to configure simple example jobs. Why would anyone voluntarily set their system up this way? To paraphrase one of my coworkers: enterprise-quality appears to be roughly translatable as 'over-engineer everything'!

Tuesday, February 24, 2009

Flash of Insight about Prototypal Inheritance

I've always been confused about how JavaScript inheritance worked, and what all of these prototype things actually are. Reading article after article after article about prototypal inheritance gave me some more details about how to use it, as well as opinions on its value, but the fundamental model was still missing. But on the train this evening I was reading through the Wikipedia article on Prototypal inheritance when suddenly I had a flash of insight! The reality is so much simpler than anything I'd imagined. Without further ado, here it is:

A prototype is an object to be copied. That's it. When you create a new object in a prototypal language, it starts out as a copy of its prototype.

It's common to see JavaScript that looks something like:

function A(){};
A.prototype.method1 = function() {foo};
A.prototype.method2 = function() {bar};
A.prototype.method3 = function() {blah};

Now any new instance of A will have method1, method2, and method3 defined. I guess I'd always thought of this as some sort of magic. But really, it's equivalent to

function B(){};
B.method1 = function() {foo};
B.method2 = function() {bar};
B.method3 = function() {blah};
function A(){};
A.prototype = B;

In other words... here's this object B, it has these methods. When you create an object of type A? Just copy B. So simple! So flexible! Why did I not understand this before?

There is, of course, a caveat. In JavaScript, we aren't really copying the methods, as that would be incredibly inefficient. Each instance has a hidden pointer (accessible in some browsers as __proto__ on the instance) that points back to the prototype, and the runtime then takes care of delegation. But conceptually, it's just a copy.

Sunday, February 1, 2009

Name change & feedback request

I realized that the original name I had for this blog (Ruby, Rails, and other coding thoughts...) was not turning out to be representative of what I was posting. I've been writing almost entirely Shoes-related content, with a few divergences into process-related things and other microframeworks.

So I've decided to change the name to something more representative of what I'm using this blog for: writing about the coding related stuff I'm doing for fun. Since I develop a Rails app for work (this app, company blog here), I've been doing very little playing with Rails outside of work.

The new name, 'Ruby Merriment', expresses more clearly what this is about. Having fun playing with my favorite programming language.

That said, one thing I'm excited about is trying to engage in an ongoing conversation with other people who are excited about Ruby. I'm not quite clear which of the things I write about are interesting to other people; it might be interesting to explore some more Rails-related ideas, if people want to talk about those. So give me some feedback: Do you like what you're reading? Are there other things you'd like to talk about? If you have a blog where you talk about coding, point me at it so I can take a look and respond.


Saturday, January 31, 2009

More Shoes Structure

When I first started using Shoes, it took me a while to get the various pieces set in my head. There are a couple of different levels to understand. There are a number of tutorials about layout, the most fundamental being Stacks and Flows. This is critical to understanding how to design your GUI, but it wasn't enough for me to really understand what was going on.

After hacking around, reading through the source code, and writing some addons I think I'm starting to get my head around it, so here's an attempt to explain some more of the structure of Shoes.

The most basic piece is Shoes.app. This is the method you use to open a window for a new Shoes application; hello world in Shoes is something like:
Shoes.app do
  para "Hello World"
end

What this does is twofold... it instantiates a new Shoes::App object, and sets self inside the block to point to that object. The Shoes::App object provides all of the helper methods for creating different components, such as para, button, stack, flow, etc, and is responsible for actually rendering them. The objects themselves don't render, they just define a structure that the Shoes::App object will then render. This is why, when trying to do something in a module or class outside of the block, you need to pass in the app object, or access it through the app method on some already rendered object (all shoes objects have an app method to get back to the app they are rendered in).
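
The "sets self inside the block" behaviour can be imitated in plain Ruby with instance_eval. The sketch below is a hypothetical miniature, not the real Shoes implementation: MiniApp, its para helper, and add_footer are invented names. It shows why helpers like para work inside the block and why code outside the block needs the app object handed to it.

```ruby
# Hypothetical miniature of the pattern; not how Shoes itself is written.
class MiniApp
  attr_reader :elements

  def initialize
    @elements = []
  end

  # Helper that is callable inside the block because self is the MiniApp.
  def para(text)
    @elements << [:para, text]
  end

  # Entry point loosely analogous to Shoes.app.
  def self.app(&block)
    instance = new
    instance.instance_eval(&block)  # self inside the block becomes instance
    instance
  end
end

mini = MiniApp.app do
  para "Hello World"   # resolves to MiniApp#para on the new instance
end

# Outside the block there is no implicit app, so it must be passed in.
def add_footer(app)
  app.para "Footer"
end

add_footer(mini)
```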

What does the structure that Shoes::App builds up look like? It is a tree that looks remarkably similar to the DOM in the web world.

At the root of the tree is the slot associated with the original Shoes::App object. This slot behaves like a flow, and can be accessed via the slot method on the Shoes::App object; within the block that would be self.slot. (This is important; after reading that the app object was a flow, I originally thought I'd be able to use all of the flow methods immediately on it, but you actually need to call them on self.slot.)

Every node in the tree has an accessor called parent and one called children. Let's examine what these look like in the hello world example above, by adding some debug output:

app = Shoes.app do
  para "Hello World"
  Shoes.debug "Parent is #{app.slot.parent.inspect}"
  Shoes.debug "Children are #{app.slot.children.inspect}"
end

The console now shows:

which is pretty much what we'd expect. The root should have no parent, and the paragraph is the only element, so it should be the first child. What does that paragraph look like? Let's try:
Shoes.app do
  p = para "Hello World"
  Shoes.debug "Para Parent is #{p.parent.inspect}"
  Shoes.debug "Para Children are #{p.children.inspect}"
end

Now we see:

The parent points back to app.slot (which we now see is not actually a flow, but an instance of the Shoes class, which flows inherit from). And the children of a Para object actually contain the text of the paragraph.

Understanding this tree structure was one of the biggest breakthroughs for me, because it means that many of the things I know from manipulating the DOM with JavaScript in the web world are suddenly applicable to Shoes. I'm no JavaScript genius, but I have already climbed some of the learning curve there, and being able to apply that to Shoes was eye-opening.

One of the things the DOM does really well (and the Shoes model should as well) is allow you to navigate, find, and manipulate things that have been generated someplace else. This is where I got the idea for ShoeQuery, and it will prove extremely useful in my efforts to build a Shoes debugger.
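
As a rough illustration of that kind of DOM-style navigation, here is a plain-Ruby sketch. Node is a hypothetical stand-in rather than a Shoes class; only the parent and children accessors mirror what is described above, and find_all and root are invented helper names, not ShoeQuery's API.

```ruby
# Hypothetical node type standing in for Shoes elements.
class Node
  attr_reader :name, :parent, :children

  def initialize(name, parent = nil)
    @name = name
    @parent = parent
    @children = []
    parent.children << self if parent
  end

  # Depth-first search over this node and all of its descendants.
  def find_all(&predicate)
    matches = predicate.call(self) ? [self] : []
    children.each { |child| matches.concat(child.find_all(&predicate)) }
    matches
  end

  # Follow parent pointers up to the top of the tree.
  def root
    parent ? parent.root : self
  end
end

root  = Node.new(:slot)
stack = Node.new(:stack, root)
para1 = Node.new(:para, root)
para2 = Node.new(:para, stack)

# Find every paragraph, wherever in the tree it was created.
paras = root.find_all { |node| node.name == :para }
```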

I think this similarity could make Shoes the toolkit of choice for web developers coming to the desktop.