Tuesday, April 28, 2009

Using inject for improved performance

I found myself wanting to know if an array had any truth values in it, and wrote a method that I was pretty proud of:
module Enumerable
  def any_true?
    inject(false) do |truth, item|
      truth || (yield item)
So now I can write
arr.any_true? {|item| some_method(item)}
Then I realized that a simpler way of expressing this is just
arr.map {|item| some_method(item)}.any?

The latter seems a lot better... its easier to understand, and uses just ruby primitives. But the first is actually still better for most cases. Why? Performance. Imagine some_method includes a database query or two... In the simple map.any? case we'll have to evaluate it for every item. But in the any_true? case we only evaluate until we find one, at which point we're done; we'll short circuit every other check and never hit the method again.

The difference can be enormouse. If I emulate this database requirement using a method that looks like:
def random_truth(likelihood)
    sleep 0.1
    rand < likelihood
I can now run some tests that demonstrate the extreme difference in runtime:
>> Benchmark.realtime {10.times { (1..100).to_a.map {|i| random_truth(0.5)}.any? }}
=> 100.000391960144
>> Benchmark.realtime {10.times { (1..100).to_a.any_true? {random_truth(0.5)} }}
=> 2.00004100799561

No comments:

Post a Comment