Developer, tinkerer, thinker

Laziness, Ruby, reduce and the map that wasn’t

March 6th, 2014

The other day, I was writing some Ruby code to go through a list, apply a function to it, and return the first element that the function didn’t return nil for. A good example would be matching a RegExp, and returning the MatchData for the first result element of the array that matched.

There is a Enumerable#grep function, which almost does this, but not quite. Other than that, there didn’t seem to be a function.

Now, you could do this with a map, followed by a find, but that seems inefficient since you only want the first non-nil result. Probably a first world programming problem, but it got me thinking if there was a way to do this efficiently, simply and idiomatically.

Hoogle

Hoogle is a great tool for looking for Haskell functions based on their type signature. I figured if there was a Haskell function for this, I wouldn’t be going crazy and it might give me an idea of what the name is called. In this case, we want a function that takes a list of something, a function that converts that something into either None or Maybe something else (equivalent to returning nil or something in Ruby, but in a statically-typed way). The Haskell signature is therefore [a] -> (a -> Maybe B) -> Maybe B.

Search for that on Hoogle, and you don’t get any useful results. So am I crazy for wanting it? Well, no.

Laziness

Because Haskell is lazy, map followed by find is efficient, because the map will only be run as many times as needed to get output from find.

This reminded me of a more general problem with Ruby – its reduce function is strict (not lazy), so it always consumes the whole list, even if you don’t need it to. So, since this list function can be written as a reduce, if we had a lazy reduce, we would have our solution.

Lazy reduction

Let’s take a simple definition of reduce, and rewrite it to be left associative and lazy.

def reduce(initial, &block)
  each do |x|
    initial = block.call(initial, x)
  end
  initial
end

Okay, so how do we make this lazy? The block normally gets the initial value (or memo value), and the next value, and returns a new memo value. Instead, we’ll get the next value in the list, and a block we can call that reduces the rest of the list (and returns initial on an empty list.)

def reduce_left(initial, &block)
  again = lambda do |list|
    if list.empty?
      initial
    else
      thunk = lambda do
        again.call(list.rest)
      end
      block.call(list.first, thunk)
    end
  end
  again.call(self)
end

Calling list.rest like this is inefficient, so we’ll rewrite it to use an Enumerator in a minute. Firstly, let’s see how we could use it.

Implementing map_first

Without further ado, the first non-nil result, or nil is:

def map_first(&block)
  reduce_left(nil) do |x, rest|
    block.call(x) || rest.call
  end
end

Note this has the short-circuiting OR structure we would expect.

An efficient implementation of reduce_left

We should use an Enumerator, so it would look like this:

def reduce_left(initial, &block)
  e = self.each
  again = lambda do
    block.call(e.next, again)
  end
  again.call
rescue StopIteration
  initial
end

And it’s even less code and fewer lambdas. Thus concludes the blog post.