
Elegant Memoization with Ruby’s .tap Method

There are a few different ways to memoize complicated chunks of code in Ruby. Here's why .tap is my personal favorite.

When I was writing the Jekyll integration for JamComments, I started reminiscing about some of the features I really like about Ruby (it had been a minute since I wrote much of it). One of the first that came to mind was the conditional assignment operator, often used to memoize values:

def results
  @results ||= calculate_results
end

If you're unfamiliar, the @results instance variable will only be set if it's falsey (nil or false). It's a nice way to ensure an expensive operation is performed only when it's needed and never more than once.
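To make that concrete, here's a minimal sketch. The Report class and its call counter are hypothetical, added only for illustration, and calculate_results stands in for any expensive operation:

```ruby
# Hypothetical class for illustration only.
class Report
  attr_reader :calls

  def initialize
    @calls = 0
  end

  def results
    # Behaves roughly like: @results || (@results = calculate_results)
    @results ||= calculate_results
  end

  private

  # Stand-in for an expensive operation; counts how often it runs.
  def calculate_results
    @calls += 1
    [1, 2, 3].sum
  end
end

report = Report.new
report.results
report.results

puts report.calls # prints 1: the second call reused @results
```

One caveat with this pattern: if the computed value is itself nil or false, the right-hand side will run again on the next call.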

For one-liners like this, it's straightforward. But sometimes, a little more complexity may require multiple lines of code, like if you were to fetch results from an external service. In that case, memoization isn't as elegant. That's where another neat Ruby feature can help retain that elegance. But first, let's flesh out a scenario.

Scenario: Memoizing an HTTP Request

Here's a GitHubRepo class for fetching repository data from the GitHub API. It handles making the request and accessing particular data we want from the response.

require 'httparty'
require 'json'

class GitHubRepo
	attr_reader :name

	def initialize(name:)
		@name = name
	end

	def license
		repo.dig('license', 'key')
	end

	def stars
		repo['stargazers_count']
	end
    
	private

	def repo
		puts "fetching repo!"
        
		response = HTTParty.get("https://api.github.com/repos/#{name}")

		JSON.parse(response.body)
	end
end

Spin it up by passing in a repository name:

repo = GitHubRepo.new(name: 'alexmacarthur/typeit')

puts "License: #{repo.license}"
puts "Star Count: #{repo.stars}"

Unsurprisingly, "fetching repo!" would be output twice, since the repo method is called twice with no memoization. We could solve that by manually checking and setting a @repo instance variable:

# Inside class...
def repo
	# Check if it's already set.
	return @repo unless @repo.nil?

	puts 'fetching repo!'

	response = HTTParty.get("https://api.github.com/repos/#{name}")

	# Set it.
	@repo = JSON.parse(response.body)
end

But like I said, not as elegant. I don't love needing to check if @repo is nil myself, and then setting it in a different branch of logic.

Where .tap Shines

Ruby's .tap method is really helpful in moments like this. It exists on the Object class, and as the docs describe, it "yields self to the block, and then returns self." So, memoizing an HTTP response cleans up a bit better:

# Inside class...
def repo
	@repo ||= {}.tap do |repo_data|
		puts 'fetching repo!'

		response = HTTParty.get("https://api.github.com/repos/#{name}")

		repo_data.merge!(JSON.parse(response.body))
	end
end

Explained: We start with an empty hash {} as a "default" value, which is then "tapped" and provided to the block as repo_data. Then, we can spend as many lines as we want in that self-contained block building repo_data as desired before it's implicitly returned. And that block is behind a conditional assignment operator ||=, so future repo calls will just return the @repo instance variable. No variable checking. One code path. Slightly more cultivated, in my opinion.

But there's a potential kicker in there that's nabbed me a few times: .tap will always return the object it was called on, no matter what the block does or returns. And that means if you want a particular value to come out of it, you have to mutate the tapped object. This would be pointless:

repo_data = repo_data.merge(JSON.parse(response.body))

It would have simply reassigned the block's local variable, and the original object would have still been returned unchanged. But using the "bang" version of merge does work because it mutates the object that repo_data refers to.
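Here's a small, network-free sketch of that difference. The variable names and the 'stars' key are made up for illustration:

```ruby
# Reassigning the block parameter only rebinds a local name.
# tap still returns the original (empty) hash.
reassigned = {}.tap do |data|
  data = data.merge('stars' => 1)
end

# Mutating in place changes the very object that tap returns.
mutated = {}.tap do |data|
  data.merge!('stars' => 1)
end

puts reassigned.empty? # prints true
puts mutated['stars']  # prints 1
```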

What about a plain, old begin block?

Yep, something like this would definitely work, and there are some solid advantages to it, like less code and no mutations.

# Inside class...
def repo
	@repo ||= begin
		puts 'fetching repo!'

		response = HTTParty.get("https://api.github.com/repos/#{name}")

		JSON.parse(response.body)
	end
end

I tend to prefer .tap because (I feel like) it gives me a little more control over the shape of the object I'm building. In cases like this, there's nothing I can do to guarantee that the response body will be modeled in a particular way. Using .tap streamlines the building of my hash exactly how I want, and makes it easy to fall back to default values if certain properties aren't found.

# Inside class...
def repo
	@repo ||= {}.tap do |repo_data|
		puts 'fetching repo!'

		response = HTTParty.get("https://api.github.com/repos/#{name}")
		data = JSON.parse(response.body)

		# Shaping the hash exactly how I want it:
		repo_data['license'] = data.dig('license', 'key') || "unknown"
		repo_data['stargazers_count'] = data['stargazers_count']
	end
end

Not to mention, by starting with that empty hash, we're guaranteeing that the request would never be performed again, even if the response resolves to something falsey.
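A rough sketch of that difference, with no network involved. The FalseyDemo class is hypothetical, and its fetch method stands in for a request whose parsed result turns out to be nil:

```ruby
# Hypothetical class for illustration only.
class FalseyDemo
  attr_reader :fetches

  def initialize
    @fetches = 0
  end

  # A nil result is stored, but ||= sees nil again on the next call.
  def with_begin
    @with_begin ||= begin
      fetch
    end
  end

  # The empty hash is truthy, so ||= is satisfied after the first call.
  def with_tap
    @with_tap ||= {}.tap do |data|
      result = fetch
      data.merge!(result) if result
    end
  end

  private

  # Stand-in for a request that resolves to nil; counts its invocations.
  def fetch
    @fetches += 1
    nil
  end
end

demo = FalseyDemo.new

2.times { demo.with_begin }
begin_fetches = demo.fetches

2.times { demo.with_tap }
tap_fetches = demo.fetches - begin_fetches

puts begin_fetches # prints 2: the begin version fetched on every call
puts tap_fetches   # prints 1: the tap version fetched only once
```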

That said, the distinction probably doesn't matter that much. Make your own choices.

Similar Methods in Other Languages

You can tell a feature is valuable when other languages or frameworks adopt their own version of it, and this is one of those. I'm aware of just a couple, but I'm sure there are more.

Laravel (PHP), for example, exposes a global tap() helper function, and there's even a Tappable trait, which is used to add a tap method to several classes within the framework. Moreover, Taylor Otwell has even said its inspiration was found in Ruby. Here's a snippet robbed straight from their documentation:

$user = tap(User::first(), function (User $user) {
    $user->name = 'taylor';
 
    $user->save();
});

And here's how that helper could be used to memoize our GitHub API request. As you can see, it works nicely with PHP's null coalescing operator:

// Inside class...
private function repo()
{
	// Perform request only if property is empty.
	return $this->repo = $this->repo ?? tap([], function (&$repoData) {
		echo 'Fetching repo!';

		$client = new Client();
		$response = $client->request('GET', "https://api.github.com/repos/{$this->name}");

		$repoData += json_decode($response->getBody(), true);
	});
}

Kotlin actually has a couple of tools similar to .tap. The .apply and .also functions let you mutate an object that's then implicitly returned at the end of the lambda:

val repo = mutableMapOf<Any, Any>().also { repoData ->
	print("fetching repo!")
    
	val response = get("https://api.github.com/repos/$name")
	jacksonObjectMapper().readValue<Map<String, Any>>(response.text).also { fetchedData ->
		repoData.putAll(fetchedData)
	}
}

But for memoization, you don't even need them. Kotlin's lazy delegate will automatically memoize the result of the self-contained block that follows it.

// Inside class...
private val repoData: Map<String, Any> by lazy {
	print("fetching repo!")
    
	val response = get("https://api.github.com/repos/$name")
	jacksonObjectMapper().readValue<Map<String, Any>>(response.text)
}

Sadly, God's language, JavaScript, doesn't have a built-in .tap method, but you can easily get one from Lodash's implementation. Or, if you're feeling particularly dangerous, tack it onto the Object prototype yourself. Continuing with the repository fetch example:

Object.prototype.tap = async function(cb) {
  await cb(this);

  return this;
};

// Create an empty object for storing the repo data.
const repoData = await Object.create({}).tap(async function (o) {
  const response = await fetch(
    `https://api.github.com/repos/alexmacarthur/typeit`
  );
  const data = await response.json();

  // Mutate tapped object.
  Object.assign(o, data);
});

For memoization, this would pair decently with the new-ish nullish coalescing operator. Say we were in the context of a class like before:

// Inside class...
async getRepo() {
	this.repo = this.repo ?? await Object.create({}).tap(async (o) => {
		console.log("fetching repo!");

		const response = await fetch(
			`https://api.github.com/repos/alexmacarthur/typeit`
		);
		const data = await response.json();

		Object.assign(o, data);
	});

	return this.repo;
}

Still not the level of elegance that Ruby offers, but it's getting there.

A Gem in the Rough

Like I mentioned, it's been a little while since I've dabbled in Ruby, spending most of my time as of late in Kotlin, PHP, and JavaScript. But I think that sabbatical has given me a more comprehensive, renewed perspective on the language, and helped me to appreciate the experience it offers despite the things I don't prefer so much (there are some). Hoping I continue to identify these lost gems!

Thank you to Jason, a far-above-average golfer, who taught me Ruby tricks like this.


Alex MacArthur is a software engineer working for Dave Ramsey in Nashville-ish, TN.
Soli Deo gloria.


10 comments
  • Adrien

    Just use memo_wise or similar


    1 reply
    • Alex MacArthur

      I have no problem w/ that. But it’s not always desirable to introduce another dependency, and often simple enough to roll it yourself.


  • Lev
    # the cleanest way is to use Memery:
    include Memery

    memoize def repo
     response = HTTParty.get("https://api.github.com/repos/#{name}")
     JSON.parse(response.body)
    end

  • Ilya Sher

    Related: https://ilya-sher.org/2022/12/31/the-new-life-of-tap/


  • belgoros

    There is a tiny typo in this statement: "@results instance variable if will only be set if it's falsey". Shouldn't it rather be "@results instance variable will only be set if it's falsey"? (extra 'if' removed)


    1 reply
  • Mikhail

    IMO instead of creating additional nesting it's much better to just do early return

    def fn
    return @fn if @fn

    @fn = ...
    end


    1 reply
    • Alex MacArthur

      “Much better” is pretty subjective, lol. But yes, I see the appeal of fewer indents. I personally like wrapping the expensive logic inside some sort of block and not needing to touch the instance variable as often.


  • Patrik

    You can avoid the mutation and still have total control:

    @repo ||= HTTParty.get("https://api.github.com/repos/#{name}").then do |response|
      data = JSON.parse(response.body)
      {
      license: data.dig('license', 'key') || "unknown",
      stargazers_count: data['stargazers_count']
      }
    end

    1 reply
    • Alex MacArthur

      Yep, that seems to be a pretty common preference people have. Might see myself moving toward it in the future.


  • Augusts Bautra

    .tap has the benefit of keeping everything about the memoisation in the block, but at the cost of an indent.
    An alternative approach I've recommended for years (and describe in https://dev.to/epigene/memoization-in-ruby-2835), is the "defined?" approach.


    1 reply
    • Alex

      I totally get the indent concern and why others opt for an alternative. I admittedly don’t have a huge history of Ruby experience, so I can see myself shifting in preference as time goes on. Good overview of options in that post, btw!


  • Michael Chui

    Can't say I see the benefit. The difference between your approach and using a begin-block is just that you add a method invocation of #tap. As Fabio says, using #then is also generally better, but it doesn't make a difference in your case. The way I'd do it is,

    @repo ||= HTTParty
    .tap { puts 'fetching repo!' }
    .then { _1.get("https://api.github.com/repos/#{name}") }
    .then { JSON.parse(_1.body) rescue {} }
    .then do |data|
    # Shaping the hash exactly how I want it:
    { 'license' => data.dig('license', 'key') || "unknown",
    'stargazers_count' => data['stargazers_count']
    }
    end

    But I don't generally use HTTParty anyways since it doesn't lend itself to this kind of composability. I'd prefer RestClient::Resource.new("https://api.github.com/repos/#{name}").tap { puts "fetching repo!" }.then { _1.get } for that bit.


  • Fabio

    Me again! Here’s what I think may be the most expressive way to write this. It shows a clear progression of the data being mutated and still gives the ultimate benefit of memoization using your technique. It’s a little bit like piping the output of one command through to another in *nix shells.

    def repo
      @repo ||= nil
        .tap { puts 'fetching repo!' }
        .then { HTTParty.get("https://api.github.com/repos/#{name}") }
        .then { |response| JSON.parse(response.body) }
        .then { |parsed| parsed || {} }
    end


  • Fabio

    Cool idea!

    I wanted to mention that there is a similar method to #tap called #then. It also yields self to the block, but it returns the value of the block. So, in that case, using merge (instead of mutating it with merge!) would work. Hope that’s helpful!