Archive – Page 5

Where Do Ruby Blocks Come From?

Credit: Xavi Cabrera on Unsplash

It’s time to get reflective…time for some deep introspection…so light a candle or two, put some Barry White on the stereo, get nice and comfortable, because we’re going to talk about Blocks.

Blocks in Ruby are powerful, and they’re used everywhere. You can pass blocks implicitly to enumerable methods like map and select simply by adding { ... } or do ... end after the method name. You can explicitly create closures (aka anonymous functions) by using proc, lambda, or ->() semantics and pass them around as method arguments or store them as variables. Yes, blocks are pretty magical.

But we’re not here to talk about how to write blocks per se or what they’re good for. We’re here to talk about where they come from.

What do you mean, where they come from? They come from programmers, silly!

Well, obviously…but I don’t mean how they originate in the minds of intrepid Rubyists everywhere — I mean where do they come from in the execution context of your Ruby program?

Oh OK. Yes, I’d like to know that too.

Awesome sauce! So let’s dive into what exactly happens when a block is first created.

Behold the Mighty Binding! #

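When you write a block, you aren’t merely defining some lines of code that will get executed later. You’re also creating a binding. A binding is the execution context in which the block will eventually be executed. It consists of the local variables in scope where the block was defined, along with the receiver (the object that self refers to).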

Bindings are actually instances of the Binding class, which means you can inspect the binding of any block’s Proc instance you have access to.

Wait wut?

Yep, in Ruby virtually everything is an object, even blocks (in the form of Proc instances). If you’re coming from a less dynamic language (say, Javascript), prepare to have your mind blown! Here’s an example:

abc = 123
block = proc {}
block.binding.local_variable_get(:abc) # 123

xyz = 987
block.binding.local_variable_get(:xyz) # NameError (local variable `xyz' is not defined for #<Binding>)

block.binding.local_variables # [:block, :abc, ...]

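The reason the first variable (abc) was part of the binding and the second (xyz) wasn’t is that blocks capture their parent scope at the point where the block is defined, and a variable introduced into that scope later isn’t accessible from within the block. (That’s the behavior you’ll see entering these lines one at a time in IRB. Run them as a single script file and Ruby registers all of the scope’s local variables when it parses the file, so xyz is already known to the binding and the lookup returns 987.) This is sometimes referred to as “lexical scope”.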

However — and this is important to note — changes to existing variables are accessible. Check this out:

abc = 123
block = proc { puts abc }
block.binding.local_variable_get(:abc) # 123
block.call # 123

abc = 456
block.binding.local_variable_get(:abc) # 456
block.call # 456

The binding stores references to local variables by name (i.e., it doesn’t copy any variables), so when your variable was reassigned with a different value later, the block could still access the new value.

Of course, you can also reassign variables within a block and the new values are accessible outside the block:

abc = 1
proc { abc = 2 }.call

puts abc # 2

But what if you want to “hide” a local variable from inside a block — i.e., the variable isn’t included in the block scope and changes don’t propagate back up to the parent scope? Never fear! You can create what are called “block-local variables” by listing them in the block’s parameter list, preceded by a semicolon:

abc = 1
proc do |;abc|
  puts abc.nil? # true
  abc = 2
  puts abc
end.call # 2

puts abc # 1

So that’s pretty cool! But here’s where things get a little confusing: if you call binding.local_variable_get(:abc) on the proc even when specifying abc as a block-local variable, you get back the value 1, not nil. I guess that’s because the binding is telling you about the context the block has been bound to, not necessarily what the exact state of affairs will be inside the block. If you know of a way to introspect block-local variables through the Binding object, please let me know!
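One angle that seems to work, offered as a sketch rather than a definitive answer: Proc#binding describes the scope the block closed over, whereas calling Kernel#binding from inside the block captures the block’s own scope, block-local variables included. The block_local_demo method below is just a hypothetical wrapper for illustration.

```ruby
# Hypothetical demo method: compares the binding seen from outside the
# block (Proc#binding) with the one captured inside it (Kernel#binding).
def block_local_demo
  abc = 1

  probe = proc do |;abc|
    before = binding.local_variable_get(:abc) # the block-local abc, still nil
    abc = 2
    [before, binding.local_variable_get(:abc)]
  end

  inside = probe.call
  outside = probe.binding.local_variable_get(:abc) # the outer abc

  [inside, outside, abc]
end

p block_local_demo # [[nil, 2], 1, 1]
```

So the information is there, it just has to be requested from within the block rather than from the Proc object.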

It is Better to Give Than to Receiver #

Another thing to take a look at is the receiver of the binding. Any method calls you make implicitly or explicitly to self, as well as any instance variables you access, will all be bound to the receiver. When you’re testing this out in IRB or in a basic Ruby script, the receiver will be the main object (unless you are inside of another object). Here’s an example:

block = proc {}
puts block.binding.receiver # main

class MyObject
  def receiver
    block = proc {}
    block.binding.receiver # return current scope's self, aka MyObject instance
  end
end

puts MyObject.new.receiver.class # MyObject

Now here’s where things get really trippy. Ruby lets you change the receiver of a block! Yep, that’s right: you can swap one receiver out for another and when you execute the block its self will be different than the originally bound receiver.

abc = 123

block = proc do
  puts abc
  puts self.xyz # explicit so you can see what's going on
end

block.call # NoMethodError (undefined method `xyz' for main:Object)

class MyObject
  def xyz
    456
  end
end

obj = MyObject.new
block.binding.receiver = obj
block.call

Run that and…oh wait, oops, that doesn’t work! 🙁 That’s because there is no receiver= method for binding like you might expect. Fortunately, there’s another way to go about things (in Ruby there almost always is!). We can use the instance_exec method of the object itself. Let’s fix the code and try again:

abc = 123

block = proc do
  puts abc
  puts self.xyz
end

block.call # still causes an error…but wait!

class MyObject
  def xyz
    456
  end
end
obj = MyObject.new

# Time to try out instance_exec!
obj.instance_exec(&block)
# output: 123
#         456

That works! 😃👍

So using instance_exec is very similar to using call, only you hand the block over as the method’s block argument (just make sure to include the ampersand in front of the block variable). Any positional arguments you list before it will be passed to the block itself, same as any arguments you would give call. When you use instance_exec to execute the block proc, it’s then able to access the xyz method of obj, whereas before there was no xyz method available. In addition, even if you use instance_exec, the block still has access to the original local variables (abc) as part of its binding.
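Here’s a small sketch of passing arguments through instance_exec. The Greeter class and the greet_via_instance_exec method are made-up names for illustration; the point is simply that the positional arguments land in the block’s parameters while self and the captured locals behave as described above.

```ruby
# Hypothetical class standing in for "the new receiver".
class Greeter
  def name
    "Ruby"
  end
end

def greet_via_instance_exec(target)
  punctuation = "!" # a local captured by the block's binding

  block = proc do |greeting, times|
    # `name` resolves against the new receiver; `punctuation` comes
    # from the scope where the block was defined.
    "#{greeting}, #{name}#{punctuation * times}"
  end

  # Positional arguments come first, the block goes last with an ampersand.
  target.instance_exec("Hello", 3, &block)
end

puts greet_via_instance_exec(Greeter.new) # Hello, Ruby!!!
```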

If you wanted to get really fancy with Ruby-fu metaprogramming, you could store the block’s bound receiver in a variable, then use instance_exec in combination with method_missing so that method calls in the context of the new object would actually end up shadowing the original receiver’s methods. Why on earth would you want to do this? Let’s just say I have a story to tell you…but we’ll save that for a later date. 😄
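To make that idea a bit more concrete, here’s one way it might look. Treat this as a rough sketch under my own assumptions, not the approach from that future story: ReceiverProxy, run_with, Page, and Template are all hypothetical names. The proxy tries the new object first and falls back to the block’s original receiver.

```ruby
# Hypothetical proxy: methods on `target` shadow those of `fallback`.
class ReceiverProxy
  def initialize(target, fallback)
    @target = target
    @fallback = fallback
  end

  def method_missing(name, *args, &block)
    if @target.respond_to?(name)
      @target.public_send(name, *args, &block)
    elsif @fallback.respond_to?(name, true)
      @fallback.send(name, *args, &block)
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) ||
      @fallback.respond_to?(name, true) ||
      super
  end
end

# Stores the block's bound receiver, then runs the block against the proxy.
def run_with(target, &block)
  original_receiver = block.binding.receiver
  ReceiverProxy.new(target, original_receiver).instance_exec(&block)
end

class Page
  def title
    "Home"
  end
end

class Template
  def title
    "Untitled"
  end

  def footer
    "Site footer"
  end

  def render(page)
    # title comes from the page, footer falls back to this template
    run_with(page) { "#{title} | #{footer}" }
  end
end

puts Template.new.render(Page.new) # Home | Site footer
```

One caveat with this sketch: instance variables referenced inside the block would belong to the proxy, not to either of the wrapped objects.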

Summary #

So there you have it: Ruby blocks are fun and weird and can do so much, yet they ask so little of us in return. May we learn to appreciate how much work they have to do under the hood to make it all seem so easy. And now that you know more about bindings, lexical scope, and the wizard-like power of instance_exec, you too can have precise control over exactly what’s going on as you wield (and yield) procs and lambdas like a mahōtsukai.



Why the Release of Ruby 3 Will Be Monumental

Credit: Johannes Groll on Unsplash

We’ve been living in the shadow of Ruby 2 for seven years now. Seven! Ruby 2 was released in 2013 (which incidentally is the same year as the initial public release of React 0.3.0!).

In that span of time, Ruby performance has improved significantly and many, many enhancements to the language have benefited a great many people and projects. We’ve seen companies using Ruby and in many cases Rails become bedrocks of developer and consumer internet infrastructure. GitHub. Shopify. Stripe. Square. AirBnB.

But there has also been some consternation along the way. Is Ruby really a top-tier programming language able to compete with the likes of Javascript, Python, PHP, Go, and beyond? Or was it just a DHH-fueled hype-cycle doomed to inevitable relative obscurity as other technologies and frameworks ascended in its wake? (I don’t actually believe anyone seriously thinks this any more, but you still see the stray head-scratcher whiz by on Hacker News.)

Now we are mere weeks away from a major new Ruby release: version 3. While Ruby 3 is an exciting update with lots of features that make it interesting both now and in the future with various point updates promising even more goodies, I think it’s the psychology of turning over from major version 2 to 3 that is most vital to the future health of the community.

Ruby 3 isn’t just a new version. It’s a new era.

What does this era represent? Let’s list a few talking points I hope we’ll start to push hard and often as Rubyists:

Ruby 3 is Fast #

No, I don’t mean Ruby 3 suddenly got a whole lot faster than Ruby 2.7. I mean that Ruby 3 is fast compared to Ruby 2. It’s unfortunate that much of the “Ruby is slow” meme has been a laggard perspective stemming from people’s experiences years ago with the language, or an old version of Rails, or Jekyll, or…the fact is it just wasn’t the zippy experience we’re pleased to enjoy today.

Do we still want even better performance? Of course! But at this point, Ruby is plenty fast as compared to many other “scripting” languages. Most of the time it’s on par with Python. It’s even on par with Javascript. (What? Don’t believe me? Check out how similarly Jekyll and Eleventy perform as static site generators.) And as Nate Berkopec often reminds us, your Rails app can perform quite well with just a bit of fine-tuning, and often the typical bottlenecks lie elsewhere in the stack (database, web server, etc.).

Ruby 3 is Easy #

These days, you don’t need to wrestle with gem dependency hell or pray to the gods to get Ruby or a Ruby extension to compile. That was “old Ruby”. New Ruby is using a fancy-pants version manager like rbenv combined with Bundler 2.

It just works.

Truly, Ruby is the first thing I install on any new Mac or Linux machine I operate and getting things set up is a piece of cake. Installing Rails. Installing Bridgetown. Installing…whatever. It. Just. Works.

We also have things like Docker and WSL to make things much easier to accomplish on Windows machines if you get stuck wrestling with Win-native Ruby. Heck, you can upload your entire dev environment into the cloud now and use VSCode with remote extensions.

Are there ways Bundler and the ecosystem around Ruby versions/dependencies could be improved? No doubt. But it’s in no way any more complicated or fiddly than the world of npm/yarn, and you don’t see the angry hordes trying to burn down the barn doors over there (except maybe the Deno folks 😉).

Ruby 3 is Sleek #

Ruby isn’t the best choice for all problem domains. It just isn’t. But when it comes to “standard” web development, it often is the best choice. It really is! Spend a few days writing NestJS + TypeORM Typescript code and then come back to Rails. It’s like a breath of fresh, sweet air. And that’s not just when you’re writing controllers or models…it goes all the way up and down the stack.

Ruby just makes everything better. Less code. Less boilerplate. Less ceremony. More streamlined. More properly object-oriented. More polished and pleasurable to read and write. Certainly one could posit there are other web frameworks/languages which have much going for them as well. Laravel is popular with PHP devs, and for good reason. Django is popular with Pythonistas. But can anyone say with a straight face that, all things being equal, PHP is a “superior” programming language to Ruby? Can anyone say that Python—taken as a whole—is more suited to building a website than Ruby is?

I think not. While Ruby wasn’t originally invented as a way to supercharge web development, it found its niche in the rise of such amazing projects as Rails, Rack, Jekyll, plus great APIs by Stripe and many others. It rode much of the early wave of Web 2.0 hits, and that heritage continues to benefit us today.

Ruby 3 is Here to Stay #

Ruby 3 isn’t just another notch on the belt of recent Ruby releases. It’s Ruby 3.0. That means we can look forward to 3.1, 3.2, 3.3, and beyond. This is the beginning of a whole new era. New innovations. New patterns. Exciting ideas fusing concepts from other technologies with The Ruby Way. Fresh blood coming into the ecosystem. (Anecdotally, I’m seeing newbies plus returning old-timers jumping into Ruby-based forums and chat rooms all the time, and the pace of interesting new Ruby gems bursting onto the scene finally seems to be increasing after a few years of ho-hum incremental progress.)

The takeaway is this: Ruby 3 represents a moment when we should stand proud as Rubyists and unabashedly proclaim to the bootcamps and engineering departments of the world that we’re open and ready to do business large and small. Sure you could pick something other than Ruby with which to build the next great internet success story. But you’ll definitely be in good company if you do pick Ruby. After all, it’s more likely than not your code will be living in a repository overseen by Ruby (GitHub), you’ll be communicating with your fellow colleagues via Ruby (Basecamp & HEY), you’ll be asking for support via Ruby (Discourse forums), you’ll be researching the latest developer news and techniques via Ruby (Dev.to), and you’ll be spinning up your dev machine while wearing that l33t geek t-shirt you got from an indie vendor via Ruby (Shopify)—that is, after you paid for it via Ruby (Stripe). And when you’re exhausted from all that coding and need to unwind at a private cottage by the beach, Ruby will help you out there too (AirBnB).

Excelsior!



Gsub Blocks, Partitions, and StringScanners, Oh My!

Credit: Tara Evans on Unsplash

It should come as no surprise that Ruby gives you a lot of flexibility right out of the box when it comes to manipulating text. After all, it originated in the 90s when Perl was ascendant, and Matz took inspiration from that language, which is famous for its text processing prowess.

I’ve needed to do a fair bit of parsing work lately, and as part of that I’ve become more familiar with some of the ins and outs of using Regular Expressions to seek through text to find and possibly replace tokens. This is by no means an exhaustive resource, but it should provide you with a general idea of what’s possible in your day-to-day Ruby programming.

Gsub #

If you need to do a search and replace in one or more places throughout your string, gsub is typically the way to go. I think most Rubyists will discover this method pretty early on when learning about string manipulation.

What I didn’t know until recently is you can pass a block to gsub. For each match in the string, the block will be evaluated and the return value will be the replacement for that match. This means you can write code that will determine the replacement values conditionally based on what exactly is getting matched!

For example, if you wanted to change <div> tags to <span> tags, but only if there are no attributes, you could write something like this:

"<div>This is a string</div>" \
"<div class='centered'>This is another string</div>"
  .gsub(/(<.*?[ >])(.*?)(<\/.*?>)/) do |match|
    if $1.end_with?(" ")
      match
    else
      "<span>#{$2}</span>"
    end
  end

# <span>This is a string</span><div class='centered'>This is another string</div>

(Now this isn’t a great example because it doesn’t handle nested tags, but you get the idea…)

In case you’re not familiar with capture groups, the $1 and $2 are referencing the first capture group which is an opening tag (aka <div>) and the second capture group which is the text inside the tag.
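If the numbered globals feel a little cryptic, named capture groups are another option. The sketch below is a narrower variation on the example (it only matches a literal attribute-free <div>, and divs_to_spans is a made-up method name), reading the capture by name through Regexp.last_match inside the block.

```ruby
# Hypothetical helper: converts only attribute-free <div> tags to <span>.
def divs_to_spans(html)
  html.gsub(%r{<div>(?<text>.*?)</div>}) do
    # Regexp.last_match is the MatchData for the current match
    "<span>#{Regexp.last_match[:text]}</span>"
  end
end

puts divs_to_spans(
  "<div>This is a string</div>" \
  "<div class='centered'>This is another string</div>"
)
# <span>This is a string</span><div class='centered'>This is another string</div>
```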

gsub also lets you provide a hash where matches will be replaced by the values of matched keys:

"Foo is the nicest bar you'll ever meet."
  .gsub(/Foo|bar/, "Foo" => "Joe", "bar" => "guy")

# Joe is the nicest guy you'll ever meet.

I suspect the block syntax is ultimately of more value though.

Partition #

The partition method lets you divide a string into three pieces: the part of the string before a single match, the match itself, and everything that comes after that match. If you include capture groups in your regular expression, you can utilize those as well. One way you can take advantage of this type of data is by using partition to search a string for tokens, and build a new string up via a buffer as you transform the tokens.

Let’s say you want to be able to put colons around words where you’d like the word length to appear as a kind of footnote after the word. You want text :like: this to turn into text like(4) this.

Here’s how you could write it using partition, a buffer, and an until loop:

string = "This is :something: you'll :want: to try :out: for yourself."
buffer = ""

until string.empty?
  text, token, string = string.partition(/ :(.*?): /)

  buffer << text

  if token.length.positive?
    buffer << " #{$1}"
    buffer << "(#{$1.length}) "
  end
end

puts buffer

# This is something(9) you'll want(4) to try out(3) for yourself.

Now, is this something you could do with a gsub block as described previously? Yes indeed:

string = "This is :something: you'll :want: to try :out: for yourself."
string.gsub!(/ :(.*?): /) do
  " #{$1}(#{$1.length}) "
end

puts string

In fact that’s a lot simpler. However, in this example you aren’t handed any of the text before or after the token (the closest you get is digging it out of $~ with pre_match and post_match). If that’s something that’s important to you (maybe you need to process the token differently depending on what comes before it, or after it), you’ll want to use partition.
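To illustrate that last point, here’s a sketch of the partition loop reacting to the text that precedes each token. The rule itself is invented for the example (leave off the length footnote when the preceding text ends with “skip”), and annotate_lengths is a hypothetical method name.

```ruby
# Hypothetical variation: the footnote is omitted when the text right
# before the token ends with "skip".
def annotate_lengths(input)
  remaining = input
  buffer = +""

  until remaining.empty?
    text, token, remaining = remaining.partition(/ :(.*?): /)
    buffer << text
    next if token.empty? # no more tokens; `text` was the tail of the string

    word = Regexp.last_match(1) # first capture group from partition's match
    buffer << " #{word}"
    buffer << "(#{word.length})" unless text.end_with?("skip")
    buffer << " "
  end

  buffer
end

puts annotate_lengths("Count :these: words but skip :this: one.")
# Count these(5) words but skip this one.
```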

Or will you?? There is another way!

StringScanner #

Using StringScanner is like bringing a bazooka to a paintball tournament. It’s extraordinarily powerful, but it can also land you in some serious trouble—not to mention get a little mind bend-y if you’re not careful.

StringScanner is actually the name of a Ruby class in the standard library (stdlib), which you’ll need to import by adding require "strscan" to the top of your code. You use it by instantiating a scanner with a string, and then you use various methods to scan the string for patterns and advance a “pointer”.
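Before getting to a bigger example, here’s a tiny warm-up sketch of that scan-and-advance rhythm: a toy tokenizer (tokenize is a made-up name, and the token categories are arbitrary) that walks a string using scan, skip, and getch from strscan.

```ruby
require "strscan"

# Hypothetical toy tokenizer: splits a string into numbers, words, and
# single-character symbols, ignoring whitespace.
def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []

  until scanner.eos?
    if scanner.skip(/\s+/)
      next # whitespace consumed, nothing to record
    elsif (number = scanner.scan(/\d+/))
      tokens << [:number, number]
    elsif (word = scanner.scan(/\w+/))
      tokens << [:word, word]
    else
      tokens << [:symbol, scanner.getch] # consume one character
    end
  end

  tokens
end

p tokenize("eat 3 pies!")
# [[:word, "eat"], [:number, "3"], [:word, "pies"], [:symbol, "!"]]
```

Each successful scan, skip, or getch moves the pointer forward, which is what eventually makes eos? true.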

Let’s say you want to replace “cake” with “pie” in a string, but not if the keyword is preceded by “short” or if it’s followed by “pops”. We’ll use a buffer and do string replacement like in previous examples, but because we have all the benefits of a scanner it’s pretty easy to look backwards and forwards and determine our next course of action.

require "strscan"

string = "Let them eat cake and then more shortcake and finally cake pops!"
scanner = StringScanner.new(string)
buffer = ""

until scanner.eos?
  portion = scanner.scan_until(/cake/)
  if portion.nil?
    buffer << scanner.rest
    scanner.terminate
    next
  end
  if scanner.pre_match =~ /short$/ || scanner.check(/\s+pops/)
    buffer << portion
  else
    buffer << portion.sub(/cake/, "pie")
  end
end

puts buffer

# Let them eat pie and then more shortcake and finally cake pops!

Whoa, what’s going on here?

First, we set up an until scanner.eos? loop. This means the loop will iterate until we’ve reached the end of the string.

The scan_until method looks for a pattern and advances the current pointer to just past the end of the match. (You can verify this by adding puts scanner.pointer below scan_until.) It returns the portion of the string from the previous pointer position through the end of the match, so we can use that to perform string substitution to change “cake” to “pie”.

However, we don’t want to do the substitution if cake is preceded immediately by “short”, so we’ll check for a regex match on everything that’s come before the match (scanner.pre_match) to see if it ends with “short”. We also want to check if the very next part of the string is the word “pops”, so we’ll use the scanner.check method. This checks what comes immediately next in the string, but it doesn’t advance the pointer. (There’s also a check_until method which is analogous to scan_until.) By not advancing the pointer, we avoid messing up our position in the string and can continue looping normally.
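If you’d like to see the difference between checking and scanning for yourself, this little sketch records the pointer after each step (pointer_trace is just a hypothetical helper name):

```ruby
require "strscan"

# Hypothetical helper: records the scanner's pointer after each step.
def pointer_trace(text)
  scanner = StringScanner.new(text)
  positions = []

  scanner.scan_until(/cake/) # consumes through the end of "cake"
  positions << scanner.pointer

  scanner.check(/\s+pops/)   # peeks ahead without consuming anything
  positions << scanner.pointer

  scanner.scan(/\s+pops/)    # same pattern, but this time it's consumed
  positions << scanner.pointer

  positions
end

p pointer_trace("cake pops") # [4, 4, 9]
```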

The if portion.nil? block near the top of the loop handles the case where there are no more instances of “cake” in the string but there’s still more to the string we need to account for. By adding the .rest of the string to our buffer and calling scanner.terminate, we force the scanner to advance to the end of the string, in which case until scanner.eos? will evaluate true and end the loop.

This example is fairly simple because it’s only changing a single word to another word, so the substitution itself doesn’t require any fancy regex. But combine StringScanner with all of the techniques we’ve already learned (gsub blocks, even partition), and you’re able to build extremely sophisticated routines to handle nearly any kind of text processing imaginable.

Summary #

Whew, that’s a lot to take in! Today you’ve learned that gsub is much more than just a way to say that “a” should become “b”. By supplying a block, you have precise control over the replacement strings by first inspecting each match of the source string.

In addition, the partition string method lets you divide a string into pre-match, match, and post-match components—and by doing so over and over in a loop and using a buffer, you can transform a large and complicated string section-by-section.

Finally, for the most precise control over searching text for one or more tokens and performing elaborate search-and-replace actions based on the relationships those tokens have with the rest of the text, the StringScanner object is there just waiting to unleash its full power. Not only that, your code can benefit from previous techniques in the midst of using StringScanner for maximum Ruby text processing prowess.



Adding and Merging ActiveRecord Relations

Credit: Michael Dziedzic on Unsplash

An intriguing scenario arose in a project I was working on. In this application, creators can create plugin presets. Presets themselves are associated with “banks” — aka a bank has_many presets. There are also actions that can be taken against either banks or presets: users generally will bookmark them or download them. Thus these actions have a polymorphic association with both presets and banks.

What I wanted to do was package up all the actions users had performed for banks and related presets created by a particular creator. The resulting ActiveRecord query I arrived at expresses that well. Could this be refactored in a more performant way? (For example, using only one SQL query instead of three?) Very likely. But as this code is run only infrequently for reporting reasons, it’s a reasonable tradeoff between performance and readability.

Here’s the code for the ItemAction.for_creator(creator_profile) method:

class ItemAction < ApplicationRecord
  belongs_to :actionable, polymorphic: true
  enum action_type: [ :bookmark, :download, :publish ]

  def self.for_creator(creator_profile)
    banks = creator_profile.banks.published.select(:id)
    presets = Preset.where(bank: banks).select(:id)

    where(actionable: banks + presets).merge(
      ItemAction.bookmark.or(
        ItemAction.download
      )
    )
  end

  # other code, etc.
end

First, we get the list of banks. We use a published scope to limit the list to only banks which have been made available to the public. Also, we only need the id field back, not all the fields in the database table.

Second, we get the list of presets for those banks, again pulling in only the id field.

The last line of the method is where things get interesting. We want actions where the actionable association brings in both the banks and the presets we’ve loaded, so we use the + operator to concatenate both result sets together (calling + loads each relation and gives us back a plain array of records). Once we’ve passed that to where and have a relation again, we can use ActiveRecord’s merge functionality to pull in additional scopes that specify we only want either bookmark or download actions. We don’t want any other actions (say, publish), because those are actions taken by creators, not end users.

And that does the trick! Yay! A brief but useful example of the expressive power of adding and merging ActiveRecord relations together to get a final result.



The Archival Benefits of Static Site Generators

2020 Update: While I’m all in now on Bridgetown, a modern fork of Jekyll, I’m leaving this up since you can apply many of these same principles to Bridgetown as well.

I’ve been on a nostalgia trip lately, poring over old snapshots of various sites and blogs I worked on in the past (stretching all the way back to 1996). Thank goodness for the Wayback Machine! But it’s gotten me thinking about the impermanence of the digital artifacts we create all the time as designers, developers, and content authors.

All the work you’ve put into that app, that blog post, that video, that Instagram story…blink and it’s gone! In some cases that’s by design. Content that expires and is quickly forgotten has become desirable in certain circles, the artform being all about its “in the moment”-ness.

But in the instances where you want to preserve your content for posterity, the options become challenging. Let’s focus on blogging for the sake of this discussion. I’ve run a number of blogs over the years, and I deeply care about preserving those, at the very least for myself but also for my children and their children (etc.). But the sad truth of the matter is that I’ve “lost” almost all of them. They’re either folders of PHP spaghetti code or SSI files (Server-Side Includes…remember those?) or WordPress installations scattered across multiple ancient backup drives, some of which are in formats or use connectors I can’t even use any more. Plus in many cases the content itself doesn’t even live in those folders, but rather exists in old MySQL databases which I would have to track down, load up, and possibly convert in order to access any of that content!

Bottom line: I’m essentially forced to rely on the Wayback Machine to look up my old content, but not all of the posts and domains were properly archived—and on many of the pages that do work, image links are often broken. It’s hardly an ideal scenario.

There is a Better Way: Make Your Site Static #

Thankfully, I’m now building all of my sites (including this one!) in a completely new way. I’m using Jekyll, which is a Static Site Generator. What does that mean? It means the content for the site—both the blog posts and pages as well as all of the template & layout files, Javascript code, and stylesheets—lives entirely within a simple folder hierarchy and consists solely of plain text files (other than images and other media of course). No databases to install, no weird dynamic code to run on the fly. All you have to do is run jekyll build at the command line (or in my case gulp build ; jekyll build because I process the source SCSS and JS files with Gulp) and in seconds you get a _site folder with your complete website generated and ready to deploy and view anywhere. As long as you write your content in Markdown (.md) or HTML (.html), you’re golden.

But the special one-two punch of using a Static Site Generator such as Jekyll is the fact that you can save your site and all of its content into a version controlled repository. Once your site is stored in a Git repo, you have endless options for how you want to archive and protect your data. Not only do you have every version of your site archived within the repo (so you can “go back in time” to view past iterations of the site), you can easily store the repo in multiple places at once. All of my sites are stored both locally on my computer as well as “in the cloud” in Bitbucket or GitHub, and in some cases they’re also stored on DigitalOcean servers I’ve set up with custom web apps I use to manage the content files using WYSIWYG editing tools. If my computer is busted, my sites are safe online, and if the internet completely goes down, at least I still have my local copies.

Why is this all so important for archival purposes? Here are three big reasons:

  1. You can view your site without any special software. Just fire up the most basic web server imaginable, drop your _site folder into its root location, and your site is up and running. No PHP, Ruby, Go, Python, or any other server language or framework required. There is no step three!
  2. Your content is contained within the most future-proof formats possible. Markdown files are just plain text with minimal decoration. HTML is the most well-established and widely-supported data exchange format in history. JPEG images certainly aren’t going anywhere any time soon. It’s safe to say that (unless you build your site with a bunch of crazy client-side Javascript rendering such that nothing works until all your code runs) you’ll be able to load a web browser decades from now and your site will just work.
  3. Your content is automatically backed up in multiple contexts, consistently. If all of your content is “silo’ed” in a single MySQL database somewhere on some WordPress host, and that host goes down or their backup gets corrupted, you’re toast. Years of work, gone. (And let’s not forget the fact that WordPress sites are prime targets of hacker attacks on a daily basis!) However, if your content lives within Git repositories that likely exist in multiple locations simultaneously, the likelihood you’ll completely lose your repo and all that data is vanishingly small.

The Future is Static #

A lot of web developers are using the term JAMstack these days to describe static sites built with the latest generation of tools, because the word “static” got a bad rap back in the day when new “dynamic” tools such as MovableType or WordPress were taking over the world. But there’s nothing truly static about static sites built with tools such as Jekyll, Hugo, and many others.

I can use extremely sophisticated build processes to create fantastic website designs with tons of interactivity, and I can log into admin interfaces and use WYSIWYG editors if I want to manage content and publish updates at the click of a button. Using Jekyll doesn’t mean you have to hand-code every blog post in raw HTML and “FTP” it somewhere like in the old days. We live in a new age where static site generators are not only slick and amazing, but are in fact paving the way for the future of modern web development.

Static is dead. Long live static!

So to sum it all up, if you want to create blogs and websites that will stand the test of time, that will still be readable ten, twenty, probably even fifty years from now, that will not get buried in a stack of hard drives somewhere or lost in some database black hole on the internet, then you need to try out Jekyll (or one of its competitors). I guarantee you: once you go JAMstack, you’ll never go back!



Why Service Objects are an Anti-Pattern

I have been vocal from time to time in internet discussions regarding service objects and why I believe they are the wrong solution to a legitimate problem. In fact, not only do I think better solutions exist than service objects in the majority of cases, I maintain that service objects are an anti-pattern which indicates a troubling lack of regard for sound object-oriented design principles.

What if I told you that adding service objects doesn't make your Rails codebase any better

It’s hard to get such lofty points across in a random tweet here or comment there. So I decided to write this article and dig into some real-world code that illustrates my position precisely.

Quick Aside: If you read this article and still think I’m off my rocker, here’s another recent take on the subject (with code examples!) by Jason Swett that I think does a great job illustrating the issue.
 

So…what do I mean when I use the term anti-pattern? Here’s a reasonable description from StackOverflow:

Anti-patterns are certain patterns in software development that are considered bad programming practices. As opposed to design patterns which are common approaches to common problems which have been formalized and are generally considered a good development practice, anti-patterns are the opposite and are undesirable.

In order to demonstrate why I don’t like service objects, I’m going to look at some code I inherited from a past development team for a client project. I can’t go too much into context since this application is still in private beta, but let’s just say it’s a social platform where you can rate media (images or videos) and those ratings trigger certain callback-style actions such as updating algorithmic data and adding activities to various users’ timelines.

We have a pretty simple data model where a Rating object can be created in the database that belongs_to both a User object and a Media object (all these examples are shortened from the production files):

class Rating < ActiveRecord::Base
  belongs_to :user
  belongs_to :media
end
class Media < ActiveRecord::Base
  has_many :ratings
end

You get the idea. Now in order to handle an incoming rating from a user, the previous developer created a service object called MediaRating which gets called from the controller:

class MediaRating
  def self.rate(user, media, rating)
    mr = MediaRating.new(user)
    rating_record = mr.update_rating(media, rating)
  end

  def initialize(user)
    @user = user
  end

  def update_rating(media, rating)
    rating_record = @user.ratings.where(media: media).first
    if rating_record.nil?
      # do create stuff
    else
      # do update stuff
    end

    # do some extra stuff here like run algorithmic data processing,
    # add social activities to timelines, etc.
  end
end

And here’s the relevant controller code:

media = Media.find(params[:media_id])
rating = params[:rating].to_i
MediaRating.rate(current_user, media, rating)

Bear in mind that this code was originally written quite a while ago. These days, all the cool cats writing service objects have settled on a bit of formality in terms of the API presented, so if I were to rewrite this service object, I’d probably do something like this:

# add this to Gemfile:
gem 'smart_init'

class UserMediaRater
  extend SmartInit

  initialize_with :user, :media, :rating
  is_callable

  def call
    rating_record = @user.ratings.where(media: @media).first
    # etc.
  end
end

# updated command from the controller:
UserMediaRater.call(current_user, media, rating)

Problem Time #

Now this code doesn’t look so bad, right? It seems pretty clean and well-structured and easy to test. Well, the problem is that what you are seeing here is my polished up, greatly simplified version of this service object. The actual one in the codebase is 74 lines of spaghetti code with methods calling other methods which call other methods because the code to trigger algorithmic data processing and timeline updates and so forth is all shoehorned into this one service object. So actually, the flow is more like this:

Controller > Service Object > Rate Method > Update Rating > Some Other Update Method + (Run Algorithm > Refresh Related Data), then Invalidate Caches + Add Timeline Activities

So every time I open up the codebase fresh and want to look at the block of code that simply creates or updates a rating of a media object by a user, I’m forced to wade through a bunch of ancillary functionality to get at the basic code path.

Well, you might say, that developer obviously didn’t do a very good job writing the service object! They should have kept it simple and focused, and instead put additional processing code in other objects (maybe even other service objects!)

Now wait a minute! The whole reason we are told we need to extract code contained within standard Rails MVC patterns into service objects is because they help us break up complex code flows into standalone functions. But the problem is that there’s nothing to enforce that rule. Nothing! You can write a simple service object, no doubt about it. But you can equally write a complex service object containing a bunch of methods that quickly turn into spaghetti code.

What does this mean? It means the service object pattern has no intrinsic ability to make your codebase easier to read, easier to maintain, simpler, or exhibit better separation of concerns.

If a pattern can foster nearly any programming style, anywhere on a nearly infinite spectrum from simple to highly complicated, then it ceases to be a useful pattern and describes nothing specific to developers.

So What Should We Do Instead? #

When I’m preparing to write a fair bit of code that I know will have to process incoming data and either create or update records along with other related functionality, I typically start by writing a class method on the most appropriate model. Now hold your horses, I’m not saying this is a superior pattern. I’m saying this is where I begin, before I start looking for another pattern that might be a better fit.

Let’s take a look at what it might look like if rating media were done using a class method on Rating itself:

class Rating < ActiveRecord::Base
  belongs_to :user
  belongs_to :media

  def self.rate(user, media, rating)
    rating_record = Rating.find_or_initialize_by(user: user, media: media)
    rating_record.rating = rating
    rating_record.save

    # do some extra stuff here like run algorithmic data processing,
    # add social activities to timelines, etc.
  end
end

And the updated controller code:

media = Media.find(params[:media_id])
rating = params[:rating].to_i
Rating.rate(current_user, media, rating)

Now I’m already breathing a sigh of relief when I read this code, because putting the rating code directly in the Rating model ensures that the functionality is closer to the data structures that are most impacted by the code. Want to open up the codebase and find out how to rate something? Look in the Rating model! It’s very straightforward.

However…I’m ultimately still not happy with this code for one big reason. As a rule of thumb, I like to call instance methods and use Rails associations whenever possible. To me it’s a code smell to sprinkle class methods all over the place and avoid using associations and standard OOP principles as intended. In this case, it seems weird to me that I can’t do something along the lines of @media.rate in the controller. After all, I’m loading up a media object and I want to rate it. Why isn’t there a clear interface to do that?

Concerns and POROs are Your Friends #

Once I’m convinced I need to start moving complex code out of a model class method, I’m going to want to find a better pattern than just stuffing a bunch of bits into various models’ instance methods. After all, the problems that come with fat models are why people recommend breaking code out into service objects in the first place!

But in reality, the downside of fat models isn’t so much that you have a single object with a lot of methods, it’s that those methods (and presumably related unit tests) are all jumbled together in one file. What you really need is a way to keep bits of key functionality separated out from other bits of key functionality in terms of code comprehension, and then you need some sort of rule of thumb for which bits of code really should be relocated into separate objects altogether.

Let’s take a look at what we could do with this media rating business. First, I’m going to extract the chunk of code we’ve been wrestling with into a concern (which is just a slightly enhanced Rails version of the standard Ruby mixin). Let’s call this concern Ratable:

module Ratable
  extend ActiveSupport::Concern

  included do
    has_many :ratings
  end

  def create_or_update_user_rating(user:, rating:)
    rating_record = ratings.find_or_initialize_by(user: user)
    rating_record.rating = rating
    rating_record.save

    # do some extra stuff here like run algorithmic data processing,
    # add social activities to timelines, etc.

    rating_record
  end
end

The Media class now benefits as well, as we can take that has_many :ratings directive out and keep that contained within the new concern:

class Media < ActiveRecord::Base
  include Ratable
end

And the updated controller code:

rating = params[:rating].to_i
Media.find(params[:media_id]).create_or_update_user_rating(
  user: current_user,
  rating: rating
)

Ah, this is already feeling much better. All I have to do in the controller is find the media object and call a single instance method that’s clearly named as to what it does. It’s a friendly interface that feels Rails-y in the best possible way.
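If the relationship between a concern and a plain mixin feels fuzzy, here is a rough plain-Ruby sketch of the same shape with no Rails involved. The `self.included` hook plays roughly the role of the `included do` block, while `PlainRatable`, `ratable_scale`, and `FakeMedia` are made-up names for illustration only, and the hash stands in for the database.

```ruby
# Plain-Ruby approximation of a concern. Nothing here is Rails API.
module PlainRatable
  # Ruby calls this hook whenever the module is included in a class.
  # ActiveSupport::Concern's `included do ... end` builds on the same idea.
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Hypothetical class-level macro, standing in for something like has_many.
    def ratable_scale(range = nil)
      @ratable_scale = range if range
      @ratable_scale
    end
  end

  # In-memory stand-in for the ratings association (user => rating).
  def ratings
    @ratings ||= {}
  end

  # Creates the rating for a user, or overwrites the existing one.
  def create_or_update_user_rating(user:, rating:)
    unless self.class.ratable_scale.cover?(rating)
      raise ArgumentError, "rating out of range"
    end

    ratings[user] = rating
  end
end

class FakeMedia
  include PlainRatable
  ratable_scale(1..5)
end

media = FakeMedia.new
media.create_or_update_user_rating(user: "alice", rating: 4)
media.create_or_update_user_rating(user: "alice", rating: 5)
```

The including class ends up with both a class-level declaration and an instance-level interface, which is the same division of labor the Ratable concern gives Media.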

There’s still a problem though. This create_or_update_user_rating method is trying to do way too much. It makes sense to handle the database access here, but the algorithmic data processing and timeline updates seem like actions that should be triggered to happen after the fact and defined someplace else.

The standard Rails way would be to put this code into ActiveRecord callbacks. Now I have no problem with callbacks, and I’ll gladly use them if it feels like a reasonable fit. But in this case, the two main things that need to happen seem like totally unrelated bits of functionality that are only tangentially related to the particular media, rating, and user objects involved.

So let’s use this opportunity to do some proper domain modeling and move that extra functionality out of the concern and into other POROs. We’ll keep our create_or_update_user_rating method nice and simple by pointing to those new objects:

def create_or_update_user_rating(user:, rating:)
  rating_record = ratings.find_or_initialize_by(user: user)
  rating_record.rating = rating
  rating_record.save

  # Let's extract out additional functionality to POROs or relevant models.
  # Better yet, encapsulate these into background jobs?
  # Left as an exercise for the reader...
  Rating::Processor.run(rating_record)
  Timeline::Activities.add_for_rating(rating_record)

  rating_record
end

Now before you start to get twitchy there, Rating::Processor and Timeline::Activities aren’t more “service objects.” These are POROs (Plain Old Ruby Objects) that are modeled using carefully considered OOP paradigms. One object is what I call a “processor” pattern: it takes input, crunches some numbers, and then saves the output somewhere. The other is a collection pattern that manages adding and removing items and the consequence of those actions. Nothing fancy or original here, but that’s the point.

We could have attempted to use the service object pattern here instead, perhaps by refactoring UserMediaRater to call additional service objects such as ProcessNewRating and AddTimelineActivityForRating. But how is that any more readable or any more well-structured than using concerns and POROs? Instead of succumbing to a huge app/services folder filled with what are essentially functions, we can engage instead in real domain modeling to come up with class names, data structures, and object methods that are designed for readability and ease of use.
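To make those two shapes a little more concrete, here is a minimal sketch of what such POROs might look like. The article doesn't show the real classes, so everything inside them is an assumption: the Structs stand in for the ActiveRecord models, "crunching numbers" is reduced to an average, and the timeline lives in memory. Only the two entry points, `Rating::Processor.run` and `Timeline::Activities.add_for_rating`, come from the code above.

```ruby
# Stand-ins for the ActiveRecord models so the sketch runs on its own.
Media  = Struct.new(:title, :ratings, :average_rating)
Rating = Struct.new(:user, :media, :value)

class Rating
  # "Processor" shape: take input, crunch some numbers, save the output.
  class Processor
    def self.run(rating_record)
      new(rating_record).run
    end

    def initialize(rating_record)
      @media = rating_record.media
    end

    def run
      values = @media.ratings.map(&:value)
      @media.average_rating = values.sum.to_f / values.size
    end
  end
end

module Timeline
  # Collection shape: owns adding and removing items for one user's timeline.
  class Activities
    include Enumerable

    # In-memory store keyed by user; a real app would persist these.
    @timelines = Hash.new { |hash, user| hash[user] = new }

    class << self
      def for_user(user)
        @timelines[user]
      end

      def add_for_rating(rating_record)
        for_user(rating_record.user).add(
          "#{rating_record.user} rated #{rating_record.media.title}: #{rating_record.value}"
        )
      end
    end

    def initialize
      @items = []
    end

    def add(item)
      @items.unshift(item) # newest first
      self
    end

    def remove(item)
      @items.delete(item)
      self
    end

    def each(&block)
      @items.each(&block)
    end
  end
end

media = Media.new("Sunset", [], nil)
record = Rating.new("alice", media, 4)
media.ratings << record
media.ratings << Rating.new("bob", media, 5)

Rating::Processor.run(record)
Timeline::Activities.add_for_rating(record)
```

Each class has a name that says what kind of thing it is, not just what verb it performs, and each could grow related methods (removing an activity, re-running the processor) without turning into a grab bag.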

And that’s my final point: using concerns and POROs instead of service objects encourages better interfaces, proper separation of concerns, sound use of OOP principles, and easier code comprehension.

I’m out of time to talk about testing strategies, but if you’re worried that using concerns or more advanced POROs will cause additional problems with your tests as compared with service objects, here are a couple of useful resources:

There’s a lot more I could talk about regarding how model or controller-level Rails concerns combined with useful PORO patterns are a better fit than service objects in the vast majority of cases, so keep an eye out for future articles in this vein.

TL;DR: service objects are crappy and better solutions exist most of the time. Please use those instead. Thank you!

Send your thoughtful, rage-free responses to @jaredcwhite 😊



Use Ruby Objects to Keep Your Rake Tasks Clean

I’ve been inspired by David Heinemeier Hansson’s new YouTube series On Writing Software Well, because I think it’s positively delightful when somebody takes the time and care to walk through real-world, production code and discuss why things were done the way they were and the tradeoffs involved, as well as the possibilities for improving that code further.

Today, I want to talk about how to keep ancillary pieces of your infrastructure fairly clean and minimalist. In terms of Rails, one place where I’ve seen it’s easy to end up with “bags of code” that aren’t really structured or straightforward to test is Rake tasks.

Let’s look at a Rake task I recently refactored on a client project. We were using Heroku’s new Review Apps functionality, which allows every pull request on GitHub to spawn a new application. QA specialists or product managers are then able to look at that particular feature branch’s functionality in isolation, which is a good thing. However, the post-deploy rake task we had in place to make sure we were setting up the proper subdomains, SSL certificates, indexing data for search, etc., was getting increasingly unwieldy. It was just a big “bag of code,” and that to me was a sign some refactoring was sorely needed.

Let’s take a look at the before code (a few bits of private data have been changed to protect the innocent):

namespace :heroku do
  desc "Run as the postdeploy script in heroku"
  task :setup do
    heroku_app_name = ENV['HEROKU_APP_NAME']
    begin
      new_domain = "#{ENV['HEROKU_APP_NAME']}.domain.com"

      # set up Heroku domain (or use existing one on a redeploy)
      heroku_domains = heroku.domain.list(heroku_app_name)
      domain_info = heroku_domains.find{|item| item['hostname'] == new_domain}
      if domain_info.nil?
        domain_info = heroku.domain.create(heroku_app_name, hostname: new_domain)
      end

      key = ENV['CLOUDFLARE_API_KEY']
      email = ENV['CLOUDFLARE_API_EMAIL']
      connection = Cloudflare.connect(key: key, email: email)
      zone = connection.zones.find_by_name("domain.com")

      # delete old dns records
      zone.dns_records.all.select{|item| item.record[:name] == new_domain}.each do |dns_record|
        dns_record.delete
      end

      response = zone.dns_records.post({
        type: "CNAME",
        name: new_domain,
        content: domain_info['cname'],
        ttl: 240,
        proxied: false
      }.to_json, content_type: 'application/json')

      # install SSL cert
      s3 = AWS::S3.new
      bucket = s3.buckets['theres_a_hole_in_the_bucket']
      crt_data = bucket.objects['__domain_com.crt'].read
      key_data = bucket.objects['__domain_com.key'].read
      if heroku.ssl_endpoint.list(heroku_app_name).length == 0
        heroku.ssl_endpoint.create(heroku_app_name, certificate_chain: crt_data, private_key: key_data)
      end

      sh "rake heroku:start_indexing"
    rescue => e
      output =  "** ERROR IN HEROKU RAKE **\n"
      output << "#{e.inspect}\n"
      output << e.backtrace.join("\n")
      puts output
    ensure
      heroku.app.update(heroku_app_name, maintenance: false)
    end
    puts "Postdeploy script complete"
  end

  def heroku
    @heroku ||= PlatformAPI.connect_oauth(ENV['HEROKU_PLATFORM_KEY'])
  end
end

Whew! That’s a lot to wade through. Not only is the task getting pretty long at this point, there are certain dependencies between the blocks of code being executed that are difficult to ascertain just by a cursory examination.

Now let’s look at how I refactored this. First, I created a new class in the lib folder called HerokuReviewAppPostDeploy and extracted each block into a separate method. You’ll notice we are actually doing even more in this new object, such as connecting to the GitHub repository and getting the branch name of the pull request so we can put a Jira ticket number right in the review app’s subdomain. That requirement turned up right as I was in the middle of refactoring, so I was thankful I avoided an even larger bag of code!

Here’s the full class:

class HerokuReviewAppPostDeploy
  attr_accessor :heroku_app_name, :heroku_api

  def initialize(heroku_app_name)
    self.heroku_app_name = heroku_app_name
    self.heroku_api = PlatformAPI.connect_oauth(ENV['HEROKU_PLATFORM_KEY'])
  end

  def turn_on_maintenance_mode
    heroku_api.app.update(heroku_app_name, maintenance: true)
  end

  def turn_off_maintenance_mode
    heroku_api.app.update(heroku_app_name, maintenance: false)
  end

  def determine_subdomain
    new_subdomain = heroku_app_name
    pull_request_number = begin
      heroku_app_name.match(/pr-([0-9]+)/)[1]
    rescue NoMethodError; nil; end
    unless pull_request_number.nil?
      github_info = HTTParty.get('https://api.github.com/repos/organization/reponame/pulls/' + pull_request_number, basic_auth: {username: 'janedoe', password: ENV["GITHUB_API_KEY"]}).parsed_response
      if github_info["head"]
        branch = github_info["head"]["ref"]
        jira_id = begin
          branch.match(/WXYZ-([0-9]+)/)[1]
        rescue NoMethodError; nil; end
        unless jira_id.nil?
          new_subdomain = "#{heroku_app_name.match(/^([a-z]+)/)[1]}-wxyz-#{jira_id}"
        end
      end
    end
    new_subdomain
  end

  def determine_domain
    "#{determine_subdomain}.domain.com"
  end

  def setup_domain_on_heroku(new_domain)
    # set up Heroku domain (or use existing one on a redeploy)
    heroku_domains = heroku_api.domain.list(heroku_app_name)
    domain_info = heroku_domains.find{|item| item['hostname'] == new_domain}
    if domain_info.nil?
      heroku_api.domain.create(heroku_app_name, hostname: new_domain)
    else
      domain_info
    end
  end

  def setup_domain_on_cloudflare(new_domain, heroku_domain_info)
    key = ENV['CLOUDFLARE_API_KEY']
    email = ENV['CLOUDFLARE_API_EMAIL']
    connection = Cloudflare.connect(key: key, email: email)
    zone = connection.zones.find_by_name("domain.com")
    zone.dns_records.all.select{|item| item.record[:name] == new_domain}.each do |dns_record|
      dns_record.delete
    end
    response = zone.dns_records.post({
      type: "CNAME",
      name: new_domain,
      content: heroku_domain_info['cname'],
      ttl: 240,
      proxied: false
    }.to_json, content_type: 'application/json')
  end

  def setup_ssl_cert_on_heroku
    # install SSL cert
    s3 = AWS::S3.new
    bucket = s3.buckets['theres_a_hole_in_the_bucket']
    crt_data = bucket.objects['__domain_com.crt'].read
    key_data = bucket.objects['__domain_com.key'].read
    if heroku_api.ssl_endpoint.list(heroku_app_name).length == 0
      heroku_api.ssl_endpoint.create(heroku_app_name, certificate_chain: crt_data, private_key: key_data)
    end
  end
end

Not only does this new approach allow us to use an object to break out bits of functionality into single-purpose methods, but because certain methods require data generated by other methods, we can include those variables as method arguments (for example, passing new_domain explicitly into setup_domain_on_heroku).
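Single-purpose methods also make it easier to exercise the naming rules without touching the network. The sketch below is a hypothetical variant of determine_subdomain, not the class from the project: `SubdomainNamer` and its `branch_lookup:` argument are names I made up, and the lambda replaces the HTTParty call to GitHub so the logic can run anywhere.

```ruby
# Hypothetical variant of determine_subdomain with the GitHub lookup injected,
# so the naming rules can be checked without any network access.
class SubdomainNamer
  def initialize(heroku_app_name, branch_lookup:)
    @heroku_app_name = heroku_app_name
    # Any callable that maps a PR number (String) to a branch name or nil.
    @branch_lookup = branch_lookup
  end

  def subdomain
    # String#[] with a regex and a capture index returns nil on no match,
    # which avoids the begin/rescue NoMethodError dance.
    pull_request_number = @heroku_app_name[/pr-([0-9]+)/, 1]
    return @heroku_app_name if pull_request_number.nil?

    branch = @branch_lookup.call(pull_request_number)
    jira_id = branch && branch[/WXYZ-([0-9]+)/, 1]
    return @heroku_app_name if jira_id.nil?

    "#{@heroku_app_name[/^([a-z]+)/, 1]}-wxyz-#{jira_id}"
  end

  def domain
    "#{subdomain}.domain.com"
  end
end

# Fake lookup used in place of the GitHub API; the branch name is invented.
fake_lookup = ->(pr_number) { pr_number == "42" ? "feature/WXYZ-1234-new-thing" : nil }

with_ticket = SubdomainNamer.new("myapp-pr-42", branch_lookup: fake_lookup)
without_ticket = SubdomainNamer.new("myapp-pr-7", branch_lookup: fake_lookup)
```

In production the lookup would wrap the real GitHub request, while a test can pass a lambda like the one above and check both the "ticket found" and "fall back to the app name" paths.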

So how does our Rake task look now? Much, much better:

namespace :heroku do
  desc "Run as the postdeploy script in heroku"
  task :setup do
    heroku_app_name = ENV['HEROKU_APP_NAME']
    post_deploy = HerokuReviewAppPostDeploy.new(heroku_app_name)
    begin
      post_deploy.turn_on_maintenance_mode
      new_domain = post_deploy.determine_domain
      heroku_domain_info = post_deploy.setup_domain_on_heroku(new_domain)
      post_deploy.setup_domain_on_cloudflare(new_domain, heroku_domain_info)
      post_deploy.setup_ssl_cert_on_heroku
      Rake::Task['db:migrate'].invoke
      sh "rake heroku:start_indexing"
    rescue => e
      output =  "** ERROR IN HEROKU RAKE **\n"
      output << "#{e.inspect}\n"
      output << e.backtrace.join("\n")
      puts output
    ensure
      post_deploy.turn_off_maintenance_mode
    end
    puts "Postdeploy script complete"
  end
end

It’s way easier to see the individual steps needed to go through the process of completing the review app setup, and through the use of setting a variable returned from one method and passing it along to another, the data dependencies between the steps are now clear. In addition, because HerokuReviewAppPostDeploy uses straightforward method names that describe exactly what’s going on, the explanatory need for code comments is greatly reduced.

You can use this extract-into-a-standalone-object technique for other “bag of code” areas of your application. Background jobs are another good example. I prefer to keep my Sidekiq workers very minimalist…a lot of the time I make sure they call a single method on a single model and that’s all.
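Here is roughly what I mean by a minimalist worker, as a sketch. `Report`, `generate!`, and `GenerateReportWorker` are hypothetical names, and the in-memory registry is only there so the example runs without a database. The `defined?` guard is likewise just for the sketch; in a real app you would include `Sidekiq::Worker` unconditionally.

```ruby
# Hypothetical stand-in for a model; a real app would use ActiveRecord here.
class Report
  REGISTRY = {}

  attr_reader :id, :generated

  def self.find(id)
    REGISTRY.fetch(id)
  end

  def initialize(id)
    @id = id
    @generated = false
    REGISTRY[id] = self
  end

  # All the real work lives on the model, where it is easy to test directly.
  def generate!
    @generated = true
  end
end

class GenerateReportWorker
  # Only mix in Sidekiq when it is loaded, so this file also runs without it.
  include Sidekiq::Worker if defined?(Sidekiq::Worker)

  # The worker does nothing but look up the record and call one method.
  def perform(report_id)
    Report.find(report_id).generate!
  end
end

report = Report.new(1)
GenerateReportWorker.new.perform(report.id)
```

With Sidekiq loaded, the job would normally be queued with `GenerateReportWorker.perform_async(report.id)`; calling `perform` directly, as above, is handy in tests.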

I hope this was helpful in giving you some new ideas on how to improve your own codebase, based on live production code. Stay tuned for the next article in this series.



Swift for Javascript and Ruby Developers

Last week I had the privilege of presenting on the topic of learning Swift from the perspective of a developer currently familiar with Ruby or Javascript. I showed off some of the reasons why Swift is a pretty exciting language for those used to working with lightweight scripting languages, and I also demonstrated some example code that highlights similar functionality implementations across all three languages.

You can see the presentation slides here, and code examples are available on Github. If you yourself are a developer in one or more of these languages and have suggestions for further code examples or useful comparisons, please submit a pull request on Github and let me know!
