Optimizing Rails for Memory Usage Part 4: Lazy JSON Generation and Final Thoughts

This is part four in a four-part series on optimizing a potentially memory-heavy Rails action without resorting to pagination. The posts in the series are:

Part 1: Before You Optimize

Part 2: Tuning the GC

Part 3: Pluck and Database Laziness

Part 4: Lazy JSON Generation and Final Thoughts

In this final post we will extend our discussion of laziness techniques to JSON generation, for which there is less native support than for lazily loading database records. Then I will implore you to look for other small gains before ending with a rant on how fighting memory is pointless.

Lazy JSON generation

TL;DR

Enumerators don’t lazily serialize to JSON. Monkey-patch it.
Rails doesn’t stream JSON to the client. But it can be done.

In my previous post, we discussed techniques to make your processing pipeline lazy so you did not have to keep all your records in memory as you build a response. We learned how to fetch our records in a lazy enumerator instead of in one big array.

Unfortunately, Enumerators do not serialize to JSON arrays by default. All your laziness is lost if you have to call to_a on the enumerator: to_a will generate an array holding all your objects, none of which can be freed until the serialization is complete.

The solution is to monkey-patch Enumerator. Add an Enumerator#to_json definition to config/initializers/enumerator_to_json.rb and you can then lazily serialize your JSON just fine:

enumerator = [
  { key: "Value 1" },
  { key: "Value 2" }
].lazy

# Without Enumerator#to_json
JSON.pretty_generate(enumerator)
# SystemStackError: stack level too deep

# With Enumerator#to_json
JSON.pretty_generate(enumerator) 
# [
#   {
#     "key": "Value 1"
#   },
#   {
#     "key": "Value 2"
#   }
# ]
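
For reference, a minimal sketch of such a patch might look like the following. It assumes only Ruby's standard json library and is not necessarily the exact definition referred to above. In particular, this simplified version does not add indentation for the outer array, so JSON.pretty_generate output will be valid JSON but less neatly laid out than the sample shown.

```ruby
require "json"

# Hypothetical sketch of config/initializers/enumerator_to_json.rb.
class Enumerator
  # Serializes the enumerator as a JSON array, pulling one element at a time
  # so each element can be garbage collected once its JSON text is appended.
  def to_json(*args)
    json = +"["
    first = true
    each do |item|
      json << "," unless first
      first = false
      # Pass along whatever state or options the JSON generator handed us.
      json << item.to_json(*args)
    end
    json << "]"
  end
end

enumerator = [{ key: "Value 1" }, { key: "Value 2" }].lazy
puts enumerator.to_json
# [{"key":"Value 1"},{"key":"Value 2"}]
```

Because Enumerator::Lazy is a subclass of Enumerator, the same method covers lazy enumerators. The sketch uses a plain each with a flag rather than each_with_index, since on recent Rubies calling each_with_index with a block on a lazy enumerator returns another lazy enumerator instead of iterating.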

A Constant Memory, Pagination-free, Streaming Index View

With lazy loading from the database and the Enumerator#to_json method above, you don’t have to keep objects in memory after you serialize them to a JSON string. However, the entire serialized JSON response for the action still has to reside in memory. That is, once a record from your database turns into a JSON string and is appended to your JSON response, you don’t have to keep the record around anymore. GC will collect it. However, this long JSON response string for the client has to sit in memory. Rails will not begin sending it to the client until the entire response has been generated. If the number of records to serialize doubles, the amount of memory held by the response string will double even if your stream of records from the database uses the same amount of memory.

This non-constant memory profile suggests that maybe you should stream your JSON array to the client as you generate it. Rails can live stream responses, so you can use that ability to send your JSON as you generate it. If you think you want to do it, I’ve set up a demo of streaming compressed JSON. It’s hand-rolled but it works.
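
To make the idea concrete, here is a minimal sketch of writing a JSON array piece by piece to any object that responds to write. The helper name stream_json_array is hypothetical, a StringIO stands in for the client connection, and compression is left out, so this is not the demo mentioned above.

```ruby
require "json"
require "stringio"

# Writes +records+ to +io+ as one JSON array, one element at a time,
# so the full response string never has to be built up in memory.
# (Hypothetical helper, for illustration only.)
def stream_json_array(records, io)
  io.write("[")
  first = true
  records.each do |record|
    io.write(",") unless first
    first = false
    io.write(JSON.generate(record))
  end
  io.write("]")
end

out = StringIO.new # stand-in for the connection to the client
stream_json_array([{ id: 1 }, { id: 2 }].lazy, out)
puts out.string
# [{"id":1},{"id":2}]
```

In a controller that includes ActionController::Live, response.stream responds to write and could take the place of the StringIO. It should be closed in an ensure block once the action is done.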

In the real-world application that inspired these posts, we have opted not to stream the response, at least not yet. It will require some rework of our serialization setup. Consequently, in our app right now, if we double the number of records to serialize, our memory usage is not constant. However, thanks to all the other optimizations, the growth is quite tame. Only when memory becomes a problem again will we consider whether to implement pagination or streaming.

If you have experience with Rails streaming, let us know! We would love to hear your experiences in the comments.

Profile for Small Wins

If you have made your app as lazy as you are comfortable with and you still need to reduce memory usage, you might spend some more time profiling your application for minor hot-spots. A few small gains in many places can add up. Motivated by what your profiler says, you might, for example, move constant objects and substantial strings outside of big loops.

# This allocates a lot of objects.
1_000_000.times { "some string or object" }

# This is better.
my_invariant = "some string or object"
1_000_000.times { my_invariant }
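
If you want to check the difference on your own Ruby, one rough approach is to compare GC.stat(:total_allocated_objects) before and after each loop. The allocated_objects helper below is a hypothetical name, and the counts are approximate since the counter covers every object allocated while the block runs.

```ruby
# Returns roughly how many objects were allocated while the block ran.
def allocated_objects
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

n = 100_000

# A fresh string is allocated on every iteration.
in_loop = allocated_objects { n.times { "some string or object" } }

# The string is allocated once, outside the measured loop.
my_invariant = "some string or object"
hoisted = allocated_objects { n.times { my_invariant } }

puts "literal inside the loop: #{in_loop} allocations"
puts "hoisted invariant:       #{hoisted} allocations"
```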

In particular, pay attention to strings longer than 23 bytes and to large numbers of small arrays containing 4 or more elements. On 64-bit MRI, these have to be allocated “off-heap” with malloc, which can contribute to fragmentation.
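
The exact thresholds depend on the Ruby version and platform, so it can be worth checking what your own interpreter reports. A small sketch using ObjectSpace.memsize_of from the standard objspace library follows. The lengths chosen are only illustrative, and no particular output is implied.

```ruby
require "objspace"

# Print the memory each object occupies, in bytes, as reported by this Ruby.
[8, 23, 24, 1_000].each do |length|
  string = "a" * length
  puts "String of #{length} bytes: #{ObjectSpace.memsize_of(string)}"
end

[3, 4, 1_000].each do |count|
  array = Array.new(count, 0)
  puts "Array of #{count} elements: #{ObjectSpace.memsize_of(array)}"
end
```

A jump in reported size between two neighboring lengths hints at where your Ruby stops embedding the data in the object slot and starts allocating it separately.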

Conclusion

I’m an idealist when it comes to computers. While I and others enjoy talking about garbage collection, in an ideal world GC should be something we never think about. Computers should be smart enough to optimize memory usage automatically. MRI lags behind in this department. For example, not all objects need to be allocated on the heap. There’s been a lot of research into escape analysis to keep objects on the stack so they are freed immediately with no GC. Some of this research has made its way into the HotSpot JVM, and even more complicated object lifetime analysis is theoretically possible. Ruby’s garbage collector also does not run concurrently (it has to pause your program’s execution to do its work), nor does it minimize fragmentation via compaction or copying. MRI’s memory management could be so much better.

And yet, as much fun as it is to salivate over GC buzzwords, as the above paragraph demonstrates, GC buzzwords are only useful for complaining. At some level, I actually dislike GC. Automatic memory management techniques are a means, not an end. They are a means that the computer should be smart enough to figure out by itself. It bothers me that I have to even think about memory usage, much less optimize it and write four blog posts on it. Automatic memory management should be so smart that budding programmers never have to ask, “What’s GC?”

However, until the day comes when Ruby is smarter with its memory, we have to be smarter for Ruby.

About Brian Hempel

Brian worked with us long before he came on full-time, and had we seen the baby face lurking beneath his programmer beard, we probably wouldn’t have assumed he was as smart. He proved quickly that he has earned the beard, both as a graduate of Michigan Tech in Bioinformatics and Biochemistry/Molecular Biology, and as an experienced coder who picks up new tools quickly.

An occasional violinist and lover of birds, Brian is a cheerful addition to the office.

Comments

Halil Özgür
February 29, 2016 at 21:20 PM

BTW 1_000_000.times { "some string or object".freeze } is about the same as the invariant version.

The Reverend
October 04, 2017 at 14:44 PM

Ruby’s GC and general memory tuning is a royal headache. Probably the worst of all the alternatives out there. One of the things I keep telling people is to take that into account when choosing the language for a particular module.

One other problem people tend to have is “I’m going to write my module in language XYZ but using the styles and patterns learned from language ABC.” That rarely works out, especially in any other language going to Ruby (except maybe Erlang).

That all being said - I have been able to rewrite microservices in other languages faster than Ruby experts could tune to get even 25% towards their goals. The opposite is true, too. I’ve had times where rewriting in Ruby was faster than tuning the other language. It depended on what the language was good at and what the service was for.

Sometimes, it’s just simply the best option to consider a different language if it’s better at the task at hand. In the cases where it’s not, then this is probably the best blog out there.

By Brian Hempel

March 13, 2015