How I learned to stop worrying and love Dockerfiles
Co-Founder, CTO
In this Monolith Renaissance that everyone is re-experiencing, it may outright shock you to discover that Yetto runs a bunch of microservices.
Well, sort of. Our main app, which powers yetto.app, is a classic Rails monolith; but all of the plug integrations we manage (for Slack, GitHub, Zendesk, and other services) are separate Rails monoliths, making them, collectively, a group of microservices. Make sense? Doesn't matter. The point is, each of our plugs is isolated from the others and, until recently, had its own separate code base. But over the years we've spent building out our plugs, we didn't think twice about how the projects were organized.
It turned out that about 80% of the logic was shared between the plugs—things like communicating with the Yetto API, or serving settings pages. The unique parts were all in how a plug communicated with its platform: which APIs to call, how to authenticate the Yetto user, and so on. (As discussed previously, these sorts of realizations only occur with time.) Towards the end of last month, I ripped all of those commonalities out—all the initializers, middleware, client services, and even the Puma config—and tucked them all inside a Rails Engine. This allows the plugs to all use the same generic logic and dependencies. Fantastic!
But there was one part of this abstraction process I was intimidated by. See, all the plugs' app logic was tucked away, but all the plugs' operational logic was still not centralized. And in most cases, these operational systems—the Dockerfile, the Fly.toml config, our web server—were the exact same in every plug.
Docker Docker everywhere and not a drop to drink
My understanding of Docker was limited to equating it to a giant tarball. You put in an operating system, language frameworks, and your app code, and ta-da, you've got a runnable package. And so, each of our plugs had the exact same Dockerfile, installing the exact same Debian version, loading the exact same Ruby version, and copying in their unique code bases to produce the final images. Over the last few years, there have been times when we've needed to update our Ruby version (to support YJIT) and our base operating system (to bookworm)—meaning that, although these changes are rare, they do happen. Making them usually involves a giant find-and-replace across multiple directories, and an outrageous amount of repetitive clicking to open the resulting PRs. Worse still, if we want to make an optimization like incorporating jemalloc or precompiling assets in a different way, we'd have to make the same changes over and over.
While looking for ways to streamline our operational structure, I came across this blog post discussing the ONBUILD directive. According to Docker's documentation:
The ONBUILD instruction adds to the image a trigger instruction to be executed at a later time, when the image is used as the base for another build... This is useful if you are building an image which will be used as a base to build other images, for example an application build environment or a daemon which may be customized with user-specific configuration.
Great! Docker has a way to allow for things to happen "later," i.e., at the final build time. To understand this, know that most Dockerfiles start by deriving from a base image:
FROM docker.io/ruby:3.3-slim-bookworm as ruby
And there's nothing stopping you from deriving from multiple images to implement multi-stage builds:
# Ruby image to use for base image
FROM docker.io/ruby:3.3-slim-bookworm as ruby
# Node image to use for base images
FROM docker.io/node:22-bookworm-slim as node
When you run docker build on this Dockerfile, each FROM starts a named stage: one with Ruby installed, one with Node. The final image is built from the last stage, and you can copy whatever you need out of the earlier stages into it, which is how you end up with a single image that has both Ruby and Node.
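For example, here's a minimal sketch of that copy step (not our actual Dockerfile; the stages are reordered so the Ruby stage is the final one, and the paths are the defaults used by the official Node image):

# Node stage exists only so we can lift its binaries out of it
FROM docker.io/node:22-bookworm-slim as node

# Ruby stage is last, so it becomes the final image
FROM docker.io/ruby:3.3-slim-bookworm as ruby

# Pull Node and npm across from the node stage
COPY --from=node /usr/local/bin/node /usr/local/bin/node
COPY --from=node /usr/local/lib/node_modules /usr/local/lib/node_modules
RUN ln -s /usr/local/lib/node_modules/npm/bin/npm-cli.js /usr/local/bin/npm

Anything that runs later in the Dockerfile now has both runtimes available.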
The next step in a Dockerfile is to bring in your application code, build your dependencies, and set up the environment. This process consists of the exact same sequence of steps in every Rails project; the only difference is the content of the files involved. To put it another way, every Rails project needs to run bundle install to fetch dependencies, but every Rails project has a different set of gems to install (as defined in its Gemfile). We don't care about what is being installed, or which assets need to be precompiled via rake assets:precompile, only that it gets done.
Going back to the contrived tarball analogy, what we want is a way to say: "Hey, set Ruby up now, but only copy in and build the dependencies later." And that's exactly what ONBUILD lets you do.
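To make that concrete, here's a minimal, hypothetical sketch of the pattern (the image name and file layout are invented for illustration; these are not the Yetto images described later):

# Dockerfile for the base image; build and tag it as, say, my-org/rails-base
FROM docker.io/ruby:3.3-slim-bookworm
WORKDIR /app

# None of these run while my-org/rails-base itself is being built. They are
# recorded as triggers and executed at the start of any build that uses this
# image as its base, against that build's own files.
ONBUILD COPY Gemfile Gemfile.lock ./
ONBUILD RUN bundle install
ONBUILD COPY . .

And the application's own Dockerfile shrinks to almost nothing:

# Dockerfile in the application's repository: the three triggers above fire
# here, immediately after this FROM, using the application's Gemfile and code.
FROM my-org/rails-base
CMD ["bin/rails", "server"]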
Are we there yet?
Before explaining exactly how ONBUILD works, let's first take a look at what the typical plug's Dockerfile looks like:
# syntax = docker/dockerfile:1
# Make sure RUBY_VERSION matches the Ruby version in .ruby-version and Gemfile
ARG RUBY_VERSION=3.3.0
FROM ruby:$RUBY_VERSION-slim as base
# Rails app lives here
WORKDIR /plug-sample
# Set environment
ARG RAILS_ENV
ENV RAILS_ENV=${RAILS_ENV:-development} \
    BUNDLE_WITHOUT="staging:development:test" \
    BUNDLE_DEPLOYMENT="1"
# Update gems and bundler
RUN gem update --system --no-document && \
gem install -N bundler
# Throw-away build stages to reduce size of final image
FROM base as prebuild
# Install packages needed to build gems
RUN --mount=type=cache,id=dev-apt-cache,sharing=locked,target=/var/cache/apt \
--mount=type=cache,id=dev-apt-lib,sharing=locked,target=/var/lib/apt \
apt-get update -qq && \
apt-get install --no-install-recommends -y build-essential curl git libpq-dev libvips pkg-config ca-certificates iptables iproute2
FROM prebuild as build
# Install application gems
COPY --link Gemfile Gemfile.lock .ruby-version ./
RUN --mount=type=cache,id=bld-gem-cache,sharing=locked,target=/srv/vendor \
bundle config set app_config .bundle && \
bundle config set path /srv/vendor && \
bundle install && \
bundle exec bootsnap precompile --gemfile && \
bundle clean && \
mkdir -p vendor && \
bundle config set path vendor && \
cp -ar /srv/vendor .
# Copy application code
COPY --link . .
# Precompile bootsnap code for faster boot times
RUN bundle exec bootsnap precompile app/ lib/
# Final stage for app image
FROM base
# Install packages needed for deployment
RUN --mount=type=cache,id=dev-apt-cache,sharing=locked,target=/var/cache/apt \
--mount=type=cache,id=dev-apt-lib,sharing=locked,target=/var/lib/apt \
apt-get update -qq && \
apt-get install --no-install-recommends -y sudo git
# Copy built artifacts: gems, application
COPY --from=build /usr/local/bundle /usr/local/bundle
COPY --from=build /plug-sample /plug-sample
# Entrypoint sets up the container.
ENTRYPOINT ["/plug-sample/bin/docker-entrypoint"]
EXPOSE 3000
All of this code is the same sequence of steps for every plug; it just so happens that the plug's directory changes based on the name of the plug (in this case, sample).
The ideal Dockerfile we're aiming for looks more like this:
# Base Rails Image
FROM ghcr.io/yettoapp/base-rails:main AS base
# App Plug Image
FROM ghcr.io/yettoapp/app-plug:main
USER app
Each of these FROM instructions points to a Dockerfile that contains several ONBUILD instructions, and each is responsible for a different part of the application startup process:

- base-rails is responsible for installing dependencies and setting up the environment
- app-plug is responsible for setting up the environment, chowning directories, installing binaries like Tailscale and 1Password, and just generally making sure everything is in an understood state, all before rails server is called
Showing the entirety of these Dockerfiles would be repetitive, since they're not terribly different from what's shown above. Still, here's a snippet which shows a very important set of instructions:
ONBUILD RUN \
# Mount Ruby Gem caches
--mount=type=cache,id=gem-cache-${RAILS_ENV},target=/usr/local/bundle/cache/,sharing=locked \
# Download and install required Gems
bundle install -j"$(nproc)"; \
# Precompile gems with Bootsnap (and ignore errors)
bundle exec bootsnap precompile --gemfile || true && \
# Remove unneeded gems
bundle clean --force;
# Copy the whole application folder into the image
ONBUILD COPY . /app
At their core, these instructions build the dependencies necessary for the plug, then copy the entire application into a folder called /app. But because of the ONBUILD command, these instructions don't execute until we call FROM ghcr.io/yettoapp/base-rails:main AS base. That way, we can provide a base image our plugs can derive from; think of this as a hook into the docker build lifecycle.
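One thing the snippet above leaves out: bundle install can only succeed once the Gemfile and lockfile are already in the image, so an earlier trigger has to copy them in first. The exact instructions in base-rails aren't shown here, but a sketch of that assumed earlier step would look something like:

# Assumed earlier triggers: set the working directory and copy only the
# dependency manifests, so the bundle install layer can cache independently
# of application code changes.
ONBUILD WORKDIR /app
ONBUILD COPY Gemfile Gemfile.lock .ruby-version /app/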
Now, app-plug can copy those prebuilt dependencies and pre-loaded app code into its own image:
# Copy app with gems from former build stage
ONBUILD COPY --from=base /usr/local/bundle/ /usr/local/bundle/
ONBUILD COPY --from=base /app /app
# run docker-entrypoint as defined in the plug's repository
ONBUILD ENTRYPOINT ["/app/bin/docker-entrypoint"]
This sort of matryoshka nesting of Dockerfiles means that we can have any number of Dockerfiles dependent on one another via ONBUILD.
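One practical consequence of this nesting: the ONBUILD triggers live in the parent images, so those images have to be rebuilt and pushed before a downstream plug build will pick up any change to them. Roughly, with hypothetical file and directory names, the build order looks like:

# Build (or pull) the parents first; their ONBUILD instructions are only
# recorded at this point, not executed.
docker build -t ghcr.io/yettoapp/base-rails:main -f Dockerfile.base-rails .
docker build -t ghcr.io/yettoapp/app-plug:main -f Dockerfile.app-plug .

# Build a plug; this is where all of the recorded triggers actually run,
# against the plug's own Gemfile and code.
cd plug-sample && docker build -t plug-sample .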
Open sourcing our work
We've been gradually refactoring and consolidating our plug code and infrastructure as we built it. About a month ago, this update stabilized the use of ONBUILD, which gave us the opportunity to overhaul our Dockerops. Not only that, but we've decided to open source our Dockerfiles so that others can make use of them.
In addition, we have a GitHub Action that regenerates our Docker images automatically whenever a Dockerfile changes.
Where next?
You can use our Dockerfiles as a base for setting up your own Rails microservices. Yetto's are pretty opinionated, so you may want to start with something more generic, like docker-rails-base.
This infrastructure work lets us build plugs faster and keep them more secure and stable, which means we can ship more integrations and features more quickly over the next year. Want to know what that looks like in action? Sign up for an account and see what all the fuss is about!