Getting to Appreciate the Idioms of Docker
Mon Sep 14 09:28:53 EDT 2020
- Weekend Domino-Apps-in-Docker Experimentation
- Executing a Complicated OSGi-NSF-Surefire-NPM Build With Docker
- Getting to Appreciate the Idioms of Docker
Now that I've been working with Docker more, I'm starting to get used to its way of doing things. As with any complicated tool - especially one as fond of making up its own syntax as Docker is - there's both the process of learning how to do things and the process of learning why they're done that way. Since I'm on this journey myself, I figure it could be useful to share what I've learned so far.
What Is Docker?
To start with, it's useful to understand what Docker is both conceptually and technically, since a lot of discussion about it is buried under terms like "cloud native" that obscure the actual topic. That's even before you get to the giant pile of names like "Kubernetes" and "Rancher" that build on top of the core.
Before I get to the technical bits, the overall idea is that Docker is a way to run programs isolated from each other and in a consistent way across deployments. In a Domino context, it's kind of like how an NSF is still its own mostly-consistent app regardless of what OS Domino is on or what version it is - the NSF is its own little world on Domino-the-host. Technically, it diverges wildly from that, but it can be a loose point of reference.
Now, for the nuts and bolts.
Docker (the tool, not the company or service) is a Linux-born toolset for OS-level virtualization. It uses the term "containers", but other systems over time have used terms like "partitions" and "jails" to mean the same thing. In essence, what OS-level virtualization means is that a program or set of programs is put into a box that looks like the whole OS, but is really just a subset view provided by a host OS. This is distinct from virtualization in the sense of VMware or Parallels in that the app still uses the kernel and services of the host OS, rather than loading up a whole additional OS.
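As a quick way to see this in action, you can ask a container what kernel it's running on. Assuming Docker is installed, a one-liner like this (using the tiny Alpine image just because it's convenient) will do it:

    # Launch a throwaway Alpine container and print the kernel version it
    # sees. There's no kernel inside the container: the version reported
    # is the one provided by the host.
    docker run --rm alpine uname -r

On a Linux host, that matches the host's own uname -r output.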
Things admittedly get a little muddled on non-Linux systems. Other than Microsoft's peculiar variant of Docker that runs Windows-based apps, "a Docker container" generally means "a Linux container". To accomplish this, and to avoid having a massively-fragmented array of images (more on those in a bit), Docker Desktop on macOS and (usually) Windows uses hardware virtualization to launch a Linux system. In those cases, Docker is using both hardware virtualization and in-OS container virtualization, but the former is just a technical implementation detail. On a Linux host, though, no such second tier is needed.
Beyond making use of this OS service, Docker consists of a suite of tools for building and managing these images and containers, and then other tools (like Kubernetes) operate at a level above that. But all the stuff you deal with in Docker - Dockerfiles, Compose, all that - comes down to creating and managing these walled-off apps.
Docker Images
Docker images are the part that actually contains the programs and data to run and use, which are then loaded up into a container.
A Docker image is conceptually like a disk image used by a virtualization app or macOS - it's a bunch of files ready to be used in a filesystem. You can make your own or - very commonly - pull them from a centralized library like the main Docker Hub. These images are generally components of a larger system, but are sometimes full-on tools to run yourself. For example, the PostgreSQL image is ready to run in your Docker environment and can be used as essentially a quick-start way to set up a Postgres server.
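As a sketch of that quick start (the container name and password here are arbitrary, and the image's Docker Hub page documents the full set of options):

    # Download the official postgres image (if needed) and start it in
    # the background, exposing its standard port to the host. The image
    # requires POSTGRES_PASSWORD to be set for its initial superuser.
    docker run -d --name some-postgres -p 5432:5432 -e POSTGRES_PASSWORD=mysecretpassword postgres

A moment later, you have a Postgres server listening on localhost:5432, with all of its files living inside the container.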
The particular neat trick that Docker images pull is that they're layered. If you look at a Dockerfile (the script used to build these images), you can see that they tend to start with a FROM line, indicating the base image that they stack on top of. This can go many layers deep - for example, the Maven image builds on top of the OpenJDK image, which is based on the Alpine Linux image.
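For example, the first line of an image building on Maven's might look like this (the tag here is illustrative - the available tags change over time):

    # Everything in the maven image - and, transitively, in its own
    # ancestors - becomes the starting filesystem for this image.
    FROM maven:3-openjdk-11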
You can think of this as a usually-simple dependency line in something like Maven. Rather than including all of the third-party code needed, a Maven module will just reference dependencies, which are then brought in and woven together as needed in the final app. This is both useful for creating your images and is also an important efficiency gain down the line.
Dockerfiles
The main way to create a Docker image is to use a Dockerfile, which is a text file with a syntax that appears to have come from another dimension. Still, once you're used to the general form of one, they make sense. If you look at one of the example files, you can see that it's a sequential series of commands describing the steps to create the final image.
When writing these, you more-or-less can conceptualize them like a shell script, where you're copying around files, setting environment properties, and executing commands. Once the whole thing is run, you end up with an image either in your local registry or as a standalone file. That final image is what is loaded and used as the operating environment of the container.
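As a hypothetical sketch of that shell-script feel (the image tag and file paths here are invented for illustration):

    # Start from a base image that already has a JVM
    FROM openjdk:11
    # Set an environment variable, visible to later build steps and at runtime
    ENV APP_HOME=/app
    # Copy a file from the build context into the image
    COPY target/myapp.jar $APP_HOME/myapp.jar
    # Execute a command while building, which creates a new layer
    RUN mkdir -p $APP_HOME/data
    # Declare what to run when a container is started from this image
    CMD ["java", "-jar", "/app/myapp.jar"]

Running docker build -t myapp:latest . in that directory then produces the image in your local registry under that tag.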
The neat trick that Dockerfiles pull, though, is that each command that modifies the image actually creates a new layer, rather than changing the contents of a single image. For example, take these few lines from a Dockerfile I use for building a Domino-based project:
    COPY docker/settings.xml /root/.m2/
    RUN mkdir -p /root
    COPY --from=domino-docker:V1101_03212020prod /opt/hcl/domino/notes/11000100/linux /opt/hcl/domino/notes/latest/linux
Each of these lines creates a new layer. The first two are tiny: one just contains the settings.xml file from my project and then the second just contains an empty /root directory. The third is more complicated, pulling in the whole Domino runtime from the official 11.0.1 image, but it's the same idea.
Each of these images is given a SHA-256 hash identifier that will uniquely identify it as a result of an operation on a previous base image state. This lets Docker cache these results and not have to perform the same operation each time. If it knows that, by the time it gets to the third line above, the starting image and the Domino image are both in the same state as they were the last time it ran, it doesn't actually need to copy the bits around: it can just reuse the same unchanged cached layer.
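You can see this layering for yourself with docker history, which lists each layer of an image along with the instruction that created it:

    # Show an image's layers, newest first (using the tag from the
    # hypothetical example above)
    docker history myapp:latest

    # Include the full, untruncated IDs and commands
    docker history --no-trunc myapp:latest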
This is the reason why Dockerfiles for Maven builds often include a dependency:go-offline line: because the project's dependencies rarely change, you can create a reusable layer containing the Maven dependency repository and not have to re-resolve them every build.
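A sketch of that pattern (tag and paths again illustrative):

    FROM maven:3-openjdk-11
    WORKDIR /build

    # Copy only the POM and resolve dependencies first. As long as
    # pom.xml hasn't changed, Docker reuses this cached layer and skips
    # the slow download step entirely.
    COPY pom.xml .
    RUN mvn dependency:go-offline

    # Changes to source code invalidate only the layers from here down.
    COPY src ./src
    RUN mvn package

The ordering is the whole trick: the things that change most often go last, so the expensive, stable layers ahead of them stay cached.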
Wrap-Up
So that's the core of it: managing images and walled-off mini OS environments. Things get more complicated from there, even before you get to other tooling, but I've found it useful to keep my perspective grounded in those basics while I learn about the other aspects.
In the future, I think I'll talk about how and why Docker has been particularly useful for me when it comes to building and running Domino-based apps, in particular how it helps alleviate several of the long-standing impediments to working with Domino.