Docker qualifies as a “revolutionary technology” in that containerization tends to overturn the old way of doing things in data centers. The makers of the things that get overturned tend not to stand idly by.
Last year, one objection raised by some of the early opponents of containerization (which shall remain nameless, except you’ve heard of the one that begins with “V”) was that the process could expose a potentially severe security hole. And when we raised that objection with Docker Inc. last June, the company’s response was that it wasn’t really a security hole.
Be that as it may, Thursday’s release into general availability of Docker version 1.10 (think of it as “release 10 of version 1”) includes a feature that certainly looks like it’s meant to plug something very big and very ominous.
The new feature, specifically, is support for user namespaces, a recent addition to Linux operating systems. It sounds very esoteric and, as one of my editors has put it, “developer-y.”
But once its explanation is translated into an Earth language, you see exactly what it is and what it’s for.
For any kind of program to run inside a machine, be it virtual or physical, the program needs a map of the file system that supports it, and the resources that are attainable through that file system. A namespace is such a map; it lists what’s available, and where to find it.
The original idea of the “container” in Linux, back before Docker came along, was an isolated namespace. The point was to give certain applications a limited map of the system that was running them.
This way, if a container happened to contain malicious code, the maximum damage it could do would be limited to the items in its namespace. And that space could be limited to only the items required to enable the contained application to run in the first place.
A few years ago, the concept of the container was expanded as a mechanism for deploying portable workloads, first on PaaS platforms (particularly on one called dotCloud) and later on any Linux system. To accomplish this, Docker’s designers enabled workloads inside containers to be hosted not by hypervisors, as is the case with “first-generation” virtual machine systems, but by a daemon managed by the Linux kernel running on the physical platform.
Since the same kernel running the host platform was running the daemon for containers, applications running within those containers tended to have the same privilege levels as the daemon itself. Since the host system user has to be root (the one with the most privilege) in order to run the daemon in the first place, the user of applications inside the container is, by default, root as well.
That meant the original purpose of subdividing namespaces had to be set aside in order to make containers portable.
If this weren’t a problem, then a senior technical staff member at IBM named Phil Estes, a leading contributor to the Docker Engine project, would not have pointed it out during a presentation at ContainerCon 2015 last August.
As Estes explained to developers, many of whom were seeing the concept for the first time, a user namespace gives processes running on a Linux kernel an exclusive and specific view of the system. It’s a way to accomplish what the original Linux LXC containers enabled, ironically without creating the subdivision of a container to do it.
“If you don’t set up users in your container,” said Estes at ContainerCon, “even though it has this isolation around it, your process is probably running as root, unless you’ve done something to change the user in your Docker file. And therefore, were there a way for your container to break out of its other containment mechanism, it would be root on the host system, and would have all the privileges of root.”
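Estes’ point can be checked from inside a running process. What follows is a minimal sketch in Python, not anything Docker itself ships: the helper name `describe_privilege` is hypothetical, and the check only reports the process’s own view of its user ID, which is 0 for root.

```python
import os


def describe_privilege(euid: int) -> str:
    """Classify an effective user ID: 0 is root, anything else is not."""
    return "root" if euid == 0 else "unprivileged"


# os.geteuid() is available on Unix-like systems only.
# Inside a container whose image never switches users, this typically reports root.
print("This process believes it is running as:", describe_privilege(os.geteuid()))
```

With user namespaces in play, a process can see itself as user 0 while the host sees a different, unprivileged ID, so a check like this describes the container’s view rather than the host’s.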
A Hole By Any Other Name
There’s no evidence that the theoretical capability of an application within a container to “break out” and execute malicious processes on the host with root privilege was ever exploited. Therefore, the “security hole,” as I have chosen to phrase it, was never officially classified as a vulnerability.
Which is fine, because Docker version 1.10 may patch that hole before the opportunity arises to ever attain that classification.
As a bonus, as Estes first pointed out last August, user namespaces enable security systems to grant per-user privileges and restrictions, and therefore enable orchestration platforms to allow for multi-tenancy.
Put another way, platforms running Docker no longer have to use tags or other tacked-on means to “pretend” that containers inhabiting a system include processes owned by multiple sources. Docker 1.10 can now map the root user inside the container to an unprivileged user on the host, one with only a limited scope of the broad platform.
Each container can “believe” it has the whole universe open to it. Meanwhile, its host knows the truth: It can restrict privileges to each container on a need-to-know basis, based upon the profiles of its users.
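For the developer-y among us, the translation between what the container believes and what the host knows can be sketched in a few lines of Python. The Linux kernel exposes a user namespace’s mapping in `/proc/<pid>/uid_map` as three columns: the first ID inside the namespace, the first ID outside it, and the length of the range. The names `IdMapEntry`, `parse_id_map` and `to_host_id` below are hypothetical, and the range starting at 100000 is an assumed example, not a value taken from Docker.

```python
from typing import List, NamedTuple, Optional


class IdMapEntry(NamedTuple):
    inside_start: int   # first ID as seen inside the user namespace
    outside_start: int  # first ID as seen by the host
    length: int         # number of consecutive IDs the entry covers


def parse_id_map(text: str) -> List[IdMapEntry]:
    """Parse lines in the three-column format used by /proc/<pid>/uid_map."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        entries.append(IdMapEntry(*(int(f) for f in fields)))
    return entries


def to_host_id(container_id: int, entries: List[IdMapEntry]) -> Optional[int]:
    """Translate an ID inside the namespace to a host ID, or None if unmapped."""
    for entry in entries:
        offset = container_id - entry.inside_start
        if 0 <= offset < entry.length:
            return entry.outside_start + offset
    return None


# Assumed example: container IDs 0..65535 map to host IDs 100000..165535.
example_map = parse_id_map("0 100000 65536\n")
print(to_host_id(0, example_map))      # container root as the host sees it
print(to_host_id(1000, example_map))   # an ordinary container user
print(to_host_id(70000, example_map))  # outside the mapped range
```

Under that assumed mapping, the container’s root becomes an ordinary, unprivileged ID on the host, which is the whole point: a process that breaks out arrives with no special standing.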
In short, containers now have real users. The one argument that conventional virtualization proponents still held out that had any merit at all against the viability of container platforms may have just evaporated.
What Docker still lacks at the moment is a standard way to profile “users” (which, after all, consist more of accounts than of people). But Docker software engineer Jessie Frazelle started a project last October to produce exactly that, and work towards profiles remains under way.
So here is the takeaway: Once user namespaces are in place, and per-user profiles are active, the only viable way for an application running within a container to do malicious damage outside of itself would be if the API for that application were itself vulnerable.
That’s not impossible, though it’s extremely unlikely.
Title image, "Freight Train in Tuscon, Arizona" by Simeon87, licensed through Creative Commons 3.0.