Intro

This page is an attempt to document the ins and outs of containers on Linux. It is aimed not only at programmers looking to implement containers or use container-like features in their own code, but also at sysadmins and users who want a better handle on how containers work 'under the hood'.

If you are a user looking to know more, start with the FAQ section. If you are a programmer, the Implementations and Links sections are going to be the most useful. Sysadmins are advised to read the Security and Networking sections and perhaps take a look at the different Implementations available.

Namespaces

Rather than taking an 'all or nothing' approach to containers (e.g. FreeBSD/Solaris/OpenVZ), native Linux container support allows you to unshare specific resources from the host. These can be mixed and matched in various ways to produce interesting combinations for things such as testing network setups, preventing information leakage (e.g. for shared hosting webservers) or testing out OS builds (e.g. from debootstrap). They can even be used to provide a more complete fakeroot replacement.
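As a rough illustration of the mix-and-match idea, the sketch below combines per-namespace flag bits and hands them to the unshare(2) system call through ctypes. The flag values are the CLONE_NEW* constants from <linux/sched.h>; the dictionary keys and the helper names (combine_flags, unshare) are made up for this example, and only four namespace types are shown. Calling unshare() for real normally needs root or CAP_SYS_ADMIN.

```python
import ctypes
import os

# Flag values from <linux/sched.h>; one bit per namespace type.
# The short names used as keys are an assumption of this sketch.
CLONE_FLAGS = {
    "mount": 0x00020000,  # CLONE_NEWNS
    "user": 0x10000000,   # CLONE_NEWUSER
    "pid": 0x20000000,    # CLONE_NEWPID
    "net": 0x40000000,    # CLONE_NEWNET
}


def combine_flags(names):
    """OR together the flag bits for the requested namespace types."""
    flags = 0
    for name in names:
        flags |= CLONE_FLAGS[name]
    return flags


def unshare(names):
    """Detach the calling process from the named namespaces.

    Thin wrapper around the libc unshare() function; raises OSError
    (typically EPERM) if the caller lacks the required privileges.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.unshare(combine_flags(names)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

For example, unshare(["net", "mount"]) would give the calling process its own network stack and mount table while leaving everything else shared with the host.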

There are also some additional proposed namespaces that are not yet in Linux:

Security

Containers are commonly thought of as a security mechanism, in much the same way that chroot often is. This is the wrong way to think about containers; it will not only lead you astray but may also lead to compromise. Namespaces (one part of containers) are an isolation mechanism that can be used to prevent information from one namespace leaking into another inadvertently. They do not, however, prevent intentional leakage (e.g. the same filesystem mounted in multiple containers at once).
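One way to spot the 'same filesystem mounted in multiple containers' case is to compare the mount tables of two processes. The sketch below assumes you have already read the text of /proc/<pid>/mountinfo for a process in each container; the third field of every line is the major:minor device number of the mounted filesystem, so any value that appears in both tables is a filesystem visible from both sides. The function names are invented for this example.

```python
def devices(mountinfo_text):
    """Return the set of major:minor device IDs found in mountinfo text."""
    found = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        # Field 3 (index 2) of a mountinfo line is "major:minor".
        if len(fields) >= 3:
            found.add(fields[2])
    return found


def shared_devices(mountinfo_a, mountinfo_b):
    """Device IDs mounted in both mount tables."""
    return devices(mountinfo_a) & devices(mountinfo_b)
```

A non-empty result is not automatically a problem (a read-only base image is often shared on purpose), but it does tell you where data can cross the boundary.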

One thing to consider with containers is that the Linux kernel is shared between multiple containers. A namespace-aware rootkit that compromises one container will be able to infect other containers and do what it wants to them. Containers will not reduce the attack surface in this case but instead give you multiple instances of that attack surface.

Unfortunately there is no 'one size fits all' security solution when implementing containers, so you will need to mix and match features from multiple security subsystems in order to secure containers against attack. The main ones that, when combined, cover all bases are listed below:

With these security subsystems in Linux you should be able to implement an overlapping security model that remains resilient should one of these features be unavailable (e.g. seccomp, SELinux and cgroups can all be used to limit what device nodes can be created).

Networking

Networking is one of the easiest ways to get started with namespaces under Linux. The 'ip' command from iproute2 has native support for namespaces (via 'setns') in its 'link' commands, along with a 'netns' command to create and destroy namespaces. This can be used to spin up environments for testing network topologies of arbitrary complexity, or to run a process behind a virtual interface for simulating latency.
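A minimal sketch of that workflow is shown below. It only builds the 'ip' and 'tc' command lines; run_all() is what would actually execute them, and that needs root. The namespace and interface names are placeholders, and using the netem qdisc for latency is an assumption of this example rather than the only way to do it.

```python
import subprocess


def netns_with_veth_commands(ns, host_if, ns_if):
    """Commands to create a namespace and connect it to the host by a veth pair."""
    return [
        ["ip", "netns", "add", ns],
        ["ip", "link", "add", host_if, "type", "veth", "peer", "name", ns_if],
        # Move one end of the pair into the new namespace.
        ["ip", "link", "set", ns_if, "netns", ns],
        ["ip", "link", "set", host_if, "up"],
        ["ip", "netns", "exec", ns, "ip", "link", "set", ns_if, "up"],
    ]


def latency_command(dev, delay_ms):
    """Command to add a fixed delay to packets leaving dev (netem qdisc)."""
    return ["tc", "qdisc", "add", "dev", dev, "root", "netem",
            "delay", f"{delay_ms}ms"]


def run_all(commands):
    """Execute each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Something like run_all(netns_with_veth_commands("lab", "veth0", "veth1") + [latency_command("veth0", 100)]) would set up a namespace called 'lab' behind a link with 100ms of added delay; 'ip netns del lab' tears it down again.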

Below is a list of virtual networking features that can be used in conjunction with containers under Linux. For most simple uses, knowing about Virtual Ethernet Interfaces and Bridging should suffice. If you are looking at more advanced networking setups, such as those detailed below, then you may want to be familiar with all the options listed here.

To date most, if not all, Linux container setups have used the VETH + Bridge model detailed below. While this model suffices for simple uses and scales up fairly well, there may be scenarios where a slightly different setup provides additional benefits, e.g. multi-tenancy private networks.
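For reference, the VETH + Bridge model usually boils down to one bridge on the host with the host-side end of each container's veth pair attached to it. The sketch below builds the 'ip link' commands for that; the bridge and interface names are placeholders, it assumes the veth pairs already exist, and nothing is executed.

```python
def bridge_commands(bridge, host_ifs):
    """Commands to create a bridge and attach host-side veth ends to it."""
    commands = [
        ["ip", "link", "add", "name", bridge, "type", "bridge"],
        ["ip", "link", "set", bridge, "up"],
    ]
    for ifname in host_ifs:
        # Enslave the interface to the bridge, then bring it up.
        commands.append(["ip", "link", "set", ifname, "master", bridge])
        commands.append(["ip", "link", "set", ifname, "up"])
    return commands
```

With bridge_commands("br0", ["veth0", "veth2"]) every container whose veth peer hangs off 'br0' ends up on the same layer 2 segment, which is exactly why a different layout may be preferable when tenants should not see each other.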

Implementations

My Code

Non Linux/Mainline Implementations

Presentations

Unofficial man pages

FAQ

What is a container?

A container is a collection of namespaces mixed in with cgroups, normal Linux networking and normal Linux security mechanisms. It allows you to 'host' multiple instances of userspace (the utilities you use and interact with every day) with only a single kernel in a similar manner to how vhosting on a webserver allows you to host multiple websites.

So what's a namespace then?

A namespace is a specific type of resource that can be split up and partitioned, e.g. network interfaces, process IDs, user IDs or the filesystem. A good example that everyone is familiar with is the 'chroot' command, which allows you to present a subset of the files on the filesystem to a set of processes and hide the other folders from them. All the children of the process you launch with chroot, or its namespace equivalent, will see exactly the same files/PIDs/interfaces as the process that was launched in the namespace.
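You can check the 'children see the same thing' claim yourself: each process has symlinks under /proc/<pid>/ns/ whose targets look like 'net:[4026531992]', and two processes are in the same namespace when the number matches. The sketch below is one way to compare them; the helper names are made up for this example, and reading another user's process may need extra privileges.

```python
import os
import re

# Link targets look like "net:[4026531992]" or "pid_for_children:[...]".
_NS_LINK = re.compile(r"^([a-z_]+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a /proc/<pid>/ns/* link target into (type, inode number)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def namespace_id(pid, kind):
    """Inode number identifying the namespace of type kind that pid is in."""
    return parse_ns_link(os.readlink(f"/proc/{pid}/ns/{kind}"))[1]


def same_namespace(pid_a, pid_b, kind):
    """True if both processes are in the same namespace of the given type."""
    return namespace_id(pid_a, kind) == namespace_id(pid_b, kind)
```

For an ordinary child process, same_namespace(os.getpid(), os.getppid(), "net") should report that nothing has been unshared; try it again from inside a container to see the IDs diverge from the host's.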

So which ones are useful then?

That depends on what you are doing.

If you are emulating 'old style' virtualization, taking a standard Linux install and containerizing it, then the correct answer is 'all of them'.

If you need to emulate multiple nodes that pretend to be a network of machines, then the NET namespace (and the MOUNT namespace if you need to mount tmpfs on a lock dir for some daemons) should be sufficient and allow you to reuse your existing host filesystem.

If you are bootstrapping new installs of a distro, PID and MOUNT namespaces should be sufficient (you don't want to do any UID translation, and you want to ensure that shutting down the container does not shut down the host, hence the PID namespace).

If you are using this as a thin tool for things like continuous integration against multiple distros, without maintaining multiple runners or buildbots, then PID namespaces to hide other processes (and prevent errant processes from causing damage) plus chroot to switch between distro images should be sufficient. If, however, your processes require root for testing, then UID namespaces may prove to be a handy replacement that means you don't need to run the server process as root.
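The scenarios above can be written down as a small lookup table that produces a command line for the 'unshare' tool from util-linux. This is only a sketch: the scenario names are invented here, 'all of them' is taken to mean the six namespace types that tool has supported longest (mount, UTS, IPC, net, PID, user), and '--fork' is added whenever a PID namespace is requested so that the launched command runs as a child inside the new namespace.

```python
# Scenario names are hypothetical labels for the cases described above.
SCENARIOS = {
    "full_system": ("mount", "uts", "ipc", "net", "pid", "user"),
    "network_lab": ("net", "mount"),
    "distro_bootstrap": ("pid", "mount"),
    "ci_runner": ("pid",),
    "ci_runner_needs_root": ("pid", "user"),
}


def unshare_argv(scenario, command):
    """Build an argv for util-linux unshare(1) for the given scenario."""
    namespaces = SCENARIOS[scenario]
    argv = ["unshare"] + [f"--{name}" for name in namespaces]
    if "pid" in namespaces:
        # Fork so the command runs inside the new PID namespace.
        argv.append("--fork")
    if "user" in namespaces:
        # Map the calling user to root inside the new user namespace.
        argv.append("--map-root-user")
    return argv + ["--"] + list(command)
```

For the CI cases the command itself would typically be a chroot into the distro image, e.g. unshare_argv("ci_runner", ["chroot", "/path/to/image", "/bin/sh"]), where the image path is a placeholder.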