There are three major areas to consider when reviewing Docker security:


  • the intrinsic security of containers, as implemented by kernel namespaces and cgroups;
  • the attack surface of the Docker daemon itself;
  • the "hardening" security features of the kernel and how they interact with containers.

Kernel Namespaces

Docker containers are very similar to LXC containers, and they come with similar security features. When you start a container with docker run, behind the scenes Docker creates a set of namespaces and control groups for the container.

Namespaces provide the first and most straightforward form of isolation: processes running within a container cannot see, and even less affect, processes running in another container, or in the host system.
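
One way to observe this isolation is to compare namespace identifiers. The following is a minimal sketch, assuming a Linux system that exposes /proc/<pid>/ns (this path is not mentioned in the original text, and the helper names are made up for illustration). Two processes that report different identifiers for a namespace type are isolated from each other for that type.

```python
import os


def list_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    namespaces = {}
    try:
        names = os.listdir(ns_dir)
    except OSError:
        # Not Linux, /proc not mounted, or no permission: nothing to report.
        return namespaces
    for name in sorted(names):
        try:
            # Each entry is a symlink whose target encodes the namespace identity.
            namespaces[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue
    return namespaces


def shared_namespaces(a, b):
    """Names of namespaces whose identifiers are equal in both mappings."""
    return sorted(name for name in a if name in b and a[name] == b[name])


if __name__ == "__main__":
    for name, ident in list_namespaces().items():
        print(name, ident)
```

Running the script once on the host and once inside a container, then passing both results to shared_namespaces, would show which namespaces (if any) the two still have in common.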


Each container also gets its own network stack, meaning that a container doesn't get privileged access to the sockets or interfaces of another container. Of course, if the host system is set up accordingly, containers can interact with each other through their respective network interfaces, just like they can interact with external hosts. When you specify public ports for your containers or use links, IP traffic is allowed between containers. They can ping each other, send/receive UDP packets, and establish TCP connections, but that can be restricted if necessary. From a network architecture point of view, all containers on a given Docker host are sitting on bridge interfaces. This means that they are just like physical machines connected through a common Ethernet switch; no more, no less.


How mature is the code providing kernel namespaces and private networking? Kernel namespaces were introduced between kernel versions 2.6.15 and 2.6.26. This means that since July 2008 (date of the 2.6.26 release, now 5 years ago), namespace code has been exercised and scrutinized on a large number of production systems. And there is more: the design and inspiration for the namespaces code are even older. Namespaces are actually an effort to reimplement the features of OpenVZ in such a way that they could be merged within the mainstream kernel. And OpenVZ was initially released in 2005, so both the design and the implementation are pretty mature.


Control Groups

Control Groups are the other key component of Linux Containers. They implement resource accounting and limiting. They provide a lot of very useful metrics, but they also help to ensure that each container gets its fair share of memory, CPU, and disk I/O; and, more importantly, that a single container cannot bring the system down by exhausting one of those resources.

So while they do not play a role in preventing one container from accessing or affecting the data and processes of another container, they are essential to fend off some denial-of-service attacks. They are particularly important on multi-tenant platforms, like
public and private PaaS, to guarantee a consistent uptime (and performance) even when some applications start to misbehave.
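
To see which control groups a process belongs to, you can read /proc/self/cgroup on Linux. The sketch below parses that file's "hierarchy:controllers:path" lines; the file location and the sample paths are assumptions for illustration, not something the original text specifies.

```python
def parse_cgroup_file(text):
    """Parse the contents of a /proc/<pid>/cgroup file into a list of dicts."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Format: hierarchy-ID:controller-list:cgroup-path
        hierarchy_id, controllers, path = line.split(":", 2)
        entries.append({
            "hierarchy": int(hierarchy_id),
            "controllers": [c for c in controllers.split(",") if c],
            "path": path,
        })
    return entries


if __name__ == "__main__":
    # Hypothetical sample; real paths depend on the host and the container.
    sample = "4:memory:/docker/abc\n3:cpu,cpuacct:/docker/abc\n"
    for entry in parse_cgroup_file(sample):
        print(entry["controllers"], entry["path"])
```

A process whose path differs from that of the host's init process is being accounted for (and possibly limited) separately for the listed controllers.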


Control Groups have been around for a while as well: the code was started in 2006, and initially merged in kernel 2.6.24.

Docker Daemon Attack Surface

Running containers (and applications) with Docker implies running the Docker daemon. This daemon currently requires root privileges, and you should therefore be aware of some important details.

First of all, only trusted users should be allowed to control your Docker daemon. This is a direct consequence of some powerful Docker features. Specifically, Docker allows you to share a directory between the Docker host and a guest container; and it allows you to do so without limiting the access rights of the container. This means that you can start a container where the /host directory will be the / directory on your host; and the container will be able to alter your host filesystem without any restriction. Does this sound crazy? Well, you have to know that all virtualization systems allowing filesystem resource sharing behave the same way. Nothing prevents you from sharing your root filesystem (or even your root block device) with a virtual machine.


This has a strong security implication: for example, if you instrument Docker from a web server to provision containers through an API, you should be even more careful than usual with parameter checking, to make sure that a malicious user cannot pass crafted parameters causing Docker to create arbitrary containers.
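
As an illustration of that kind of parameter checking, here is a minimal sketch of an allow-list validator that a provisioning front end could run before talking to Docker. Everything in it is hypothetical: the parameter names, the image names, and the memory bounds are invented for the example and are not part of any Docker API.

```python
# Hypothetical allow-list of images that the web front end may start.
ALLOWED_IMAGES = {"example/webapp", "example/worker"}

# Only these request keys are accepted; anything else (volumes, privileged
# flags, and so on) is rejected instead of being passed through.
ALLOWED_KEYS = {"image", "memory_mb"}


def validate_request(params):
    """Return a list of problems; an empty list means the request is acceptable."""
    problems = []
    unknown = set(params) - ALLOWED_KEYS
    if unknown:
        problems.append("unexpected parameters: " + ", ".join(sorted(unknown)))
    if params.get("image") not in ALLOWED_IMAGES:
        problems.append("image is not on the allow-list")
    memory = params.get("memory_mb")
    is_int = isinstance(memory, int) and not isinstance(memory, bool)
    if not is_int or not 16 <= memory <= 1024:
        problems.append("memory_mb must be an integer between 16 and 1024")
    return problems


if __name__ == "__main__":
    print(validate_request({"image": "example/webapp", "memory_mb": 256}))
    print(validate_request({"image": "example/webapp", "memory_mb": 256,
                            "volumes": ["/:/host"]}))
```

The design choice is to reject unknown keys outright rather than trying to enumerate dangerous ones, so a request that tries to share the host filesystem never reaches the daemon.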

For this reason, the REST API endpoint (used by the Docker CLI to communicate with the Docker daemon) changed in Docker 0.5.2, and now uses a UNIX socket instead of a TCP socket bound on 127.0.0.1 (the latter being prone to cross-site request forgery attacks if you happen to run Docker directly on your local machine, outside of a VM). You can then use traditional UNIX permission checks to limit access to the control socket.
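
The permission check on the control socket can be sketched as follows. This is only an illustration: the default path /var/run/docker.sock is an assumption (the text above does not name the socket file), and the function names are made up.

```python
import os
import stat


def mode_problems(mode):
    """Inspect a file mode (as returned by os.stat) and list access problems."""
    problems = []
    if not stat.S_ISSOCK(mode):
        problems.append("not a UNIX socket")
    if mode & (stat.S_IROTH | stat.S_IWOTH | stat.S_IXOTH):
        problems.append("accessible to users outside the owner and group")
    return problems


def socket_problems(path="/var/run/docker.sock"):
    """Check the socket at `path`; the default location is an assumption."""
    try:
        return mode_problems(os.stat(path).st_mode)
    except OSError as exc:
        return [f"cannot stat {path}: {exc}"]


if __name__ == "__main__":
    for problem in socket_problems():
        print(problem)
```

With owner and group access only, membership in the socket's group becomes the list of trusted users that the earlier paragraph calls for.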

You can also expose the REST API over HTTP if you explicitly decide to. However, if you do that, be aware of the security implication mentioned above: you should ensure that it is reachable only from a trusted network or VPN, or protected with, e.g., stunnel and client SSL certificates. You can also secure it with HTTPS and certificates.

Recent improvements in Linux namespaces will soon allow running full-featured containers without root privileges, thanks to the new user namespace. This is covered in detail here. Moreover, this will solve the problem caused by sharing filesystems between host and guest, since the user namespace allows users within containers (including the root user) to be mapped to other users in the host system.
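
The mapping itself is simple arithmetic. The sketch below assumes the Linux uid_map format of "inside-start outside-start length" per line (as found in /proc/<pid>/uid_map); the concrete numbers are invented for illustration and do not describe any particular Docker configuration.

```python
def parse_id_map(text):
    """Parse uid_map/gid_map text into (inside_start, outside_start, length) tuples."""
    ranges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # Skip blank or malformed lines.
        inside, outside, length = (int(field) for field in fields)
        ranges.append((inside, outside, length))
    return ranges


def to_host_id(container_id, ranges):
    """Translate an ID seen inside the namespace to the host ID, or None if unmapped."""
    for inside, outside, length in ranges:
        if inside <= container_id < inside + length:
            return outside + (container_id - inside)
    return None


if __name__ == "__main__":
    # Hypothetical mapping: container IDs 0..65535 map to host IDs 100000..165535.
    ranges = parse_id_map("0 100000 65536\n")
    print(to_host_id(0, ranges))     # container root on the host
    print(to_host_id(1000, ranges))  # an ordinary container user
```

Under such a mapping, root inside the container is an unprivileged user on the host, so files it creates on a shared filesystem are not owned by the host's root.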

The end goal for Docker is therefore to implement two additional security improvements:

  • map the root user of a container to a non-root user of the Docker host, to mitigate the effects of a container-to-host privilege escalation;
  • allow the Docker daemon to run without root privileges, and delegate operations requiring those privileges to well-audited sub-processes, each with its own (very limited) scope: virtual network setup, filesystem management, etc.

Finally, if you run Docker on a server, it is recommended to run exclusively Docker on the server, and move all other services into containers controlled by Docker. Of course, it is fine to keep your favorite admin tools (probably at least an SSH server), as well as existing monitoring/supervision processes (e.g., NRPE, collectd, etc.).

Linux Kernel Capabilities

By default, Docker starts containers with a very restricted set of capabilities. What does that mean?


Capabilities turn the binary "root/non-root" dichotomy into a fine-grained access control system. Processes (like web servers) that just need to bind on a port below 1024 do not have to run as root: they can just be granted the net_bind_service capability instead. And there are many other capabilities, for almost all the specific areas where root privileges are usually needed.
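
Capabilities are stored as a bitmask, which Linux reports in the CapEff line of /proc/<pid>/status. The sketch below decodes such a mask; the table covers only a handful of capabilities (bit numbers as defined in the kernel's capability header), and the sample status text is invented for illustration.

```python
# Partial table of capability bit numbers; the full list has several dozen entries.
CAPABILITY_BITS = {
    0: "CAP_CHOWN",
    9: "CAP_LINUX_IMMUTABLE",
    10: "CAP_NET_BIND_SERVICE",
    13: "CAP_NET_RAW",
    16: "CAP_SYS_MODULE",
    21: "CAP_SYS_ADMIN",
    27: "CAP_MKNOD",
}


def effective_mask_from_status(status_text):
    """Extract the hexadecimal CapEff value from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            return line.split(":", 1)[1].strip()
    return None


def decode_capabilities(mask_hex):
    """Return the names (from the partial table) of the bits set in the mask."""
    mask = int(mask_hex, 16)
    return [name for bit, name in sorted(CAPABILITY_BITS.items())
            if mask & (1 << bit)]


if __name__ == "__main__":
    sample = "Name:\texample\nCapEff:\t0000000000000400\n"  # hypothetical
    print(decode_capabilities(effective_mask_from_status(sample)))
```

A process holding only bit 10 can bind to ports below 1024 but has none of the other powers normally associated with root.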

This means a lot for container security; let's see why!


Your average server (bare metal or virtual machine) needs to run a bunch of processes as root. Those typically include SSH, cron, syslogd; hardware management tools (e.g., load modules), network configuration tools (e.g., to handle DHCP, WPA, or VPNs), and much more. A container is very different, because almost all of those tasks are handled by the infrastructure around the container:

  • SSH access will typically be managed by a single server running in the Docker host;
  • cron, when necessary, should run as a user process, dedicated and tailored for the app that needs its scheduling service, rather than as a platform-wide facility;
  • log management will also typically be handed to Docker, or to third-party services like Loggly or Splunk;
  • hardware management is irrelevant, meaning that you never need to run udevd or equivalent daemons within containers;
  • network management happens outside of the containers, enforcing separation of concerns as much as possible, meaning that a container should never need to perform ifconfig, route, or ip commands (except when a container is specifically engineered to behave like a router or firewall, of course).

This means that in most cases, containers will not need "real" root privileges at all. And therefore, containers can run with a reduced capability set, meaning that "root" within a container has far fewer privileges than the real "root". For instance, it is possible to:


  • deny all "mount" operations;
  • deny access to raw sockets (to prevent packet spoofing);
  • deny access to some filesystem operations, like creating new device nodes, changing the owner of files, or altering attributes (including the immutable flag);
  • deny module loading;
  • and many others.

This means that even if an intruder manages to escalate to root within a container, it will be much harder to do serious damage, or to escalate to the host.


This won't affect regular web apps; but malicious users will find that the arsenal at their disposal has shrunk considerably! By default Docker drops all capabilities except those needed, a whitelist instead of a blacklist approach. You can see a full list of available capabilities in the Linux manpages.

Of course, you can always enable extra capabilities if you really need them (for instance, if you want to use a FUSE-based filesystem), but by default, Docker containers use only a whitelist of kernel capabilities.
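
The difference between the two approaches can be modeled with plain set operations. This is a conceptual sketch only, not how Docker is implemented; CAP_FUTURE_FEATURE is a made-up name standing in for any capability that did not exist when the list was written.

```python
def grant_with_whitelist(requested, allowed):
    """Whitelist: grant only what is explicitly allowed."""
    return set(requested) & set(allowed)


def grant_with_blacklist(requested, denied):
    """Blacklist: grant everything that is not explicitly denied."""
    return set(requested) - set(denied)


if __name__ == "__main__":
    requested = {"CAP_NET_BIND_SERVICE", "CAP_SYS_MODULE", "CAP_FUTURE_FEATURE"}
    allowed = {"CAP_NET_BIND_SERVICE", "CAP_CHOWN"}
    denied = {"CAP_SYS_MODULE"}
    print(sorted(grant_with_whitelist(requested, allowed)))
    print(sorted(grant_with_blacklist(requested, denied)))
```

With the whitelist, a capability that nobody anticipated is refused by default; with the blacklist, it slips through until someone remembers to deny it.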


Other Kernel Security Features


Capabilities are just one of the many security features provided by modern Linux kernels. It is also possible to leverage existing, well-known systems like TOMOYO, AppArmor, SELinux, GRSEC, etc. with Docker.

While Docker currently only enables capabilities, it doesn't interfere with the other systems. This means that there are many different ways to harden a Docker host. Here are a few examples.


  • You can run a kernel with GRSEC and PAX. This will add many safety checks, both at compile-time and run-time; it will also defeat many exploits, thanks to techniques like address randomization. It doesn't require Docker-specific configuration, since those security features apply system-wide, independently of containers.
  • If your distribution comes with security model templates for Docker containers, you can use them out of the box. For instance, we ship a template that works with AppArmor, and Red Hat comes with SELinux policies for Docker. These templates provide an extra safety net (even though it overlaps greatly with capabilities).
  • You can define your own policies using your favorite access control mechanism.

Just like there are many third-party tools to augment Docker containers with, e.g., special network topologies or shared filesystems, you can expect to see tools to harden existing Docker containers without affecting Docker's core.




Docker containers are, by default, quite secure, especially if you take care to run your processes inside the containers as non-privileged users (i.e., non-root).


You can add an extra layer of safety by enabling AppArmor, SELinux, GRSEC, or your favorite hardening solution.

Last but not least, if you see interesting security features in other containerization systems, you will be able to implement them as well with Docker, since everything is provided by the kernel anyway.


For more context, and especially for comparisons with VMs and other container systems, please also see the original blog post.

