docker启动时,会在宿主主机上创建一个名为docker0的虚拟网络接口,默认选择172.17.42.1/16,一个16位的子网掩码给容器提供了65534个IP地址。docker0只是一个在绑定到这上面的其他网卡间自动转发数据包的虚拟以太网桥,它可以使容器和主机相互通信,容器与容器间通信。问题是,如何让位于不同主机上的docker容器可以通信。如何有效配置docker网络目前来说还是一个较复杂的工作,因而也涌现了很多的开源项目来解决这个问题,如flannel、Kubernetes、weave、pipework等等。
1. flannel
CoreOS团队出品,是一个基于etcd的覆盖网络(overlay network)并为每台主机提供一个独立子网的服务。Rudder简化了集群中Docker容器的网络配置,避免了多主机上容器子网冲突的问题,更可以大幅度减少端口映射方面的工作。具体代码见https://github.com/coreos/flannel,其工作原理为:
An overlay network is first configured with an IP range and the size of the subnet for each host. For example, one could configure the overlay to use 10.100.0.0/16 and each host to receive a /24 subnet. Host A could then receive 10.100.5.0/24 and host B could get 10.100.18.0/24. flannel uses etcd to maintain a mapping between allocated subnets and real host IP addresses. For the data path, flannel uses UDP to encapsulate IP datagrams to transmit them to the remote host. We chose UDP as the transport protocol for its ease of passing through firewalls. For example, AWS Classic cannot be configured to pass IPoIP or GRE traffic as its security groups only support TCP/UDP/ICMP.(摘自https://coreos.com/blog/introducing-rudder/)
2. Kubernetes
Kubernetes是由Google推出的针对容器管理和编排的开源项目,它让用户能够在跨容器主机集群的情况下轻松地管理、监测、控制容器化应用部署。Kubernete有一个特殊的与SDN非常相似的网络化概念:通过一个服务代理创建一个可以分配给任意数目容器的IP地址,前端的应用程序或使用该服务的用户仅通过这一IP地址调用服务,不需要关心其他的细节。这种代理方案有点SDN的味道,但是它并不是构建在典型的SDN的第2-3层机制之上。
Kubernetes uses a proxying method, whereby a particular service — defined as a query across containers — gets its own IP address. Behind that address could be hordes of containers that all provide the same service — but on the front end, the application or user tapping that service just uses the one IP address.
This means the number of containers running a service can grow or shrink as necessary, and no customer or application tapping the service has to care. Imagine if that service were a mobile network back-end process, for instance; during traffic surges, more containers running the process could be added, and they could be deleted once traffic returned to normal. Discovery of the specific containers running the service is handled in the background, as is the load balancing among those containers. Without the proxying, you could add more containers, but you’d have to tell users and applications about it; Google’s method eliminates that need for configuration. (https://www.sdncentral.com/news/docker-kubernetes-containers-open-sdn-possibilities/2014/07/)
3. 为不同宿主机上所有容器配置相同网段的IP地址,配置方法见http://www.cnblogs.com/feisky/p/4063162.html,这篇文章是基于Linux bridge的,当然也可以用其他的方法,如用OpenvSwitch+GRE建立宿主机之间的连接:
# From http://goldmann.pl/blog/2014/01/21/connecting-docker-containers-on-multiple-hosts/
# Edit this variable: the ‘other‘ host.
REMOTE_IP=188.226.138.185
# Edit this variable: the bridge address on ‘this‘ host.
BRIDGE_ADDRESS=172.16.42.1/24
# Name of the bridge (should match /etc/default/docker).
BRIDGE_NAME=docker0
# bridges
# Deactivate the docker0 bridge
ip link set $BRIDGE_NAME down
# Remove the docker0 bridge
brctl delbr $BRIDGE_NAME
# Delete the Open vSwitch bridge
ovs-vsctl del-br br0
# Add the docker0 bridge
brctl addbr $BRIDGE_NAME
# Set up the IP for the docker0 bridge
ip a add $BRIDGE_ADDRESS dev $BRIDGE_NAME
# Activate the bridge
ip link set $BRIDGE_NAME up
# Add the br0 Open vSwitch bridge
ovs-vsctl add-br br0
# Create the tunnel to the other host and attach it to the
# br0 bridge
ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre options:remote_ip=$REMOTE_IP
# Add the br0 bridge to docker0 bridge
brctl addif $BRIDGE_NAME br0
# iptables rules
# Enable NAT
iptables -t nat -A POSTROUTING -s 172.16.42.0/24 ! -d 172.16.42.0/24 -j MASQUERADE
# Accept incoming packets for existing connections
iptables -A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
# Accept all non-intercontainer outgoing packets
iptables -A FORWARD -i docker0 ! -o docker0 -j ACCEPT
# By default allow all outgoing traffic
iptables -A FORWARD -i docker0 -o docker0 -j ACCEPT
# Restart Docker daemon to use the new BRIDGE_NAME
service docker restart
4. 使用weave为容器配置IP(使用方法见http://www.cnblogs.com/feisky/p/4093717.html),weave的特性包括
- 应用隔离:不同子网容器之间默认隔离的,即便它们位于同一台物理机上也相互不通;不同物理机之间的容器默认也是隔离的
- 物理机之间容器互通:weave connect $OTHER_HOST
- 动态添加网络:对于不是通过weave启动的容器,可以通过weave attach 10.0.1.1/24 $id来添加网络(detach删除网络)
- 安全性:可以通过weave launch -password wEaVe设置一个密码用于weave peers之间加密通信
- 与宿主机网络通信:weave expose 10.0.1.102/24,这个IP会配在weave网桥上
- 查看weave路由状态:weave ps
- 通过NAT实现外网访问docker容器
5. 修改主机docker默认的虚拟网段,然后在各自主机上分别把对方的docker网段加入到路由表中,配合iptables即可实现docker容器夸主机通信。配置方法如下:
设有两台虚拟机
- v1: 192.168.124.51
- v2: 192.168.124.52
更改虚拟机docker0网段,v1为172.17.1.1/24,v2为172.17.2.1/24
#v1
sudo ifconfig docker0 172.17.1.1 netmask 255.255.255.0sudo bash -c ‘echo DOCKER_OPTS="-B=docker0" >> /etc/default/docker‘ sudo service docker restart # v2 sudo ifconfig docker0 172.17.2.1 netmask 255.255.255.0sudo bash -c ‘echo DOCKER_OPTS="-B=docker0" >> /etc/default/docker‘sudo service docker restart
然后在v1上把v2的docker虚拟网段加入到路由表中,在v2上将v1的docker虚拟网段加入到自己的路由表中
# v1 192.168.124.51 sudo route add -net 172.17.2.0 netmask 255.255.255.0 gw 192.168.124.52 sudo iptables -t nat -F POSTROUTING sudo iptables -t nat -A POSTROUTING -s 172.17.1.0/24 ! -d 172.17.0.0/16 -j MASQUERADE # v2 192.168.124.52 sudo route add -net 172.17.1.0 netmask 255.255.255.0 gw 192.168.124.51 sudo iptables -t nat -F POSTROUTING sudo iptables -t nat -A POSTROUTING -s 172.17.2.0/24 ! -d 172.17.0.0/16 -j MASQUERADE
至此,两台虚拟机中的docker容器可以互相访问了。