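Preface
For Docker container clusters, the more mature options include Swarm, Mesos, and Google's Kubernetes (k8s). Kubernetes in particular has been adopted and promoted by the most vendors, but its comparatively steep learning curve puts many users off. Fortunately, on June 7 of this year (2016) Docker open-sourced SwarmKit, its native cluster management toolkit, which provides container clustering and orchestration. Let's try it out and see what it offers.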
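SwarmKit Architecture
SwarmKit has two roles, Manager and Worker. A Manager manages nodes and schedules tasks. A Worker runs tasks through an Executor; the current default Executor is the Docker Container Executor. SwarmKit has the following features:
(1) A built-in distributed store, so no external database is required
(2) Rolling updates
(3) Container HA, with support for zero application downtime
(4) TLS-secured communication
Now let's install it.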
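“SwarmKit Environment Deployment”
Deployment environment
VMware Workstation 12
Ubuntu 14.04
First build a generic Ubuntu 14.04 virtual machine, install Docker and the SwarmKit package on it, then clone the virtual machine to produce a three-node cluster: one Manager and two Workers.
1. Install Ubuntu 14.04
2. Install Docker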
curl -sSL https://get.docker.io | bash
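3. SwarmKit is written in Go, so a Go environment is required. Download the Go package.
Download URL: http://www.golangtc.com/download
I downloaded go1.6.2.linux-amd64.tar.gz
4. Extract it under /usr/local
Then add the following environment variables to /etc/profile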
export GOROOT=/usr/local/go
export PATH=/root/go/src/github.com/docker/swarmkit/bin:/usr/local/go/bin:$PATH
export GOPATH=/root/go
export SWARM_SOCKET=/tmp/controller/swarm.sock
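Note: these are all the environment variables I currently have set, including the ones only needed in later steps (the PATH entry for the SwarmKit binaries and SWARM_SOCKET).
Once the variables take effect (for example after logging in again or running source /etc/profile), typing go prints its usage help.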
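Steps 3 and 4 can be sketched as two small shell functions. This is a minimal sketch, not the author's script: the function names install_go and print_swarmkit_env and their optional arguments are assumptions made for illustration, while the paths and variable values are the ones listed above.

```shell
#!/bin/sh
# Sketch: unpack a Go tarball and print the exports used in this article.
# install_go and print_swarmkit_env are hypothetical helper names.

# install_go TARBALL [PREFIX]: extract the archive so that PREFIX/go exists.
install_go() {
    tarball=$1
    prefix=${2:-/usr/local}
    mkdir -p "$prefix" && tar -C "$prefix" -xzf "$tarball"
}

# print_swarmkit_env [GOROOT] [GOPATH]: emit one export statement per line.
# \$PATH is escaped so the literal text $PATH ends up in the output.
print_swarmkit_env() {
    goroot=${1:-/usr/local/go}
    gopath=${2:-/root/go}
    cat <<EOF
export GOROOT=$goroot
export GOPATH=$gopath
export PATH=$gopath/src/github.com/docker/swarmkit/bin:$goroot/bin:\$PATH
export SWARM_SOCKET=/tmp/controller/swarm.sock
EOF
}

# Example (run as root):
#   install_go go1.6.2.linux-amd64.tar.gz /usr/local
#   print_swarmkit_env >> /etc/profile && . /etc/profile
```

Printing the exports instead of writing them directly makes it possible to review the lines before appending them to /etc/profile.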
5. Install git
apt-get install git
6. Run the following command to download the SwarmKit source
$ go get github.com/docker/swarmkit
Note: GOPATH must be set as described above (/root/go); the source code is downloaded under that directory.
7. Change to /root/go/src/github.com/docker/swarmkit and run make
~/go/src/github.com/docker/swarmkit# pwd
/root/go/src/github.com/docker/swarmkit
~/go/src/github.com/docker/swarmkit# ls
agent  bin  ca  cmd  CONTRIBUTING.md  Godeps  ioutils  log  Makefile  NOMENCLATURE.md  protobuf  vendor
api  BUILDING.md  circle.yml  codecov.yml  doc.go  identity  LICENSE  MAINTAINERS  manager  picker  README.md  version
8. The generated binaries are in the bin folder of that directory
~/go/src/github.com/docker/swarmkit/bin# ll
total 76092
drwxr-xr-x  2 root root     4096 Jun 30 10:00 ./
drwxr-xr-x 17 root root     4096 Jun 30 10:00 ../
-rwxr-xr-x  1 root root 12929120 Jun 30 10:00 protoc-gen-gogoswarm*
-rwxr-xr-x  1 root root 18044776 Jun 30 10:00 swarm-bench*
-rwxr-xr-x  1 root root 19219800 Jun 30 10:00 swarmctl*
-rwxr-xr-x  1 root root 27708144 Jun 30 10:00 swarmd*
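swarmd is the SwarmKit daemon, used to run managers and workers.
swarmctl is the command-line tool for talking to a swarm manager.
These four files need to be copied into /usr/bin on each manager and worker node; alternatively, add the bin directory to PATH as in the environment variables above.
9. I recommend pulling a test image on this machine at this point.
10. Shut down the virtual machine and clone it.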
“SwarmKit in Practice”
I now have three virtual machines ready, all with the same environment.
Manager: 192.168.14.244 / controller
Worker: 192.168.14.80 / worker1
Worker: 192.168.14.223 / worker2
Make sure the manager and all workers can reach each other over passwordless SSH.
1. On the manager node, start the daemon with swarmd -d /tmp/controller --listen-control-api /tmp/controller/swarm.sock --hostname controller
~# swarmd -d /tmp/controller --listen-control-api /tmp/controller/swarm.sock --hostname controller
Warning: Specifying a valid address with --listen-remote-api may be necessary for other managers to reach this one.
INFO[0000] Listening for connections addr=[::]:4242 proto=tcp
INFO[0000] Listening for local connections addr=/tmp/controller/swarm.sock proto=unix
INFO[0000] 2dacc8ed9d604bff became follower at term 0
INFO[0000] newRaft 2dacc8ed9d604bff [peers: [], term: 0, commit: 0, applied: 0, lastindex: 0, lastterm: 0]
INFO[0000] 2dacc8ed9d604bff became follower at term 1
INFO[0000] 2dacc8ed9d604bff is starting a new election at term 1
INFO[0000] 2dacc8ed9d604bff became candidate at term 2
INFO[0000] 2dacc8ed9d604bff received vote from 2dacc8ed9d604bff at term 2
INFO[0000] 2dacc8ed9d604bff became leader at term 2
INFO[0000] raft.node: 2dacc8ed9d604bff elected leader 2dacc8ed9d604bff at term 2
INFO[0000] node is ready
Note: you can use nohup to keep the daemon running in the background.
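The nohup approach can be sketched as a small wrapper. The helper name run_in_background and the log path in the usage comment are assumptions for illustration; the swarmd command line is the one from step 1.

```shell
#!/bin/sh
# run_in_background LOG_FILE COMMAND [ARGS...]
# Starts COMMAND under nohup so it survives the terminal closing, sends
# stdout and stderr to LOG_FILE, and prints the PID of the background job.
# (run_in_background is a hypothetical helper, not part of SwarmKit.)
run_in_background() {
    log_file=$1
    shift
    nohup "$@" > "$log_file" 2>&1 &
    echo $!
}

# Example on the manager (log path is an arbitrary choice):
#   run_in_background /tmp/controller.log swarmd -d /tmp/controller \
#       --listen-control-api /tmp/controller/swarm.sock --hostname controller
```

Keeping the log in a file preserves the INFO lines shown above, which are otherwise lost once the terminal is closed.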
Then run the join command on each worker machine, pointing --join-addr at the manager's address and port 4242.
swarmd -d /tmp/work1 --hostname work1 --join-addr 192.168.13.244:4242
~# swarmd -d /tmp/work1 --hostname work1 --join-addr 192.168.13.244:4242
Warning: Specifying a valid address with --listen-remote-api may be necessary for other managers to reach this one.
INFO[0000] Waiting for TLS certificate to be issued...
INFO[0000] Downloaded new TLS credentials with role: swarm-worker.
INFO[0000] node is ready
swarmd -d /tmp/work2 --hostname work2 --join-addr 192.168.13.244:4242
~# swarmd -d /tmp/work2 --hostname work2 --join-addr 192.168.13.244:4242
Warning: Specifying a valid address with --listen-remote-api may be necessary for other managers to reach this one.
INFO[0000] Waiting for TLS certificate to be issued...
INFO[0000] Downloaded new TLS credentials with role: swarm-worker.
INFO[0000] node is ready
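All three machines have now joined the same SwarmKit cluster. From here on the Docker cluster can be managed with swarmctl, which requires the SWARM_SOCKET environment variable set earlier.
2. List the current cluster nodes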
~# swarmctl node ls
ID                         Name        Membership  Status  Availability  Manager Status
--                         ----        ----------  ------  ------------  --------------
5a9hk2li71cx0fuzgi42wsrvc  work1       ACCEPTED    READY   ACTIVE
a84vwflvs2ao4awtxor4ybaa6  work2       ACCEPTED    READY   ACTIVE
azwpol92a8mrk1mssqw2ry7p7  controller  ACCEPTED    READY   ACTIVE        REACHABLE *
All nodes are currently in a healthy state (ACCEPTED, READY, ACTIVE).
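The same check can be scripted instead of read by eye. The sketch below assumes the column layout shown in the listing above, with Status as the fourth whitespace-separated field; count_ready_nodes and all_nodes_ready are hypothetical helper names.

```shell
#!/bin/sh
# count_ready_nodes: read `swarmctl node ls` output on stdin and print how
# many rows have READY in the fourth column (the Status column above).
count_ready_nodes() {
    awk '$4 == "READY" { n++ } END { print n + 0 }'
}

# all_nodes_ready EXPECTED: succeed only if exactly EXPECTED nodes are READY.
all_nodes_ready() {
    expected=$1
    ready=$(count_ready_nodes)
    [ "$ready" -eq "$expected" ]
}

# Example on the manager, for the three-node cluster in this article:
#   swarmctl node ls | all_nodes_ready 3 && echo "cluster is ready"
```

If a later SwarmKit version changes the column order, the field number in the awk program has to be adjusted.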
3. Create an ubuntu service
~# swarmctl service create --name ubuntu --image ubuntu:14.04
1601akon3w1t1m7i9el4pn9gl
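Note: if every node already has the image, the service starts quickly. If not, each node with network access pulls it from Docker Hub by itself, which can take a while. That is why I recommended pulling an image beforehand.
Check the current services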
~# swarmctl service ls
ID                         Name    Image         Replicas
--                         ----    -----         --------
1601akon3w1t1m7i9el4pn9gl  ubuntu  ubuntu:14.04  0/1
Use inspect to see the service details.
~# swarmctl service inspect ubuntu
ID                : 1601akon3w1t1m7i9el4pn9gl
Name              : ubuntu
Replicas          : 0/1
Template
 Container
  Image           : ubuntu:14.04

Task ID                    Service  Slot  Image         Desired State  Last State                Node
-------                    -------  ----  -----         -------------  ----------                ----
9qflavdz3okfqtael5558o8ey  ubuntu   1     ubuntu:14.04  RUNNING        PREPARING 34 seconds ago  controller
Last State is PREPARING, which means controller has not started the ubuntu container yet: the image is not available locally and has to be pulled from the registry.
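Waiting for the pull to finish can be automated by polling the Replicas line of the inspect output. This is a sketch under the assumption that the line keeps the "Replicas : running/desired" shape shown above; replicas_ready is a hypothetical helper name.

```shell
#!/bin/sh
# replicas_ready: read `swarmctl service inspect` output on stdin and succeed
# only if the Replicas line reports running == desired (for example 3/3).
replicas_ready() {
    awk '
        $1 == "Replicas" { split($NF, r, "/"); found = 1; ok = (r[1] == r[2]) }
        END { if (found && ok) exit 0; exit 1 }
    '
}

# Example: poll every 5 seconds until the ubuntu service is fully running.
#   until swarmctl service inspect ubuntu | replicas_ready; do sleep 5; done
```

With the output above (Replicas : 0/1) the function fails, so the loop keeps waiting.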
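4. Update the service
swarmctl service update changes the settings of an existing service, such as the image version, container parameters (CPU, memory, ports, networks, volumes...), replica count, labels, and environment variables.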
~# swarmctl service update ubuntu
Error: no changes detected

Usage:
  swarmctl service update <service ID> [flags]

Flags:
      --args value                  container args (default [])
      --bind value                  define a bind mount (default [])
      --command value               override entrypoint (default [])
      --constraint value            Placement constraint (node.labels.key==value) (default [])
      --cpu-limit string            CPU cores limit (e.g. 0.5)
      --cpu-reservation string      number of CPU cores reserved (e.g. 0.5)
      --env value                   container env (default [])
      --image string                container image
      --label value                 service label (key=value) (default [])
      --memory-limit string         memory limit (e.g. 512m)
      --memory-reservation string   amount of reserved memory (e.g. 512m)
      --name string                 service name
      --network string              network name
      --ports value                 ports (default [])
      --replicas uint               number of replicas for the service (only works in replicated service mode) (default 1)
      --restart-condition string    condition to restart the task (any, failure, none) (default "any")
      --restart-delay string        delay between task restarts (default "5s")
      --restart-max-attempts uint   maximum number of restart attempts (0 = unlimited)
      --restart-window string       time window to evaluate restart attempts (0 = unbound) (default "0s")
      --update-delay string         delay between task updates (0s = none) (default "0s")
      --update-parallelism uint     task update parallelism (0 = all at once)
      --volume value                define a volume mount (default [])
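As an illustration of how these flags combine, the sketch below assembles a rolling-update command line from the --image, --update-parallelism and --update-delay flags listed in the usage text. It only prints the command (a dry run). The helper name rolling_update_cmd, the default values 1 and 10s, and the ubuntu:16.04 tag in the example are assumptions, and the command was not run against a live cluster here.

```shell
#!/bin/sh
# rolling_update_cmd SERVICE IMAGE [PARALLELISM] [DELAY]
# Prints a swarmctl command that switches SERVICE to IMAGE, updating
# PARALLELISM tasks at a time and waiting DELAY between batches.
# Only flags that appear in the usage text above are used.
rolling_update_cmd() {
    service=$1
    image=$2
    parallelism=${3:-1}
    delay=${4:-10s}
    echo "swarmctl service update $service --image $image --update-parallelism $parallelism --update-delay $delay"
}

# Example: review the command first, then pipe it to sh to execute it.
#   rolling_update_cmd ubuntu ubuntu:16.04
#   rolling_update_cmd ubuntu ubuntu:16.04 | sh
```

Scaling works the same way with the --replicas flag, for example swarmctl service update ubuntu --replicas 3.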
5. Node management
~# swarmctl service inspect ubuntu
ID                : 1601akon3w1t1m7i9el4pn9gl
Name              : ubuntu
Replicas          : 0/3
Template
 Container
  Image           : ubuntu:14.04

Task ID                    Service  Slot  Image         Desired State  Last State                Node
-------                    -------  ----  -----         -------------  ----------                ----
e4xaodj9v2sest4ix141wgu4t  ubuntu   1     ubuntu:14.04  ACCEPTED       ACCEPTED now              work2
0yfsmmpvq51clsywnfjn077va  ubuntu   2     ubuntu:14.04  RUNNING        PREPARING 17 minutes ago  work1
706418wj5je0h7jtirvqmhlfi  ubuntu   3     ubuntu:14.04  ACCEPTED       ACCEPTED now              controller
The command swarmctl node drain work2 makes work2 unavailable for scheduling.
(To re-enable it, run swarmctl node activate work2.)
The state change is visible right away:
~# swarmctl node drain work2
~# swarmctl service inspect ubuntu
ID                : 1601akon3w1t1m7i9el4pn9gl
Name              : ubuntu
Replicas          : 0/3
Template
 Container
  Image           : ubuntu:14.04

Task ID                    Service  Slot  Image         Desired State  Last State                Node
-------                    -------  ----  -----         -------------  ----------                ----
2znvxhamkbvj2eqf5q3xxin7d  ubuntu   1     ubuntu:14.04  RUNNING        PREPARING 6 seconds ago   controller
0yfsmmpvq51clsywnfjn077va  ubuntu   2     ubuntu:14.04  RUNNING        PREPARING 19 minutes ago  work1
890be0l7eapjep5v8k70b2pto  ubuntu   3     ubuntu:14.04  ACCEPTED       ACCEPTED 2 seconds ago    controller
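We can see that the tasks now run only on controller and work1: the task in slot 1, previously on work2, has been moved to controller.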
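To confirm a drain without reading the table row by row, the task list can be summarized per node. The sketch assumes the inspect layout shown above, where the task table follows a dashed separator line and the node name is the last field of each row; tasks_per_node is a hypothetical helper name.

```shell
#!/bin/sh
# tasks_per_node: read `swarmctl service inspect` output on stdin and print
# "<node> <task count>" for every node that appears in the task table.
tasks_per_node() {
    awk '
        /^-+ / { in_tasks = 1; next }          # dashed line ends the table header
        in_tasks && NF > 0 { count[$NF]++ }    # last field is the node name
        END { for (node in count) print node, count[node] }
    ' | sort
}

# Example on the manager:
#   swarmctl service inspect ubuntu | tasks_per_node
```

For the listing taken after the drain this prints controller with 2 tasks and work1 with 1 task, and no line for work2.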
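In the listings below, the instances on controller are named ubuntu.3.xxxxx, those on work1 ubuntu.2.xxxxx, and those on work2 ubuntu.1.xxxxx. This is because I had scaled the service to three replicas with swarmctl service update ubuntu --replicas 3.
On controller we can now see the instances that used to run on work2 (ubuntu.1.*):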
~# docker ps -a
CONTAINER ID  IMAGE         COMMAND      CREATED             STATUS                         PORTS  NAMES
a159185201f8  ubuntu:14.04  "/bin/bash"  5 seconds ago       Exited (0) 4 seconds ago              ubuntu.1.58zd7f87vldcici33i272v20o
0507647074e1  ubuntu:14.04  "/bin/bash"  8 seconds ago       Exited (0) 6 seconds ago              ubuntu.3.dwbyeoyk9bgu07u2wz3utoput
50a5aea24ed2  ubuntu:14.04  "/bin/bash"  17 seconds ago      Exited (0) 16 seconds ago             ubuntu.1.9dwu5e3tb9cte2w2fnp7lyd1r
f8244e6e4d55  ubuntu:14.04  "/bin/bash"  23 seconds ago      Exited (0) 22 seconds ago             ubuntu.3.0dipyu39lyhvqtc68bq5tfld2
7019fbed41f1  ubuntu:14.04  "/bin/bash"  31 seconds ago      Exited (0) 29 seconds ago             ubuntu.1.2kjo445p3l2fucgo82e06gy1e
a63eac32c424  ubuntu:14.04  "/bin/bash"  33 seconds ago      Exited (0) 31 seconds ago             ubuntu.3.anr48rmhqyh98v2xmwvafb60x
928882228a57  ubuntu:14.04  "/bin/bash"  46 seconds ago      Exited (0) 45 seconds ago             ubuntu.3.bymam97t2g6libynguczc6lm8
6ec7ad7a6bd2  ubuntu:14.04  "/bin/bash"  48 seconds ago      Exited (0) 47 seconds ago             ubuntu.1.ct3fc130hki4keq6ax0uq0at6
98e9728e9000  ubuntu:14.04  "/bin/bash"  56 seconds ago      Exited (0) 55 seconds ago             ubuntu.3.arjd0dtjefhofd5xnxw48loli
bcf0708aae66  ubuntu:14.04  "/bin/bash"  59 seconds ago      Exited (0) 58 seconds ago             ubuntu.1.augk3ferw6tvwe8qs6fk3wtt2
57db27be5bbd  ubuntu:14.04  "/bin/bash"  About a minute ago  Exited (0) About a minute ago         ubuntu.3.6tkx22dhta5gtmkbpk83osj4k
6fdf69e49e98  ubuntu:14.04  "/bin/bash"  About a minute ago  Exited (0) About a minute ago         ubuntu.1.4qamwbaezfp3iigo0kkwzh5my
3715774ddda2  ubuntu:14.04  "/bin/bash"  About a minute ago  Exited (0) About a minute ago         ubuntu.3.5h2ohspdob81pueb74xmrr9q9
22071586b2a0  ubuntu:14.04  "/bin/bash"  About a minute ago  Exited (0) About a minute ago         ubuntu.1.f4lmuk63l20kp165qe6uovo1n
797aa652b3da  ubuntu:14.04  "/bin/bash"  About a minute ago  Exited (0) About a minute ago         ubuntu.3.2pyygimy8ilaub70gltsa96d0
22143a2f7795  ubuntu:14.04  "/bin/bash"  About a minute ago  Exited (0) About a minute ago         ubuntu.1.dvrjxce69qa382otiw225gmac
1a26f50f87fa  ubuntu:14.04  "/bin/bash"  About a minute ago  Exited (0) About a minute ago         ubuntu.3.byb6g2iuehmvgif7ss0nwnk0d
f43232d8a05a  ubuntu:14.04  "/bin/bash"  2 minutes ago       Exited (0) About a minute ago         ubuntu.1.8zucjezgh65dv58jckgt0ycjj
After work2 is activated again, its instances start up again:
~# docker ps -a
CONTAINER ID  IMAGE         COMMAND      CREATED             STATUS                         PORTS  NAMES
9424a0b3dd72  ubuntu:14.04  "/bin/bash"  10 seconds ago      Exited (0) 9 seconds ago              ubuntu.1.ezptv6gxoalsqdawl9uz2anwk
6dd10817d0cf  ubuntu:14.04  "/bin/bash"  20 seconds ago      Exited (0) 19 seconds ago             ubuntu.1.8lnrs64khjyqj69cnt03o20cs
a71f20118aba  ubuntu:14.04  "/bin/bash"  32 seconds ago      Exited (0) 31 seconds ago             ubuntu.1.2i81636zz5vpkzjxs0emswmys
ef939614adcc  ubuntu:14.04  "/bin/bash"  42 seconds ago      Exited (0) 41 seconds ago             ubuntu.1.0b66k4exrehqc1zatoalmvfki
e06a3e77986f  ubuntu:14.04  "/bin/bash"  52 seconds ago      Exited (0) 52 seconds ago             ubuntu.1.41gzjjru89aym5yxx56f1li77
c9a5e2ecacfd  ubuntu:14.04  "/bin/bash"  About a minute ago  Exited (0) About a minute ago         ubuntu.1.5hwusgfhb20fkzo1nloragkia
0785dd33cdf1  ubuntu:14.04  "/bin/bash"  About a minute ago  Exited (0) About a minute ago         ubuntu.1.71qca8ze2rpb94p3zv86y4u0e
6fc4a1a657aa  ubuntu:14.04  "/bin/bash"  About a minute ago  Exited (0) About a minute ago         ubuntu.1.0vujqtkoz7bl3rtui2kpexjhm
b2c046e8515c  ubuntu:14.04  "/bin/bash"  About a minute ago  Exited (0) About a minute ago         ubuntu.1.54elpfbr77jaq420mar5hn3vi
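The listings above contain many exited containers for the same slot. A small helper makes this easier to quantify by counting containers per service slot from their names. The sketch assumes the <service>.<slot>.<task id> naming visible in the NAMES column; containers_per_slot is a hypothetical helper name. One possible reading of the result, offered as an assumption rather than a confirmed diagnosis: /bin/bash exits at once when no terminal is attached (every container shows Exited (0)), and with the default --restart-condition "any" and --restart-delay "5s" from the usage text, a replacement task with a new container ID would be started each time.

```shell
#!/bin/sh
# containers_per_slot: read container names (one per line) on stdin and
# print "<service>.<slot> <container count>" for each slot.
containers_per_slot() {
    awk -F. '
        NF >= 3 { count[$1 "." $2]++ }    # ubuntu.1.<task id> -> key "ubuntu.1"
        END { for (slot in count) print slot, count[slot] }
    ' | sort
}

# Example on any node (--format prints only the NAMES column):
#   docker ps -a --format '{{.Names}}' | containers_per_slot
```

Running this a few times in a row shows whether the count for a slot keeps growing, which would point to repeated restarts rather than to extra replicas.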
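Conclusion
SwarmKit has only just been open-sourced, so it surely still has plenty of issues and is not yet ready for production, but being integrated directly into Docker Engine is a clear advantage. I ran into quite a few puzzles while using it (possibly because of my own misunderstanding).
For example: when I repeatedly list the Docker instances on a machine, why does the container ID keep changing? Once everything had started, the container IDs settled down, but they changed again after a migration like the one above.
Also, my understanding of cluster management is that when I create a service from an image, the internal scheduling algorithm should place a container instance on some node in the cluster. (The many containers shown above are there because I set the replica count to 3 with swarmctl service update ubuntu --replicas 3.)
But why does every node hold 9 container instances? Is it simply filling up the resources of my virtual machines (2 vCPU + 2 GB of memory)?
The tool needs further study, but I remain optimistic about its prospects for container clustering.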