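About the author: Liu Haiping (HappyLau), senior cloud computing consultant. He currently works on public cloud at Tencent Cloud and previously worked at Kugou and EasyStack. He has many years of experience in public and private cloud architecture design, operations and delivery, and has taken part in building large private cloud platforms for Kugou, China Southern Power Grid, Guotai Junan and others. He is proficient in open-source technologies such as Linux, Kubernetes, OpenStack and Ceph, has extensive hands-on experience in cloud computing, and has taught RHCA, OpenStack and Linux courses.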
Preface
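The previous article, Kubernetes series tutorial (6): Kubernetes resource management and quality of service, introduced resource scheduling and QoS (quality of service) in Kubernetes: how to define a pod's resources, how those resources are used in scheduling, and the QoS priority classes that result once resources are set. This article continues the series with the pod scheduling mechanism.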
1. Pod Scheduling
1.1 Pod scheduling overview
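Kubernetes is a container orchestration engine, and one of its most important functions is container scheduling, which kube-scheduler performs fully automatically. Each scheduling attempt has two phases: the scheduling cycle and the binding cycle. The scheduling cycle is further divided into filtering and scoring (weighting): following the configured scheduling policy, it first filters out the nodes that can run the pod and then ranks them. In the binding cycle, the pod is bound to the node that kube-scheduler selected; the kubelet on that node, which watches for pods assigned to it, then runs the pod.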
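The two phases above can be sketched as a toy Python model. It is not kube-scheduler's code: the dict-based pod and node shapes and the names `schedule_one` and `bind` are hypothetical, and the filter and score functions are stand-ins. It only shows the order of operations: filter, score, pick the best node, then record the choice on the pod (binding ultimately results in the pod's nodeName being set).

```python
from typing import Callable, Optional

# Hypothetical shapes: a node and a pod are plain dicts.
Node = dict
Pod = dict
Filter = Callable[[Pod, Node], bool]    # predicate: can this node run the pod?
Scorer = Callable[[Pod, Node], float]   # priority: how good is this node?


def schedule_one(pod: Pod, nodes: list[Node],
                 filters: list[Filter], scorers: list[Scorer]) -> Optional[str]:
    """Scheduling cycle: filter, then score, then pick the highest-scoring node."""
    feasible = [n for n in nodes if all(f(pod, n) for f in filters)]
    if not feasible:
        return None  # no node fits; a real pod would stay Pending
    # Sum the scores of all scorers; max() keeps the first node on a tie.
    best = max(feasible, key=lambda n: sum(s(pod, n) for s in scorers))
    return best["name"]


def bind(pod: Pod, node_name: str) -> None:
    """Binding cycle (simplified): record the chosen node on the pod.
    The kubelet on that node then notices the pod and starts it."""
    pod["nodeName"] = node_name


nodes = [
    {"name": "node-1", "ready": True, "free_cpu": 0.5},
    {"name": "node-2", "ready": True, "free_cpu": 2.0},
    {"name": "node-3", "ready": False, "free_cpu": 4.0},
]
pod = {"name": "nginx", "cpu": 1.0}

filters = [
    lambda p, n: n["ready"],                  # node must be Ready
    lambda p, n: n["free_cpu"] >= p["cpu"],   # node must fit the request
]
scorers = [lambda p, n: n["free_cpu"] - p["cpu"]]  # prefer more CPU left over

chosen = schedule_one(pod, nodes, filters, scorers)
if chosen is not None:
    bind(pod, chosen)
print(pod["nodeName"])  # node-2
```

Keeping filters and scorers as plain lists of functions mirrors the idea that predicates and priorities are independent, pluggable checks.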
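The scheduling cycle consists of predicates (filtering) and priorities (scoring). Predicates filter out the nodes that cannot run the pod; priorities score the remaining nodes and rank them. The predicate algorithms include:
- CheckNodeConditionPred: whether the node is Ready
- MemoryPressure: whether the node is under memory pressure (not enough memory)
- DiskPressure: whether the node is under disk pressure (not enough disk space)
- PIDPressure: whether the node is under PID pressure (not enough process IDs)
- GeneralPred: general checks, such as whether the node matches the pod.spec.nodeName field
- MatchNodeSelector: whether the node carries the labels in pod.spec.nodeSelector
- PodFitsResources: whether the node can satisfy the resources requested in the pod's resources definition
- PodToleratesNodeTaints: whether the pod tolerates the node's taints, via pod.spec.tolerations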
- CheckNodeLabelPresence
- CheckServiceAffinity
- CheckVolumeBinding
- NoVolumeZoneConflict
The node conditions checked during filtering can be viewed with kubectl describe node <node-name>, in the Conditions section of the output.
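To make a few of these predicates concrete, here is a simplified Python sketch. The function names and dict shapes are hypothetical and the logic is reduced to the essentials (for example, a toleration is matched only on key, value, operator and effect); CPU is in millicores and memory in MiB. The example cluster is invented: node-1 carries a master NoSchedule taint and node-3 reports DiskPressure.

```python
# Simplified predicates (filters) modeled on the list above.
# Node/pod dict shapes are hypothetical; CPU in millicores, memory in MiB.

def node_ready(pod, node):
    """CheckNodeConditionPred: only Ready nodes pass."""
    return node["conditions"].get("Ready", False)


def pressure_ok(pod, node):
    """DiskPressure/PIDPressure block every pod; MemoryPressure only blocks BestEffort pods."""
    conds = node["conditions"]
    if conds.get("DiskPressure") or conds.get("PIDPressure"):
        return False
    return not (conds.get("MemoryPressure") and pod["qos"] == "BestEffort")


def match_node_selector(pod, node):
    """MatchNodeSelector: every nodeSelector key/value must appear in the node labels."""
    return all(node["labels"].get(k) == v for k, v in pod.get("nodeSelector", {}).items())


def fits_resources(pod, node):
    """PodFitsResources: allocatable minus already-requested must cover the pod's requests."""
    return all(node["allocatable"][r] - node["requested"].get(r, 0) >= amount
               for r, amount in pod["requests"].items())


def tolerates(tol, taint):
    """Does one toleration match one taint?"""
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False
    if tol.get("operator", "Equal") == "Exists":
        return not tol.get("key") or tol["key"] == taint["key"]
    return tol.get("key") == taint["key"] and tol.get("value") == taint.get("value")


def tolerates_node_taints(pod, node):
    """PodToleratesNodeTaints: every NoSchedule/NoExecute taint must be tolerated."""
    return all(any(tolerates(t, taint) for t in pod.get("tolerations", []))
               for taint in node.get("taints", [])
               if taint["effect"] in ("NoSchedule", "NoExecute"))


PREDICATES = [node_ready, pressure_ok, match_node_selector, fits_resources, tolerates_node_taints]


def feasible_nodes(pod, nodes):
    """Names of the nodes that pass every predicate."""
    return [n["name"] for n in nodes if all(p(pod, n) for p in PREDICATES)]


nodes = [
    {"name": "node-1", "conditions": {"Ready": True}, "labels": {},
     "taints": [{"key": "node-role.kubernetes.io/master", "effect": "NoSchedule"}],
     "allocatable": {"cpu": 2000, "memory": 4096}, "requested": {"cpu": 500}},
    {"name": "node-2", "conditions": {"Ready": True}, "labels": {"app": "web"},
     "allocatable": {"cpu": 2000, "memory": 4096}, "requested": {"cpu": 1000, "memory": 1024}},
    {"name": "node-3", "conditions": {"Ready": True, "DiskPressure": True}, "labels": {},
     "allocatable": {"cpu": 4000, "memory": 8192}, "requested": {}},
]

pod = {"qos": "Burstable", "requests": {"cpu": 500, "memory": 256}}
print(feasible_nodes(pod, nodes))  # ['node-2']
```

Adding a toleration such as `{"key": "node-role.kubernetes.io/master", "operator": "Exists", "effect": "NoSchedule"}` to the pod would let node-1 pass as well.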
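The priority (scoring) algorithms include:
- least_requested: favors the node with the lowest resource consumption
- balanced_resource_allocation: favors the node whose resource consumption is most balanced across resource types
- node_prefer_avoid_pods: node preference; respects nodes that have asked, through an annotation, to avoid certain pods
- taint_toleration: taint check; the more taints a node has that the pod does not tolerate, the lower its score
- selector_spreading: spreads pods matched by the same selector (for example, one Service) across nodes
- interpod_affinity: scores nodes by inter-pod affinity
- most_requested: favors the node with the highest resource consumption
- node_label: scores nodes by their labels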
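Three of these scoring functions can be written out as formulas. The Python sketch below follows the classic definitions, assuming the 0 to 10 scale used by priorities in releases of this era; integer rounding is omitted, and the two nodes, their capacities and the equal weights are invented for illustration.

```python
# Scores on a 0-10 scale (assumed MaxPriority = 10); integer rounding omitted.
# CPU in millicores, memory in MiB. Node data below is hypothetical.
MAX_PRIORITY = 10
RESOURCES = ("cpu", "memory")


def least_requested(cap, req):
    """Average over CPU and memory of the fraction left free, scaled to 0-10."""
    per_res = [(cap[r] - req[r]) * MAX_PRIORITY / cap[r] for r in RESOURCES]
    return sum(per_res) / len(per_res)


def most_requested(cap, req):
    """The mirror image: average fraction already used, scaled to 0-10."""
    per_res = [req[r] * MAX_PRIORITY / cap[r] for r in RESOURCES]
    return sum(per_res) / len(per_res)


def balanced_allocation(cap, req):
    """10 when CPU and memory usage fractions are equal, lower as they diverge."""
    cpu_frac = req["cpu"] / cap["cpu"]
    mem_frac = req["memory"] / cap["memory"]
    if cpu_frac >= 1 or mem_frac >= 1:
        return 0.0
    return (1 - abs(cpu_frac - mem_frac)) * MAX_PRIORITY


pod = {"cpu": 500, "memory": 512}
nodes = {
    # name: (capacity, requests of the pods already on the node)
    "node-a": ({"cpu": 4000, "memory": 8192}, {"cpu": 500, "memory": 1536}),
    "node-b": ({"cpu": 4000, "memory": 8192}, {"cpu": 2500, "memory": 512}),
}

scores = {}
for name, (cap, used) in nodes.items():
    # Scoring counts the incoming pod's requests as if it were already placed.
    req = {r: used[r] + pod[r] for r in RESOURCES}
    s = {
        "least_requested": least_requested(cap, req),
        "balanced": balanced_allocation(cap, req),
    }
    # Equal weights (1 each) are assumed when adding the scores up.
    s["total"] = s["least_requested"] + s["balanced"]
    scores[name] = s

best = max(scores, key=lambda n: scores[n]["total"])
print(best)  # node-a
```

most_requested is the opposite strategy (bin packing onto busy nodes) and is not normally combined with least_requested, so it is left out of the total here.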
1.2 Scheduling with nodeName
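nodeName is a field of PodSpec: pod.spec.nodeName places a pod directly on a specific node. The field is special and is normally empty. If nodeName is set, kube-scheduler skips scheduling the pod entirely, and the kubelet on that node starts it. Scheduling by nodeName bypasses the cluster's intelligent scheduling, so pinning pods this way can leave resources unevenly distributed; setting the Guaranteed QoS class is recommended so that the pod is not evicted when the node runs short of resources. The following example creates a pod that runs on node-3:
- Write a YAML file that pins the pod to node-3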
$ cat nginx-nodeName.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-run-on-nodename
  annotations:
    kubernetes.io/description: "Running the Pod on specific nodeName"
spec:
  containers:
  - name: nginx-run-on-nodename
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
  nodeName: node-3  # pin nginx-run-on-nodename to node-3 via nodeName
- Apply the YAML configuration
$ kubectl apply -f nginx-nodeName.yaml
pod/nginx-run-on-nodename created
- Check the pod: it is running on node-3
$ kubectl get pods nginx-run-on-nodename -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-run-on-nodename 1/1 Running 0 6m52s 10.244.2.15 node-3 <none> <none>
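Two points from this section can be expressed as a short Python sketch: kube-scheduler only picks up pods whose spec.nodeName is empty, and the Guaranteed QoS class recommended above requires every container to set CPU and memory limits with requests equal to limits. The function names and dict shapes are hypothetical; quantities are plain numbers (millicores, MiB), and init containers and other resource types are ignored.

```python
RESOURCES = ("cpu", "memory")


def needs_scheduler(pod):
    """Simplified: kube-scheduler only queues pods whose spec.nodeName is empty."""
    return not pod["spec"].get("nodeName")


def qos_class(containers):
    """Derive the QoS class from the containers' cpu/memory requests and limits."""
    any_set = False
    guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # An omitted request defaults to the limit, if a limit is set.
        requests = {**limits, **c.get("requests", {})}
        if any(r in requests for r in RESOURCES):
            any_set = True
        for r in RESOURCES:
            # Guaranteed needs a limit for every resource, with request == limit.
            if r not in limits or requests[r] != limits[r]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


# A pod pinned like nginx-run-on-nodename, with limits only (requests default to them).
pinned_pod = {"spec": {"nodeName": "node-3",
                       "containers": [{"limits": {"cpu": 500, "memory": 256}}]}}
print(needs_scheduler(pinned_pod), qos_class(pinned_pod["spec"]["containers"]))  # False Guaranteed
```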
1.3 Scheduling with nodeSelector
nodeSelector is a field of PodSpec and the simplest way to run a pod on particular nodes. It works with key/value pairs: the target nodes carry matching labels, and the pod lists those labels under nodeSelector. The example below adds the label app=web to node-2 and then selects that label through nodeSelector when scheduling the pod:
- Add a label to node-2
$ kubectl label node node-2 app=web
node/node-2 labeled
- Verify the labels: node-2 now has an extra label app=web
$ kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node-1 Ready master 15d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=
node-2 Ready <none> 15d v1.15.3 app=web,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-2,kubernetes.io/os=linux
node-3 Ready <none> 15d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-3,kubernetes.io/os=linux
- Use nodeSelector to schedule the pod onto nodes labeled app=web
$ cat nginx-nodeselector.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-run-on-nodeselector
  annotations:
    kubernetes.io/description: "Running the Pod on specific node by nodeSelector"
spec:
  containers:
  - name: nginx-run-on-nodeselector
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
  nodeSelector:  # schedule the pod onto nodes carrying the labels below
    app: web
- Apply the YAML file to create the pod
$ kubectl apply -f nginx-nodeselector.yaml
pod/nginx-run-on-nodeselector created
- Verify the pod: it is running on node-2
$ kubectl get pods nginx-run-on-nodeselector -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-run-on-nodeselector 1/1 Running 0 51s 10.244.1.24 node-2 <none> <none>
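The system also predefines several built-in labels that identify node attributes such as CPU architecture, operating system and hostname, for example on node-3: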
- beta.kubernetes.io/arch=amd64
- beta.kubernetes.io/os=linux
- kubernetes.io/arch=amd64
- kubernetes.io/hostname=node-3
- kubernetes.io/os=linux
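The matching rule behind nodeSelector is simple: a node qualifies only if it has every key/value pair listed in nodeSelector. The Python sketch below (function names hypothetical) applies that rule to the LABELS column from the kubectl output above. It checks labels only; the other predicates, such as taint checks, still apply in a real cluster.

```python
def parse_labels(text):
    """Turn the LABELS column of `kubectl get nodes --show-labels` into a dict."""
    return dict(item.split("=", 1) for item in text.split(","))


def matches_node_selector(node_labels, node_selector):
    """True if every key/value pair in nodeSelector is present on the node."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


# LABELS column copied from the command output above.
nodes = {
    "node-1": parse_labels("beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,"
                           "kubernetes.io/arch=amd64,kubernetes.io/hostname=node-1,"
                           "kubernetes.io/os=linux,node-role.kubernetes.io/master="),
    "node-2": parse_labels("app=web,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,"
                           "kubernetes.io/arch=amd64,kubernetes.io/hostname=node-2,"
                           "kubernetes.io/os=linux"),
    "node-3": parse_labels("beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,"
                           "kubernetes.io/arch=amd64,kubernetes.io/hostname=node-3,"
                           "kubernetes.io/os=linux"),
}


def candidates(node_selector):
    """Names of the nodes whose labels satisfy the given nodeSelector."""
    return [name for name, labels in nodes.items()
            if matches_node_selector(labels, node_selector)]


print(candidates({"app": "web"}))                 # ['node-2']
print(candidates({"kubernetes.io/os": "linux"}))  # ['node-1', 'node-2', 'node-3']
```

Because the built-in labels exist on every node, selecting on kubernetes.io/hostname behaves much like nodeName while still going through the scheduler.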
1.4 Node affinity and anti-affinity
Affinity/anti-affinity is similar to nodeSelector but much more expressive, and is expected to eventually replace nodeSelector. Affinity adds the following enhancements:
- Richer expressions with more matching operators: In, NotIn, Exists, DoesNotExist, Gt and Lt;
- Rules can be hard or soft: a hard rule is a condition that must be met, set with requiredDuringSchedulingIgnoredDuringExecution; a soft rule is a preference the scheduler tries to satisfy, set with preferredDuringSchedulingIgnoredDuringExecution;
- Affinity comes at two levels: node affinity, which schedules based on the labels of nodes, and inter-pod affinity/anti-affinity, which schedules based on the labels of pods already running on nodes. The two therefore apply at different scopes.
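The operators listed above can be sketched as follows. This is a simplified Python model of how a node's labels are checked against nodeSelectorTerms (matchFields is not modeled): terms are ORed, the expressions within a term are ANDed, NotIn and DoesNotExist also match when the key is absent, and Gt/Lt compare a single integer value. The label `cpu-cores=8` and the function names are hypothetical.

```python
def match_expression(labels, expr):
    """Evaluate one nodeSelectorRequirement against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    present = key in labels
    if op == "In":
        return present and labels[key] in values
    if op == "NotIn":
        return not present or labels[key] not in values  # absent key counts as "not in"
    if op == "Exists":
        return present
    if op == "DoesNotExist":
        return not present
    if op in ("Gt", "Lt"):
        # Gt/Lt take exactly one integer value; non-integer label values never match.
        if not present:
            return False
        try:
            label_value, bound = int(labels[key]), int(values[0])
        except ValueError:
            return False
        return label_value > bound if op == "Gt" else label_value < bound
    raise ValueError(f"unsupported operator: {op}")


def node_selector_terms_match(labels, terms):
    """Terms are ORed; expressions inside one term are ANDed.
    A term with no expressions matches nothing."""
    return any(
        bool(term.get("matchExpressions"))
        and all(match_expression(labels, e) for e in term["matchExpressions"])
        for term in terms
    )


# Hypothetical node labels for illustration.
node = {"app": "web", "cpu-cores": "8"}

terms = [
    {"matchExpressions": [
        {"key": "app", "operator": "In", "values": ["web", "api"]},
        {"key": "cpu-cores", "operator": "Gt", "values": ["4"]},
    ]},
]
print(node_selector_terms_match(node, terms))  # True
```

NotIn and DoesNotExist are how node affinity expresses "anti-affinity" toward nodes: there is no separate node anti-affinity field.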
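The following example demonstrates node affinity. requiredDuringSchedulingIgnoredDuringExecution specifies the conditions a node must meet, and preferredDuringSchedulingIgnoredDuringExecution specifies preferred conditions: only nodes that satisfy the required rules are candidates, and among those, nodes that also match the preferred rules score higher.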
- Query the node labels; each node has several labels by default, such as kubernetes.io/hostname
$ kubectl get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
node-1 Ready master 15d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=
node-2 Ready <none> 15d v1.15.3 app=web,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-2,kubernetes.io/os=linux
node-3 Ready <none> 15d v1.15.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=node-3,kubernetes.io/os=linux
- Schedule with node affinity: requiredDuringSchedulingIgnoredDuringExecution requires kubernetes.io/hostname to be node-1, node-2 or node-3, and preferredDuringSchedulingIgnoredDuringExecution prefers nodes carrying the label app=web
$ cat nginx-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx-run-node-affinity
  annotations:
    kubernetes.io/description: "Running the Pod on specific node by node affinity"
spec:
  containers:
  - name: nginx-run-node-affinity
    image: nginx:latest
    ports:
    - name: http-80-port
      protocol: TCP
      containerPort: 80
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - node-1
            - node-2
            - node-3
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: app
            operator: In
            values: ["web"]
- Apply the YAML file to create the pod
$ kubectl apply -f nginx-node-affinity.yaml
pod/nginx-run-node-affinity created
- Confirm which node the pod is on: node-2 is the node that satisfies both the required and the preferred conditions
$ kubectl get pods --show-labels nginx-run-node-affinity -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES LABELS
nginx-run-node-affinity 1/1 Running 0 106s 10.244.1.25 node-2 <none> <none> <none>
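The outcome of this example can be reproduced with a small Python sketch that evaluates only the affinity rules from the YAML above against the relevant labels from the kubectl output. The helper names are hypothetical, only the In operator is implemented because it is the only one the example uses, and the preferred score is the raw sum of matched weights (the real scheduler then normalizes it and combines it with the other priorities).

```python
# Subset of the labels shown by `kubectl get nodes --show-labels` above.
node_labels = {
    "node-1": {"kubernetes.io/hostname": "node-1", "node-role.kubernetes.io/master": ""},
    "node-2": {"kubernetes.io/hostname": "node-2", "app": "web"},
    "node-3": {"kubernetes.io/hostname": "node-3"},
}

# The affinity rules from nginx-node-affinity.yaml, as Python data.
required_terms = [
    {"matchExpressions": [{"key": "kubernetes.io/hostname", "operator": "In",
                           "values": ["node-1", "node-2", "node-3"]}]},
]
preferred_terms = [
    {"weight": 1, "preference": {"matchExpressions": [
        {"key": "app", "operator": "In", "values": ["web"]}]}},
]


def term_matches(labels, term):
    """All expressions in a term must match; only the In operator is handled here."""
    exprs = term.get("matchExpressions", [])
    for e in exprs:
        if e["operator"] != "In":
            raise NotImplementedError(e["operator"])
    return bool(exprs) and all(labels.get(e["key"]) in e["values"] for e in exprs)


# Hard rule: keep only nodes matching at least one required term.
feasible = [n for n, labels in node_labels.items()
            if any(term_matches(labels, t) for t in required_terms)]

# Soft rule: sum the weights of the preferred terms each feasible node matches.
scores = {n: sum(p["weight"] for p in preferred_terms
                 if term_matches(node_labels[n], p["preference"]))
          for n in feasible}

best = max(feasible, key=lambda n: scores[n])
print(scores)  # {'node-1': 0, 'node-2': 1, 'node-3': 0}
print(best)    # node-2
```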
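Wrapping up
This article introduced the scheduling mechanism in Kubernetes. By default, pods are scheduled fully automatically by kube-scheduler, in two phases: the scheduling phase (filtering and scoring) and the binding phase (running the pod on the chosen node). There are four ways to influence scheduling:
- Specifying nodeName
- Using nodeSelector
- Using node affinity and anti-affinity
- Using pod affinity and anti-affinity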
Appendix
Scheduling framework: https://kubernetes.io/docs/concepts/configuration/scheduling-framework/
Assigning pods to nodes: https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
When your talent cannot yet support your ambition, it is time to settle down and study.
Original article: https://blog.51cto.com/happylab/2468087