Resolving Kubernetes pods stuck in the ContainerCreating state

References:

https://blog.csdn.net/learner198461/article/details/78036854

https://liyang.pro/solve-k8s-pod-containercreating/

https://blog.csdn.net/golduty2/article/details/80625485

I adapted the steps slightly and added notes to match my own situation.

While creating the Dashboard, the pod status stayed at ContainerCreating:

# kubectl get pod --namespace=kube-system
NAME                                    READY     STATUS              RESTARTS   AGE
kubernetes-dashboard-2094756401-kzhnx   0/1       ContainerCreating   0          10m
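
When a namespace holds many pods, it can help to filter the listing down to the stuck ones. The following is a minimal sketch, not part of the original write-up: the helper name `stuck_pods` is hypothetical, and it assumes the default column layout shown above (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Print the names of pods whose STATUS column is ContainerCreating.
# Reads `kubectl get pod` output on stdin and skips the header row.
# Assumes STATUS is the third whitespace-separated column.
stuck_pods() {
    awk 'NR > 1 && $3 == "ContainerCreating" { print $1 }'
}

# Example (requires access to the cluster):
#   kubectl get pod --namespace=kube-system | stuck_pods
```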

Use kubectl describe to see the details (or check the log in /var/log/messages):

# kubectl describe pod kubernetes-dashboard-2094756401-kzhnx --namespace=kube-system
Name:        kubernetes-dashboard-2094756401-kzhnx
Namespace:    kube-system
Node:        mycentos7-1/192.168.126.131
Start Time:    Tue, 05 Jun 2018 19:28:25 +0800
Labels:        app=kubernetes-dashboard
        pod-template-hash=2094756401
Status:        Pending
IP:
Controllers:    ReplicaSet/kubernetes-dashboard-2094756401
Containers:
  kubernetes-dashboard:
    Container ID:
    Image:        daocloud.io/megvii/kubernetes-dashboard-amd64:v1.8.0
    Image ID:
    Port:        9090/TCP
    Args:
      --apiserver-host=http://192.168.126.130:8080
    State:            Waiting
      Reason:            ContainerCreating
    Ready:            False
    Restart Count:        0
    Liveness:            http-get http://:9090/ delay=30s timeout=30s period=10s #success=1 #failure=3
    Volume Mounts:        <none>
    Environment Variables:    <none>
Conditions:
  Type        Status
  Initialized     True
  Ready     False
  PodScheduled     True
No volumes.
QoS Class:    BestEffort
Tolerations:    <none>
Events:
  FirstSeen    LastSeen    Count    From            SubObjectPath    Type        Reason        Message
  ---------    --------    -----    ----            -------------    --------    ------        -------
  11m        11m        1    {default-scheduler }            Normal        Scheduled    Successfully assigned kubernetes-dashboard-2094756401-kzhnx to mycentos7-1
  11m        49s        7    {kubelet mycentos7-1}            Warning        FailedSync    Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request.  details: (open /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt: no such file or directory)"

  11m    11s    47    {kubelet mycentos7-1}        Warning    FailedSync    Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"registry.access.redh
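
The describe output is long, and only the image pull events matter here. As a rough sketch (the helper name `pull_errors` is an assumption for illustration), the relevant lines can be isolated by matching the two reasons that appear in the events above:

```shell
# Keep only the lines that mention an image pull failure.
# Reads `kubectl describe pod` output on stdin.
pull_errors() {
    grep -E 'ErrImagePull|ImagePullBackOff'
}

# Example (requires access to the cluster):
#   kubectl describe pod kubernetes-dashboard-2094756401-kzhnx --namespace=kube-system | pull_errors
```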

Checking on the worker node showed that it tries to pull the image registry.access.redhat.com/rhel7/pod-infrastructure:latest. Pulling it manually produced the following error:

# docker pull registry.access.redhat.com/rhel7/pod-infrastructure:latest
Trying to pull repository registry.access.redhat.com/rhel7/pod-infrastructure ...
open /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt: no such file or directory

Following the path in the message, the file turns out to be a symbolic link whose target lies under /etc/rhsm, and /etc/rhsm does not exist:

# cd /etc/docker/certs.d/registry.access.redhat.com/
# ll
total 0
lrwxrwxrwx. 1 root root 27 May 11 14:30 redhat-ca.crt -> /etc/rhsm/ca/redhat-uep.pem
# cd /etc/rhsm
-bash: cd: /etc/rhsm: No such file or directory
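
The same check can be done in one step instead of following the link by hand. This is a small sketch with a hypothetical helper name, `link_state`; it relies only on the standard `-L` (is a symlink) and `-e` (target exists) tests.

```shell
# Report whether a path is a symbolic link whose target is missing.
# Prints "dangling", "ok", or "not a symlink".
link_state() {
    if [ ! -L "$1" ]; then
        echo "not a symlink"
    elif [ -e "$1" ]; then
        # -e follows the link, so it is true only when the target exists
        echo "ok"
    else
        echo "dangling"
    fi
}

# Example:
#   link_state /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt
```

On the node described here, the example call would be expected to report a dangling link until the certificate package is installed.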

Install rhsm (on the node):

yum install *rhsm*
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: mirror.lzu.edu.cn
 * extras: mirror.lzu.edu.cn
 * updates: ftp.sjtu.edu.cn
base                                                     | 3.6 kB  00:00:00
extras                                                   | 3.4 kB  00:00:00
updates                                                  | 3.4 kB  00:00:00
Package python-rhsm-1.19.10-1.el7_4.x86_64 is obsoleted by subscription-manager-rhsm-1.20.11-1.el7.centos.x86_64 which is already installed
Package subscription-manager-rhsm-1.20.11-1.el7.centos.x86_64 already installed and latest version
Package python-rhsm-certificates-1.19.10-1.el7_4.x86_64 is obsoleted by subscription-manager-rhsm-certificates-1.20.11-1.el7.centos.x86_64 which is already installed
Package subscription-manager-rhsm-certificates-1.20.11-1.el7.centos.x86_64 already installed and latest version

However, /etc/rhsm/ca/ still contained no certificate file, and repeatedly uninstalling and reinstalling did not help. It turned out that the yum install *rhsm* everyone recommends is meant to install python-rhsm-1.19.10-1.el7_4.x86_64 and python-rhsm-certificates-1.19.10-1.el7_4.x86_64, but the actual installation prints the following:

Package python-rhsm-1.19.10-1.el7_4.x86_64 is obsoleted by subscription-manager-rhsm-1.20.11-1.el7.centos.x86_64 which is already installed
Package subscription-manager-rhsm-1.20.11-1.el7.centos.x86_64 already installed and latest version
Package python-rhsm-certificates-1.19.10-1.el7_4.x86_64 is obsoleted by subscription-manager-rhsm-certificates-1.20.11-1.el7.centos.x86_64 which is already installed
Package subscription-manager-rhsm-certificates-1.20.11-1.el7.centos.x86_64 already installed and latest version

This is the culprit: the RPM packages we wanted have been obsoleted, and the packages that replace them only create the directory without installing the certificate file redhat-uep.pem. So download the two packages manually:

wget ftp://ftp.icm.edu.pl/vol/rzm6/linux-scientificlinux/7.4/x86_64/os/Packages/python-rhsm-certificates-1.19.9-1.el7.x86_64.rpm
wget ftp://ftp.icm.edu.pl/vol/rzm6/linux-scientificlinux/7.4/x86_64/os/Packages/python-rhsm-1.19.9-1.el7.x86_64.rpm

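Note: this step sometimes fails with an error saying the two RPM files cannot be found. In that case, log in to the FTP server manually and download the two RPMs; the file listing takes a moment to load. (This is probably a network issue and is intermittent.)

Make sure the versions match, then remove the wrongly installed packages: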

yum remove *rhsm*

Then run the install command:

rpm -ivh *.rpm
warning: python-rhsm-1.19.9-1.el7.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID 192a7d7d: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:python-rhsm-certificates-1.19.9-1################################# [ 50%]
   2:python-rhsm-1.19.9-1.el7         ################################# [100%]

I hit another error at this step:

# rpm -ivh *.rpm
warning: python-rhsm-1.19.9-1.el7.x86_64.rpm: Header V4 DSA/SHA1 Signature, key ID 192a7d7d: NOKEY
error: Failed dependencies:
        python-rhsm <= 1.20.3-1 is obsoleted by (installed) subscription-manager-rhsm-1.20.11-1.el7.centos.x86_64
        python-rhsm-certificates <= 1.20.3-1 is obsoleted by (installed) subscription-manager-rhsm-certificates-1.20.11-1.el7.centos.x86_64

If this happens, skip to the article below the divider, use its method to remove the packages that are already installed, and then rerun the install command above.

Next, verify by pulling the image manually:

docker pull registry.access.redhat.com/rhel7/pod-infrastructure:latest
Trying to pull repository registry.access.redhat.com/rhel7/pod-infrastructure ...
latest: Pulling from registry.access.redhat.com/rhel7/pod-infrastructure
26e5ed6899db: Pull complete
66dbe984a319: Pull complete
9138e7863e08: Pull complete
Digest: sha256:92d43c37297da3ab187fc2b9e9ebfb243c1110d446c783ae1b989088495db931
Status: Downloaded newer image for registry.access.redhat.com/rhel7/pod-infrastructure:latest

Problem solved.

--------------------------------------------------------------------------------------------------------------------------------

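In an introductory example from the book Kubernetes: The Definitive Guide (《kubernetes权威指南》), the pod stayed in the ContainerCreating state, and kubectl describe pod mysql reported the following errors: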

Events:
  FirstSeen LastSeen Count From SubObjectPath Type Reason Message
  --------- -------- ----- ---- ------------- -------- ------ -------
  1h 24m 17 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ErrImagePull: "image pull failed for registry.access.redhat.com/rhel7/pod-infrastructure:latest, this may be because there are no credentials on this request. details: (open /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt: no such file or directory)"
  1h 19m 291 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: failed to "StartContainer" for "POD" with ImagePullBackOff: "Back-off pulling image \"registry.access.redhat.com/rhel7/pod-infrastructure:latest\""
  15m 15m 1 {kubelet 127.0.0.1} Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
  15m 15m 1 {kubelet 127.0.0.1} spec.containers{mysql} Normal Pulling pulling image "mysql"
  7m 7m 1 {kubelet 127.0.0.1} Warning MissingClusterDNS kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to DNSDefault policy.
  7m 7m 1 {kubelet 127.0.0.1} spec.containers{mysql} Normal Pulling pulling image "mysql"

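The problem is fairly clear: the file /etc/docker/certs.d/registry.access.redhat.com/redhat-ca.crt is missing. ls -l shows that it is a symbolic link to /etc/rhsm/ca/redhat-uep.pem, and that target does not exist. Search for the related packages with yum search *rhsm*:

  • Install the python-rhsm-certificates package: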
# yum install python-rhsm-certificates -y

This raised another problem:

python-rhsm-certificates <= 1.20.3-1 is obsoleted by (installed) subscription-manager-rhsm-certificates-1.20.11-1.el7.centos.x86_64

The workaround is to remove the subscription-manager-rhsm-certificates package with yum remove subscription-manager-rhsm-certificates -y and then download the python-rhsm-certificates package:

# wget http://mirror.centos.org/centos/7/os/x86_64/Packages/python-rhsm-certificates-1.19.10-1.el7_4.x86_64.rpm

Then install the RPM manually:

# rpm -ivh python-rhsm-certificates-1.19.10-1.el7_4.x86_64.rpm

The file /etc/rhsm/ca/redhat-uep.pem now exists.
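
Before retrying the pull, it is worth confirming that the certificate really landed on disk. The sketch below uses a hypothetical helper, `has_pem_cert`, and assumes that redhat-uep.pem is a PEM-encoded file containing the usual BEGIN CERTIFICATE header; it does not validate the certificate itself.

```shell
# Succeed only if the file exists, is non-empty, and contains a PEM
# certificate header. This is a presence check, not a validity check.
has_pem_cert() {
    [ -s "$1" ] && grep -q -- '-----BEGIN CERTIFICATE-----' "$1"
}

# Example:
#   has_pem_cert /etc/rhsm/ca/redhat-uep.pem && echo "certificate present"
```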

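  • Pull the image with docker pull registry.access.redhat.com/rhel7/pod-infrastructure:latest. This can be very slow. To speed it up, register an account at https://dashboard.daocloud.io, click the accelerator (registry mirror) option, copy the command it provides and run it, then restart docker. If the docker service fails to start after the restart, check it with systemctl status docker: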
# systemctl status docker
● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Mon 2018-05-28 22:13:37 CST; 13s ago
     Docs: http://docs.docker.com
  Process: 79849 ExecStart=/usr/bin/dockerd-current --add-runtime docker-runc=/usr/libexec/docker/docker-runc-current --default-runtime=docker-runc --exec-opt native.cgroupdriver=systemd --userland-proxy-path=/usr/libexec/docker/docker-proxy-current --init-path=/usr/libexec/docker/docker-init-current --seccomp-profile=/etc/docker/seccomp.json $OPTIONS $DOCKER_STORAGE_OPTIONS $DOCKER_NETWORK_OPTIONS $ADD_REGISTRY $BLOCK_REGISTRY $INSECURE_REGISTRY $REGISTRIES (code=exited, status=1/FAILURE)
 Main PID: 79849 (code=exited, status=1/FAILURE)

May 28 22:13:37 kube.example.com systemd[1]: Starting Docker Application Container Engine...
May 28 22:13:37 kube.example.com dockerd-current[79849]: unable to configure the Docker daemon with file /etc/docker/daemon.json: invalid character '}' loo...y string
May 28 22:13:37 kube.example.com systemd[1]: docker.service: main process exited, code=exited, status=1/FAILURE
May 28 22:13:37 kube.example.com systemd[1]: Failed to start Docker Application Container Engine.
May 28 22:13:37 kube.example.com systemd[1]: Unit docker.service entered failed state.
May 28 22:13:37 kube.example.com systemd[1]: docker.service failed.
Hint: Some lines were ellipsized, use -l to show in full

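At this point, deleting /etc/docker/seccomp.json and restarting docker again fixed it. Note that the log above actually reports a syntax error in /etc/docker/daemon.json (an unexpected '}'), so check that file too if the restart still fails.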
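
The "invalid character '}'" message in the status output is what a JSON parser typically reports for a trailing comma, which is easy to introduce when pasting a mirror setting into /etc/docker/daemon.json. The following is only a heuristic sketch, with a hypothetical helper name `has_trailing_comma`; it is not a JSON validator and can misfire on string values that happen to contain ",}" or ",]".

```shell
# Heuristic: succeed if the file appears to contain a trailing comma,
# i.e. a comma directly before a closing brace or bracket once all
# whitespace has been removed.
has_trailing_comma() {
    tr -d ' \t\r\n' < "$1" | grep -q ',[]}]'
}

# Example:
#   has_trailing_comma /etc/docker/daemon.json && echo "check daemon.json for a trailing comma"
```

If the check fires, removing the stray comma and restarting docker is the first thing to try.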

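  • Now delete the previously created rc, svc, and pod and recreate them. After a short while the pod starts successfully.

Likely cause: according to the error message, starting a pod requires the registry.access.redhat.com/rhel7/pod-infrastructure:latest image, which has to be downloaded from the Red Hat registry. The certificate was missing, and installing it resolved the problem.

Original article: https://www.cnblogs.com/xiaohanlin/p/9276201.html

Time: 2024-11-06 22:13:58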
