Introduction to Kubernetes Pods
Pod lifecycle
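- initContainers
  - init c1
  - init c2
  - …
- containers (main)
  - postStart
  - health probes
    livenessProbe
    : liveness check; mechanisms: exec, grpc, httpGet, tcpSocket
    readinessProbe
    : readiness check
    startupProbe
    : startup probe
  - …
  - preStop
When a Pod is stopped, its containers are first sent a termination signal (SIGTERM) and, by default, get a 30s grace period before they are killed.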
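The grace period only helps if the main process reacts to the termination signal. Below is a minimal sketch, assuming the container's main process is a Python script; the names `shutting_down`, `handle_sigterm`, and `serve` are illustrative and not part of the original article.

```python
import os
import signal
import threading

# Set once SIGTERM arrives; the main loop polls it so it can exit cleanly.
shutting_down = threading.Event()


def handle_sigterm(signum, frame):
    """Record the termination request instead of dying immediately."""
    shutting_down.set()


# Must be called from the main thread of the process.
signal.signal(signal.SIGTERM, handle_sigterm)


def serve(poll_seconds=0.1):
    """Work until SIGTERM is received, then return so cleanup can run."""
    while not shutting_down.is_set():
        shutting_down.wait(poll_seconds)  # stand-in for real work
    return "stopped cleanly"
```

If the process has not exited when the grace period runs out, it is killed, so cleanup should comfortably fit inside that window.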
Pod creation process
sequenceDiagram
participant Client
participant APIServer
participant etcd
participant Scheduler
participant Kubelet
participant Container
Client->>+APIServer: Create Pod
APIServer->>+etcd: save info
etcd-->>-APIServer: return
APIServer-->>-Client: return
APIServer->>+Scheduler: ListWatch(new pod)
Scheduler->>-APIServer: bind pod
APIServer->>+etcd: save info
etcd-->>-APIServer: return
APIServer-->>Scheduler: return
APIServer->>+Kubelet: ListWatch(bound pod)
Kubelet->>+Container: Container Run
Container-->>-Kubelet: return
Kubelet-->>-APIServer: Update Pod Status
APIServer->>+etcd: save info
etcd-->>-APIServer: return
Pod termination process
sequenceDiagram
participant Client
participant APIServer
participant etcd
participant Kubelet
participant Container
participant EndpointController
Client->>+APIServer: Delete Pod
APIServer->>+etcd: save info
etcd-->>-APIServer: return
APIServer-->>-Client: shown as terminating
APIServer->>+Kubelet: watch(pod marked as termination)
Kubelet->>+Container: run preStop hooks
Kubelet-->>+Container: send TERM signal
APIServer->>+EndpointController: watch(remove pod from endpoint of all services)
APIServer->>+Kubelet: watch(Expiry of grace period)
Kubelet->>+Container: send SIGKILL
Container-->>-APIServer: Immediate deletion of pod
APIServer->>+etcd: delete object from etcd
Pod manifest
apiVersion: group/version; list the available values with `kubectl api-resources`
kind: resource kind
metadata: resource metadata such as name, namespace, and labels
  name: <string>
  namespace: <string>
  labels: <map[string]string>, written as key=value
    - key: letters, digits, `_`, `-`, `.`
    - value: may be empty; otherwise it must begin and end with a letter or digit
  annotations: <map[string]string>; unlike labels, annotations cannot be used to select objects and only attach `metadata` to them
    - format: <domain>/KEY1=VAL1
  selfLink: <string> reference path of the resource, /api/GROUP/VERSION/namespaces/NAMESPACE/TYPE/NAME (deprecated and no longer populated in recent Kubernetes versions)
spec:
  containers: <[]object>, see `kubectl explain pod.spec.containers`
  - name: <string>
    image: <string>
    imagePullPolicy: <string> Always, Never, IfNotPresent
    ports:
    - name:
      containerPort:
    livenessProbe:
      exec:
      httpGet:
        port:
        path: /
      tcpSocket:
      initialDelaySeconds: 1
      periodSeconds: 3
    readinessProbe:
    lifecycle:
    command: <[]string>
    args: <[]string>
  restartPolicy: Always, OnFailure, Never. Defaults to Always.
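The label value rule in the outline above (empty, or starting and ending with a letter or digit) can be checked with a regular expression. This is a sketch rather than the validator Kubernetes itself ships; the 63-character limit and the names `LABEL_VALUE_RE` and `is_valid_label_value` are assumptions added for illustration.

```python
import re

# Empty, or alphanumeric at both ends with letters, digits, "_", "-" or "."
# allowed in between.
LABEL_VALUE_RE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")
MAX_LABEL_VALUE_LENGTH = 63  # assumption: the documented Kubernetes limit


def is_valid_label_value(value: str) -> bool:
    """Return True if value is acceptable as a label value under the rule above."""
    if len(value) > MAX_LABEL_VALUE_LENGTH:
        return False
    return LABEL_VALUE_RE.fullmatch(value) is not None
```

For example, `canary` and the empty string pass, while `-canary` is rejected because it does not start with a letter or digit.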
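Notes:
- Label format: see the reference
- spec.restartPolicy supports the following values:
  - Always: restart the container whenever it terminates
  - OnFailure: restart the container only when it exits with an error
  - Never: never restart
- Restarts follow an exponential back-off delay: 10s, 20s, 40s, …, capped at 5 minutes. Once a container has run normally for 10 minutes, the back-off is reset
- How command and args in Kubernetes relate to ENTRYPOINT and CMD in the Dockerfile:
  - If neither command nor args is set, the Dockerfile defaults are used
  - If command is set but args is not, only command runs (the Dockerfile ENTRYPOINT and CMD are ignored)
  - If args is set but command is not, the Dockerfile ENTRYPOINT runs with the args from the YAML
  - If both command and args are set, the Dockerfile settings are ignored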
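The back-off schedule and the four command/args cases above can be written down as two small functions. This is a sketch of the rules as stated in the notes, not kubelet code; `restart_delays` and `effective_command` are hypothetical helper names, and lists of strings stand in for ENTRYPOINT, CMD, command, and args.

```python
def restart_delays(restarts, base=10, cap=300):
    """Delay in seconds before each of the first `restarts` restarts.

    Doubles from `base` and is capped at `cap` (5 minutes).
    """
    return [min(base * 2 ** i, cap) for i in range(restarts)]


def effective_command(entrypoint, cmd, command=None, args=None):
    """Combine the image's ENTRYPOINT/CMD with the Pod's command/args."""
    if command is None and args is None:
        return entrypoint + cmd   # image defaults
    if command is not None and args is None:
        return command            # image ENTRYPOINT and CMD are ignored
    if command is None:
        return entrypoint + args  # args replace CMD
    return command + args         # image settings are ignored
```

With an image built with `ENTRYPOINT ["/ep"]` and `CMD ["default"]`, setting only `args: ["x"]` gives `["/ep", "x"]`, while setting only `command: ["/bin/sh"]` gives `["/bin/sh"]`.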
Pod and Container
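Container types
- Standard containers
- Sidecar containers
- Init containers: containers that start and run to completion before the other containers in the same Pod start. See the Pause Pod introduction
- Ephemeral containers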
Pod phase
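Pod phases:
Pending
: the cluster has accepted the Pod, but its containers have not all been created yet. This covers the time spent waiting to be scheduled plus the time spent downloading container images
Running
: the Pod has been bound to a node and all containers have been created. At least one container is running, or is starting or restarting
Succeeded
: all containers terminated successfully and will not be restarted
Failed
: all containers have terminated, and at least one of them terminated in failure
Unknown
: the state of the Pod could not be determined for some other reason
kubectl get pod <name-of-pod>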
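The phase definitions above can be condensed into a small decision function. This is a simplified sketch, not the kubelet's algorithm: it ignores restartPolicy, init containers, and the Unknown phase, and the function name `pod_phase` and the `(state, exit_code)` tuples are assumptions made for illustration.

```python
def pod_phase(scheduled, container_states):
    """Derive a Pod phase from its containers.

    container_states is a list of (state, exit_code) tuples, where state is
    "waiting", "running", or "terminated".
    """
    if not scheduled or not container_states:
        return "Pending"
    if any(state == "waiting" for state, _ in container_states):
        return "Pending"  # some container has not been created yet
    if all(state == "terminated" for state, _ in container_states):
        succeeded = all(code == 0 for _, code in container_states)
        return "Succeeded" if succeeded else "Failed"
    return "Running"  # at least one container is still running
```

A single-container Pod that runs `sleep 10` with `restartPolicy: Never` would move through `Pending`, `Running`, and `Succeeded` under this model.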
Container states
Waiting
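: pulling the image or applying Secret data
Running
: running normally. If a postStart hook is configured, it has already finished executing
Terminated
: the container has stopped, either because it ran to completion or because it failed for some reason. If a preStop hook is configured, it runs before the container enters the Terminated state
Command to inspect: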
kubectl describe pod <name-of-pod>
Container failure reasons
Errors at startup include:
- ImagePullBackOff
- ImageInspectError
- ErrImagePull
- ErrImageNeverPull
- RegistryUnavailable
- InvalidImageName
Errors at runtime include:
- CrashLoopBackOff
- RunContainerError
- KillContainerError
- VerifyNonRootError
- RunInitContainerError
- CreatePodSandboxError
- ConfigPodSandboxError
- KillPodSandboxError
- SetupNetworkError
- TeardownNetworkError
- Some errors are more common than others.
Examples
$ kubectl describe pod web-0
Name: web-0
Namespace: xiexianbin
Priority: 0
Node: k8s-node-1/172.20.0.82
Start Time:
Labels: app=nginx
controller-revision-hash=web-5878b45b96
statefulset.kubernetes.io/pod-name=web-0
Annotations: cni.projectcalico.org/podIP: 10.42.93.76/32
cni.projectcalico.org/podIPs: 10.42.93.76/32
Status: Pending # Pod status
IP: 10.42.93.76
IPs:
IP: 10.42.93.76
Controlled By: StatefulSet/web
Containers:
nginx:
Container ID:
Image: gcmirrors/nginx-slim:0.8
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting # Container status
Reason: ImagePullBackOff # Reason
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-87gtb (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-87gtb:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-87gtb
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned xiexianbin/web-0 to k8s-node-1
Normal Pulling 15m (x4 over 18m) kubelet Pulling image "gcmirrors/nginx-slim:0.8"
Warning Failed 15m (x4 over 17m) kubelet Error: ErrImagePull
Warning Failed 14m (x6 over 17m) kubelet Error: ImagePullBackOff
Warning Failed 13m (x5 over 17m) kubelet Failed to pull image "gcmirrors/nginx-slim:0.8": rpc error: code = Unknown desc = Error response from daemon: pull access denied for gcmirrors/nginx-slim, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Normal BackOff 3m3s (x54 over 17m) kubelet Back-off pulling image "gcmirrors/nginx-slim:0.8"
$ cat test-pod-sleep-10.yaml
apiVersion: v1
kind: Pod
metadata:
  name: busybox-deployment
spec:
  restartPolicy: Never
  containers:
  - name: busybox
    image: busybox:latest
    command:
    - sh
    - "-c"
    - |
      sleep 10
$ kubectl apply -f test-pod-sleep-10.yaml
$ kubectl get pod --watch
NAME READY STATUS RESTARTS AGE
busybox-deployment 0/1 ContainerCreating 0 5s
busybox-deployment 1/1 Running 0 18s
busybox-deployment 0/1 Completed 0 29s
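Container probes
Kinds of container probes:
livenessProbe
: liveness probe; checks whether the container is alive. If the liveness probe fails, the kubelet kills the container and the restart policy decides what happens next. Defaults to Success
kubectl explain pod.spec.containers.livenessProbe
readinessProbe
: readiness probe; checks whether the service in the container can accept requests. Traffic is sent to the Pod only while this probe succeeds. Defaults to Success
startupProbe
: startup probe; checks whether the service in the container has started. When a startup probe is provided, all other probes are disabled until it succeeds. Useful for services that take a long time to start
lifecycle
postStart
: hook executed immediately after the container is created. There is no guarantee that it runs before the container's ENTRYPOINT. No parameters are passed to the handler
preStop
: hook executed before the container is terminated. This hook is called immediately before a container is terminated due to an API request or management event such as liveness probe failure, preemption, resource contention and others. It may fail because the container has already terminated.
Probes can use one of several mechanisms:
- ExecAction
- TCPSocketAction
- HTTPGetAction
- grpc
Each probe returns one of three results: Success, Failure, or Unknown.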
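To make the exec, tcpSocket, and httpGet mechanisms listed above concrete, here is a sketch of how each could decide between success and failure. It mirrors the documented criteria (exit status 0, port accepts a connection, HTTP status of at least 200 and below 400) but is not kubelet code; the function names and the 1-second default timeout are assumptions, and the Unknown outcome is not modeled.

```python
import socket
import subprocess
import urllib.error
import urllib.request


def exec_probe(argv, timeout=1.0):
    """Success when the command exits with status 0."""
    try:
        done = subprocess.run(argv, timeout=timeout, capture_output=True)
    except (OSError, subprocess.TimeoutExpired):
        return "Failure"
    return "Success" if done.returncode == 0 else "Failure"


def tcp_probe(host, port, timeout=1.0):
    """Success when a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "Success"
    except OSError:
        return "Failure"


def http_probe(url, timeout=1.0):
    """Success when the response status is at least 200 and below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return "Success" if 200 <= resp.status < 400 else "Failure"
    except (urllib.error.URLError, OSError):
        return "Failure"
```

In a real Pod these checks are configured declaratively under `livenessProbe`, `readinessProbe`, or `startupProbe`; the kubelet runs them on the schedule given by fields such as `initialDelaySeconds` and `periodSeconds`.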
readinessGates
Readiness gates are additional readiness conditions, configured in spec.readinessGates
kind: Pod
...
spec:
  readinessGates:
    - conditionType: "www.example.com/feature-1"
status:
  conditions:
    - type: Ready                              # a built in PodCondition
      status: "False"
      lastProbeTime: null
      lastTransitionTime: 2018-01-01T00:00:00Z
    - type: "www.example.com/feature-1"        # an extra PodCondition
      status: "False"
      lastProbeTime: null
      lastTransitionTime: 2018-01-01T00:00:00Z
  containerStatuses:
    - containerID: docker://abcd...
      ready: true
...
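The example above shows a Pod whose container is ready while the Pod itself is not, because the extra condition is still "False". A sketch of that evaluation follows; the function name `pod_ready` and the plain dict/list inputs are assumptions for illustration.

```python
def pod_ready(container_ready, conditions, readiness_gates):
    """A Pod is Ready only if every container is ready and every readiness
    gate has a matching condition whose status is "True".

    A gate with no matching condition is treated as not satisfied.
    """
    status_by_type = {c["type"]: c["status"] for c in conditions}
    gates_ok = all(status_by_type.get(gate) == "True" for gate in readiness_gates)
    return all(container_ready) and gates_ok
```

Applied to the manifest above (`ready: true` for the container, `www.example.com/feature-1` at `"False"`), the function returns `False`, which matches the `Ready` condition shown in the status.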
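Pod controllers
Common Pod controller types:
- ReplicationController (rc): replication controller
- ReplicaSet (rs): replica set
- Deployment
- DaemonSet (ds): runs one Pod on every node
- Job: object that manages batch tasks
- CronJob: periodic tasks
- StatefulSet: stateful replica set
Of these, RC, RS, and Deployment keep the number of Pods at the desired count.
Other controller types:
- TPR: Third Party Resources, supported since 1.2, deprecated in 1.7, replaced by CRD
- CRD: Custom Resource Definitions, 1.8+
- Operator: packages operational know-how; common examples include etcd and Prometheus
ReplicaSet controller example
Help: kubectl explain rs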
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: hello-app-rs
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-app-rs
      release: canary
  template:
    metadata:
      name: hello-app-pod
      labels:
        app: hello-app-rs
        release: canary
    spec:
      containers:
      - name: hello-app-1
        image: gcriogooglesamples/hello-app:1.0
        ports:
        - name: http
          containerPort: 8080
Deployment controller example
Help: kubectl explain deploy
The `strategy` field defines the update strategy
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app-dp
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello-app
      release: canary
  template:
    metadata:
      name: hello-app-pod
      labels:
        app: hello-app
        release: canary
    spec:
      containers:
      - name: hello-app-1
        image: gcriogooglesamples/hello-app:1.0
        ports:
        - name: http
          containerPort: 8080
$ kubectl apply -f deployment-hello-app.yaml
deployment.apps/hello-app-dp created
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
hello-app-dp-65778479b8-jzklm 1/1 Running 0 3s
hello-app-dp-65778479b8-wcv9n 1/1 Running 0 3s
$ kubectl get replicasets.apps
NAME DESIRED CURRENT READY AGE
hello-app-dp-65778479b8 2 2 2 11s
$ kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
hello-app-dp 2/2 2 2 14s
Notes:
- A Deployment is built on top of ReplicaSets; 65778479b8 is a hash of the Pod template
- To update the Deployment, change image: gcriogooglesamples/hello-app:1.0 to image: gcriogooglesamples/hello-app:2.0 in the file above
$ kubectl apply -f deployment-hello-app.yaml
deployment.apps/hello-app-dp configured
$ kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
hello-app-dp 2/2 2 2 11m
$ kubectl get replicasets.apps -o wide
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
hello-app-dp-65778479b8 0 0 0 11m hello-app-1 gcriogooglesamples/hello-app:1.0 app=hello-app,pod-template-hash=65778479b8,release=canary
hello-app-dp-665877bb77 2 2 2 90s hello-app-1 gcriogooglesamples/hello-app:2.0 app=hello-app,pod-template-hash=665877bb77,release=canary
$ kubectl get pod
NAME READY STATUS RESTARTS AGE
hello-app-dp-665877bb77-2v74r 1/1 Running 0 55s
hello-app-dp-665877bb77-pdm2p 1/1 Running 0 99s
$ kubectl rollout history deployment hello-app-dp
deployment.apps/hello-app-dp
REVISION CHANGE-CAUSE
1 <none>
2 <none>
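Notes:
- The update strategy is RollingUpdateStrategy: 25% max unavailable, 25% max surge. When a percentage resolves to a fraction of a Pod, max surge rounds up and max unavailable rounds down
- One Deployment owns several ReplicaSets; hello-app-dp-665877bb77 is the active one, running image version 2.0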
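The percentages in the rolling update strategy are turned into Pod counts before a rollout. The sketch below follows the rounding documented for Deployments (max surge rounds up, max unavailable rounds down, and the two may not both be zero); `rolling_update_bounds` is a hypothetical helper, not a Kubernetes API.

```python
import math


def rolling_update_bounds(replicas, max_surge_pct=25, max_unavailable_pct=25):
    """Return (max_surge, max_unavailable) as absolute Pod counts."""
    surge = math.ceil(replicas * max_surge_pct / 100)
    unavailable = math.floor(replicas * max_unavailable_pct / 100)
    if surge == 0 and unavailable == 0:
        # Both being zero would leave the rollout unable to make progress.
        unavailable = 1
    return surge, unavailable
```

For the two-replica Deployment above this gives one extra Pod and zero unavailable Pods: a new Pod is started first, and an old one is removed only once the new one is ready.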
Deployment field reference example
Inject the Pod IP address into the container
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app-dp
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-app
  template:
    metadata:
      name: hello-app-pod
      labels:
        app: hello-app
    spec:
      containers:
      - name: hello-app-1
        image: alpine:3.16.2
        command:
        - sleep
        - "3600"
        ports:
        - name: http
          containerPort: 8080
        env:
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
DaemonSet controller example
Help: kubectl explain ds
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: filebeat-ds
  namespace: default
spec:
  selector:
    matchLabels:
      app: filebeat-ds
      release: canary
  template:
    metadata:
      name: filebeat-ds-pod
      labels:
        app: filebeat-ds
        release: canary
    spec:
      containers:
      - name: filebeat-ds-1
        image: gcriogooglesamples/k8s-filebeat:1.0
        # env:
        # - name: http
        #   value: redis.default.svc.kb.cx
User and group a Pod runs as
Reference
apiVersion: v1
kind: Pod
metadata:
  name: security-context-demo
spec:
  securityContext:
    runAsUser: 1000   # all processes run with user ID 1000
    runAsGroup: 3000  # all processes run with group ID 3000
    fsGroup: 2000     # all volumes are owned by group ID 2000
...
Pod HostAliases
Add host name entries to the Pod's /etc/hosts file
apiVersion: v1
kind: Pod
metadata:
  name: hostaliases-pod
spec:
  restartPolicy: Never
  hostAliases:
  - ip: "127.0.0.1"
    hostnames:
    - "foo.local"
...
Pod Security Admission
PodSecurityPolicy was deprecated in Kubernetes v1.21 and removed in v1.25. It has been replaced by Pod Security Admission
Other
share process namespace
- share-process-namespace.yaml
cat << EOF > share-process-namespace.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  shareProcessNamespace: true
  containers:
  - name: nginx
    image: nginx
  - name: shell
    image: busybox
    securityContext:
      capabilities:
        add:
        - SYS_PTRACE
    stdin: true
    tty: true
EOF
kubectl apply -f share-process-namespace.yaml
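Variables in commands
env:
- name: MESSAGE
  value: "hello world"
command: ["/bin/echo"]
args: ["$(MESSAGE)"]
- Environment variables must be wrapped in parentheses, as in "$(VAR)". This is the required syntax for referencing variables in the command or args fields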
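A sketch of how `$(VAR)` references in command and args could be expanded. It follows the documented behavior in simplified form: defined variables are substituted, undefined references are left unchanged, and `$$(VAR)` yields a literal `$(VAR)`. The function name `expand` and the regular expression are assumptions for illustration, not Kubernetes source code.

```python
import re

# Matches $(NAME) and the escaped form $$(NAME).
_REFERENCE = re.compile(r"\$(\$?)\(([A-Za-z_][A-Za-z0-9_.-]*)\)")


def expand(text, env):
    """Expand $(NAME) references in text using the env mapping."""
    def replace(match):
        escaped, name = match.group(1), match.group(2)
        if escaped:
            return "$(" + name + ")"         # $$(NAME) -> literal $(NAME)
        return env.get(name, match.group(0))  # undefined: leave unchanged
    return _REFERENCE.sub(replace, text)
```

Note that shell-style `$MESSAGE` is not expanded by this rule; it would only be expanded if the command itself runs a shell.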
Injecting variables into containers
The hostname is subject to the 63-character limit enforced by hostnamectl
apiVersion: v1
kind: Pod
metadata:
  name: busybox-env
spec:
  containers:
  - name: busybox-container
    image: busybox
    command:
    - sleep
    - "3600"
    env:
    - name: MY_NODE_NAME       # name of the node
      valueFrom:
        fieldRef:
          fieldPath: spec.nodeName
    - name: MY_POD_NAME        # name of the Pod
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    - name: MY_POD_NAMESPACE   # namespace of the Pod
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    - name: MY_POD_IP          # IP address of the Pod
      valueFrom:
        fieldRef:
          fieldPath: status.podIP
    volumeMounts:
    - name: podname-volume
      mountPath: /var/run/secrets/kubernetes.io/podname
      readOnly: true
  volumes:
  - name: podname-volume
    downwardAPI:
      items:
      - path: "podname"
        fieldRef:
          fieldPath: metadata.name
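Inside the container, the injected values are ordinary environment variables plus one file under the mount path. The sketch below shows how an application might read them, assuming a Python process runs in a container configured like the one above (the busybox image itself does not ship Python); `pod_identity` is a hypothetical helper.

```python
import os
from pathlib import Path

# mountPath of the downwardAPI volume plus the item's path from the manifest.
PODNAME_FILE = Path("/var/run/secrets/kubernetes.io/podname/podname")


def pod_identity(environ=os.environ, podname_file=PODNAME_FILE):
    """Collect the values injected through env vars and the downwardAPI volume."""
    info = {
        "node": environ.get("MY_NODE_NAME"),
        "pod": environ.get("MY_POD_NAME"),
        "namespace": environ.get("MY_POD_NAMESPACE"),
        "ip": environ.get("MY_POD_IP"),
    }
    try:
        info["pod_from_volume"] = podname_file.read_text().strip()
    except OSError:
        # Not running in the Pod, or the volume is not mounted.
        info["pod_from_volume"] = None
    return info
```

Values that are missing come back as `None`, so the same code can run outside a cluster during development.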