[TOC]
1 Docker Preparation
1.1 SELinux
Check the SELinux mode: getenforce
SELinux has three modes:
- enforcing: enforcing mode; SELinux is running and actively restricting domain/type access;
- permissive: permissive mode; SELinux is running but only issues warnings and does not actually restrict domain/type access. This mode is useful for debugging SELinux;
- disabled: SELinux is turned off and not running at all.
Change the SELinux mode (a reboot is required afterwards):
vi /etc/selinux/config
SELINUX=enforcing <== set to enforcing|disabled|permissive
SELINUXTYPE=targeted
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
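To confirm the result without rebooting, SELinux can also be switched off for the current session and then checked; a quick sketch:
# switch to permissive mode immediately (lasts until the next reboot)
$ setenforce 0
# show the current mode and the configured mode
$ getenforce
$ sestatus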
2 Kubernetes
2.1 K8s Architecture
- kubectl: the client command-line tool; it formats the commands it receives and sends them to kube-apiserver, serving as the operational entry point of the whole system.
- kube-apiserver: the control entry point of the system, exposing its interface as a REST API.
- kube-controller-manager: runs the background control loops of the system, such as tracking node status, Pod counts, and the association between Pods and Services.
- kube-scheduler: responsible for node resource management; it receives Pod-creation tasks from kube-apiserver and assigns them to a node.
- etcd: responsible for service discovery and configuration sharing between nodes.
- kubelet: runs on every worker node as the agent; it receives the Pod tasks assigned to its node, manages the containers, periodically collects container status, and reports back to kube-apiserver.
- kube-proxy: runs on every worker node as the Pod network proxy; it periodically fetches Service information from etcd and applies the corresponding forwarding policies.
How does K8s implement cross-host communication?
Through CNI (Container Network Interface) plugins such as flannel and calico.
2.2 Installing K8s
2.2.1 Installing with kubeadm
1 Installation steps
Requirements:
- One or more machines running CentOS 7.x x86_64
- Hardware: at least 2 GB of RAM, 2 CPUs, and 30 GB of disk
- Full network connectivity between all machines in the cluster
- Internet access, needed to pull images
- Swap disabled
Run the following on both the master and the nodes:
1) System configuration
Check the OS release:
$ cat /etc/redhat-release
Disable the firewall:
$ systemctl stop firewalld.service
# prevent the firewall from starting at boot
$ systemctl disable firewalld.service
Disable SELinux:
# a reboot is required for this change to take effect
$ sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
$ setenforce 0
Configure /etc/hosts:
$ vi /etc/hosts
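The entries map each node's IP address to its hostname. A minimal sketch matching the node names used later in this guide (the vm104 address is an assumption for illustration):
192.168.99.103 vm103
192.168.99.104 vm104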
Disable swap
Note:
Q: What is a swap partition, and why must it be disabled?
A: A swap partition uses part of the disk as memory and can temporarily relieve memory pressure; Linux enables swap by default.
Personal take: swap can paper over a temporary memory shortage, but the extra disk IO hurts application performance and stability, so it is not a long-term solution. For quality of service, a provider should disable swap; when memory runs short, customers can temporarily request more memory instead.
Current K8s releases do not support swap. After a long discussion the community does plan to support it, but so far only as an experimental feature.
The K8s community discussion on enabling swap: https://github.com/kubernetes/kubernetes/issues/53533
$ swapoff -a
# comment out the swap entry in fstab
$ sed -ri 's/.*swap.*/#&/' /etc/fstab # permanently disable swap
Set the hostname:
$ hostnamectl set-hostname <hostname>
Enable netfilter on bridges and kernel IP forwarding (routing):
Note: kube-proxy requires net.bridge.bridge-nf-call-iptables to be enabled;
# pass bridged IPv4 traffic to iptables chains
$ cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
$ sysctl --system
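The heading above also mentions kernel IP forwarding; if it is not already enabled on the hosts, it can be switched on with one more sysctl entry. This is an optional addition, not part of the original k8s.conf:
$ cat <<EOF >> /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
EOF
$ sysctl --system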
2) Install Docker
$ yum install -y wget && wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo
$ yum -y install docker-ce-18.06.1.ce-3.el7
Configure a registry mirror (accelerator):
$ mkdir -p /etc/docker
$ cat > /etc/docker/daemon.json << EOF
{
"registry-mirrors": ["https://o7zhcmyv.mirror.aliyuncs.com"]
}
EOF
$ systemctl daemon-reload
$ systemctl restart docker
Enable Docker at boot:
$ systemctl enable docker
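The kubeadm documentation commonly recommends running Docker with the systemd cgroup driver so that kubelet and Docker agree on cgroup management. This is not part of the original steps, but if desired the daemon.json above can be extended like this:
$ cat > /etc/docker/daemon.json << EOF
{
  "registry-mirrors": ["https://o7zhcmyv.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
$ systemctl daemon-reload && systemctl restart docker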
3) Install the Kubernetes components
Configure the yum repositories
# remove the existing repos
$ rm -rf /etc/yum.repos.d/*
# then create the following three Aliyun repo files under /etc/yum.repos.d/
# CentOS-Base.repo
#
# The mirror system uses the connecting IP address of the client and the
# update status of each mirror to pick mirrors that are updated to and
# geographically close to the client. You should use this for CentOS updates
# unless you are manually picking other mirrors.
#
# If the mirrorlist= does not work for you, as a fall back you can try the
# remarked out baseurl= line instead.
[base]
name=CentOS-$releasever - Base - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/$releasever/os/$basearch/
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
#released updates
[updates]
name=CentOS-$releasever - Updates - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
#additional packages that may be useful
[extras]
name=CentOS-$releasever - Extras - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/$releasever/centosplus/$basearch/
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=centosplus
gpgcheck=1
enabled=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
#contrib - packages by Centos Users
[contrib]
name=CentOS-$releasever - Contrib - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/$releasever/contrib/$basearch/
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=contrib
gpgcheck=1
enabled=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
[epel]
name=Extra Packages for Enterprise Linux 7 - $basearch
baseurl=http://mirrors.aliyun.com/epel/7/$basearch
#mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch
failovermethod=priority
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
[epel-debuginfo]
name=Extra Packages for Enterprise Linux 7 - $basearch - Debug
baseurl=http://mirrors.aliyun.com/epel/7/$basearch/debug
#mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-debug-7&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=0
[epel-source]
name=Extra Packages for Enterprise Linux 7 - $basearch - Source
baseurl=http://mirrors.aliyun.com/epel/7/SRPMS
#mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-source-7&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=0
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
Install the Kubernetes packages:
$ yum install -y kubelet-1.18.6 kubeadm-1.18.6 kubectl-1.18.6
$ systemctl restart kubelet && systemctl enable kubelet
$ systemctl is-active kubelet
# at this point kubelet is only activating (it keeps restarting until kubeadm init/join has run)
4) Deploy the Kubernetes master
Run on the master:
$ kubeadm init \
--node-name=vm103 \
--apiserver-advertise-address=192.168.99.103 \
--kubernetes-version=v1.18.6 \
--pod-network-cidr=10.244.0.0/16
# use the Aliyun image repository (for hosts inside China)
$ kubeadm init \
--node-name=main \
--apiserver-advertise-address=192.168.99.120 \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version v1.18.6 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=10.244.0.0/16
After init succeeds, configure the kubectl tool as the output suggests:
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
5) Install the flannel network plugin
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Tip:
If you see: The connection to the server raw.githubusercontent.com was refused - did you specify the right host or port?
look up the real IP of raw.githubusercontent.com at https://www.ipaddress.com, add it to /etc/hosts, and run the command again:
$ sudo vi /etc/hosts
199.232.68.133 raw.githubusercontent.com
6) Join worker nodes to the cluster
# pull the latest flannel image
$ docker pull quay.io/coreos/flannel:v0.14.0
$ kubeadm join 192.168.99.103:6443 --token lgrcmk.2hck482gsnrn6ykm \
--discovery-token-ca-cert-hash sha256:0b4dc91d4c73029f654f1f361b87c05818140f09f8b0742d99fc56da47a0dfbf \
--node-name vm104
# how to regenerate the join command above
$ kubeadm token create --print-join-command
$ kubeadm join 192.168.99.120:6443 --token fx1mw7.lj9dgimtk17160zf \
--discovery-token-ca-cert-hash sha256:a8aded3d1e549fa3f81ce5ec819b3cc2c8242cfa67f9e48f39e37ad5ce5de6b0
# quick test
$ kubectl create deployment nginx --image=nginx
$ kubectl expose deployment nginx --port=80 --type=NodePort
$ kubectl get pod,svc
$ kubectl scale deployment nginx --replicas=3
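To reach the test service from outside the cluster, look up the NodePort that was assigned and curl any node's IP on that port; a sketch (placeholders stand for the values shown by kubectl):
# the PORT(S) column shows something like 80:<nodePort>/TCP
$ kubectl get svc nginx
$ curl http://<nodeIP>:<nodePort>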
Common commands:
# list nodes and their status
$ kubectl get nodes
# remove a node from the cluster
$ kubectl delete nodes <node-name>
# show detailed information about a node
$ kubectl describe node <node-name>
# show cluster info
$ kubectl cluster-info
$ kubectl version
$ kubectl api-versions
# list system pods
$ kubectl get pods -n kube-system
7) Install the Dashboard
Prepare the yaml:
# Copyright 2017 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ------------------- Dashboard Secret ------------------- #
apiVersion: v1
kind: Secret
metadata:
labels:
k8s-app: kubernetes-dashboard
name: kubernetes-dashboard-certs
namespace: kube-system
type: Opaque
---
# ------------------- Dashboard Service Account ------------------- #
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-app: kubernetes-dashboard
name: kubernetes-dashboard
namespace: kube-system
---
# ------------------- Dashboard Role & Role Binding ------------------- #
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: kubernetes-dashboard-minimal
namespace: kube-system
rules:
# Allow Dashboard to create 'kubernetes-dashboard-key-holder' secret.
- apiGroups: [""]
resources: ["secrets"]
verbs: ["create"]
# Allow Dashboard to create 'kubernetes-dashboard-settings' config map.
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["create"]
# Allow Dashboard to get, update and delete Dashboard exclusive secrets.
- apiGroups: [""]
resources: ["secrets"]
resourceNames: ["kubernetes-dashboard-key-holder", "kubernetes-dashboard-certs"]
verbs: ["get", "update", "delete"]
# Allow Dashboard to get and update 'kubernetes-dashboard-settings' config map.
- apiGroups: [""]
resources: ["configmaps"]
resourceNames: ["kubernetes-dashboard-settings"]
verbs: ["get", "update"]
# Allow Dashboard to get metrics from heapster.
- apiGroups: [""]
resources: ["services"]
resourceNames: ["heapster"]
verbs: ["proxy"]
- apiGroups: [""]
resources: ["services/proxy"]
resourceNames: ["heapster", "http:heapster:", "https:heapster:"]
verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: kubernetes-dashboard-minimal
namespace: kube-system
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: Role
name: kubernetes-dashboard-minimal
subjects:
- kind: ServiceAccount
name: kubernetes-dashboard
namespace: kube-system
---
# ------------------- Dashboard Deployment ------------------- #
kind: Deployment
apiVersion: apps/v1
metadata:
labels:
k8s-app: kubernetes-dashboard
name: kubernetes-dashboard
namespace: kube-system
spec:
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
k8s-app: kubernetes-dashboard
template:
metadata:
labels:
k8s-app: kubernetes-dashboard
spec:
containers:
- name: kubernetes-dashboard
image: lizhenliang/kubernetes-dashboard-amd64:v1.10.1
ports:
- containerPort: 8443
protocol: TCP
args:
- --auto-generate-certificates
# Uncomment the following line to manually specify Kubernetes API server Host
# If not specified, Dashboard will attempt to auto discover the API server and connect
# to it. Uncomment only if the default does not work.
# - --apiserver-host=http://my-address:port
volumeMounts:
- name: kubernetes-dashboard-certs
mountPath: /certs
# Create on-disk volume to store exec logs
- mountPath: /tmp
name: tmp-volume
livenessProbe:
httpGet:
scheme: HTTPS
path: /
port: 8443
initialDelaySeconds: 30
timeoutSeconds: 30
volumes:
- name: kubernetes-dashboard-certs
secret:
secretName: kubernetes-dashboard-certs
- name: tmp-volume
emptyDir: {}
serviceAccountName: kubernetes-dashboard
# Comment the following tolerations if Dashboard must not be deployed on master
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
---
# ------------------- Dashboard Service ------------------- #
kind: Service
apiVersion: v1
metadata:
labels:
k8s-app: kubernetes-dashboard
name: kubernetes-dashboard
namespace: kube-system
spec:
type: NodePort
ports:
- port: 443
targetPort: 8443
nodePort: 30001
selector:
k8s-app: kubernetes-dashboard
Access https://nodeIP:30001
If the browser refuses to open the page (self-signed certificate), type thisisunsafe to proceed.
Create a service account and bind it to the default cluster-admin cluster role:
$ kubectl create serviceaccount dashboard-admin -n kube-system
$ kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin
$ kubectl describe secrets -n kube-system $(kubectl -n kube-system get secret | awk '/dashboard-admin/{print $1}')
The login token is printed at the end of the output:
eyJhbGciOiJSUzI1NiIsImtpZCI6Ilp6bVlPanhONjl1UkhJRWpMdlVzNWQ0bEV2d2FIQm40c1RBcHFsWE5SUXMifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4temp0aHgiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiODk5NGJiYmQtZWZiYi00YjE0LWFkMjQtOWRiZTdiYTU3NDQ0Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZC1hZG1pbiJ9.wR45kBOcyq0bDNwWVbngT77K9n_OW-8zzSsArzlNd4WqNpQPFd0ukIkFxhQXV14eQvJefBtqotIT1hDcuIHUghbwZAga-3ISE5cNyg0A3H40Gj69g3wk7BwmkTADLPszrm0M1wYwI-pIj8xl9C5ymcZgyH1xDkEeGQJaWLfFYV2-EbwkNic8iuoZeP5l4q0LeRmi-Zpv1T5MKJrDPDvEXz4X3ZesVxHe4f7E1czgjIbaAPhkbkceiQjmLvB4zotr5JaCx7Fd7u7xSICotevTgUzrMa611cHSFgC3tz2Zwi9N-nQ51Ol9mCg49zkTLIwbr06OQnHvloLiHYffsL4-bg
8) Other configuration
Configure kubectl command completion:
$ yum install -y bash-completion
$ echo "source <(kubectl completion bash)" >> /etc/profile
$ source /etc/profile
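Optionally, a short alias can reuse the same completion; this is the snippet suggested by the kubectl documentation (an addition, not part of the original steps):
$ echo "alias k=kubectl" >> ~/.bashrc
$ echo "complete -o default -F __start_kubectl k" >> ~/.bashrc
$ source ~/.bashrc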
Set up monitoring:
- Heapster
  This project is retired; use metrics-server instead;
- metrics-server
  # the metrics-server version used here is 0.4.2
  $ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Switching between clusters:
Edit the ~/.kube/config file:
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: (omitted)
server: https://192.168.99.103:6443
name: kubernetes1
# newly added cluster
- cluster:
certificate-authority-data: (omitted)
server: https://192.168.99.104:6443
name: kubernetes2
contexts:
- context:
cluster: kubernetes1
user: kubernetes-admin1
name: kubernetes-admin1@kubernetes
# newly added context
- context:
cluster: kubernetes2
user: kubernetes-admin2
name: kubernetes-admin2@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin1
user:
client-certificate-data:
client-key-data:
- name: kubernetes-admin2
user:
client-certificate-data:
client-key-data:
$ kubectl config get-contexts
# switch to another context
$ kubectl config use-context <context-name>
Namespaces:
# list all namespaces
$ kubectl get ns
# create a new namespace
$ kubectl create ns <namespace-name>
# switch namespace (no dedicated command; a third-party tool such as kubens helps)
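Besides kubens, kubectl itself can change the namespace recorded in the current context, which has the same effect; a sketch:
$ kubectl config set-context --current --namespace=<namespace-name>
# confirm
$ kubectl config view --minify | grep namespace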
2 Troubleshooting
1 Node join timeout
Symptom:
When running kubeadm join, the following error appears:
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher
Analysis and fix:
Retrying with the --v=6 flag shows the following:
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
I1021 13:29:27.684395 7236 loader.go:375] Config loaded from file: /etc/kubernetes/kubelet.conf
I1021 13:29:27.702881 7236 cert_rotation.go:137] Starting client certificate rotation controller
I1021 13:29:27.703055 7236 loader.go:375] Config loaded from file: /etc/kubernetes/kubelet.conf
I1021 13:29:27.706231 7236 kubelet.go:194] [kubelet-start] preserving the crisocket information for the node
I1021 13:29:27.706268 7236 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "vm104" as an annotation
I1021 13:29:28.213944 7236 round_trippers.go:443] GET https://192.168.99.103:6443/api/v1/nodes/vm104?timeout=10s 401 Unauthorized in 6 milliseconds
I1021 13:29:28.708158 7236 round_trippers.go:443] GET https://192.168.99.103:6443/api/v1/nodes/vm104?timeout=10s 401 Unauthorized in 1 milliseconds
I1021 13:29:29.208988 7236 round_trippers.go:443] GET https://192.168.99.103:6443/api/v1/nodes/vm104?timeout=10s 401 Unauthorized in 2 milliseconds
The master had been initialized several times and this node had joined several times; the 401 Unauthorized indicates that stale state from a previous join was left behind. Reset the node and then join again:
$ kubeadm reset
2 metrics-server unusable after installation
Symptom:
metrics-server version is 0.4.1;
kubectl top nodes fails with the following error:
$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
kubectl get pods -n kube-system shows the pod status:
NAME READY STATUS RESTARTS AGE
metrics-server-866b7d5b74-wc86x 0/1 CrashLoopBackOff 7 9m25s
kubectl describe pod metrics-server-866b7d5b74-wc86x -n kube-system shows the pod details (only the key events are kept):
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled <unknown> default-scheduler Successfully assigned kube-system/metrics-server-866b7d5b74-wc86x to vm105
Normal Pulling 11m kubelet, vm105 Pulling image "k8s.gcr.io/metrics-server/metrics-server:v0.4.1"
Normal Pulled 11m kubelet, vm105 Successfully pulled image "k8s.gcr.io/metrics-server/metrics-server:v0.4.1"
Normal Created 10m (x3 over 11m) kubelet, vm105 Created container metrics-server
Normal Started 10m (x3 over 11m) kubelet, vm105 Started container metrics-server
Warning Unhealthy 10m (x6 over 10m) kubelet, vm105 Liveness probe failed: HTTP probe failed with statuscode: 500
Normal Killing 10m (x2 over 10m) kubelet, vm105 Container metrics-server failed liveness probe, will be restarted
Normal Pulled 10m (x2 over 10m) kubelet, vm105 Container image "k8s.gcr.io/metrics-server/metrics-server:v0.4.1" already present on machine
Warning Unhealthy 9m59s (x7 over 10m) kubelet, vm105 Readiness probe failed: HTTP probe failed with statuscode: 500
Warning BackOff 69s (x33 over 8m38s) kubelet, vm105 Back-off restarting failed container
Analysis and fix:
# delete the old deployment
$ kubectl delete deployment metrics-server -n kube-system
Then edit the manifest: under spec.template.spec.containers.args, add the - --kubelet-insecure-tls flag and re-apply it; see the sketch below.
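A sketch of the change, assuming the components.yaml downloaded from the metrics-server release page is edited and re-applied (the existing args in the file are left as they are):
$ vi components.yaml
# under spec.template.spec.containers[0].args, append:
#   - --kubelet-insecure-tls
$ kubectl apply -f components.yaml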
3 metrics-server is running, but kubectl top still cannot show resource metrics
Symptom:
$ kubectl top node
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
metrics-server-84fc898bf-6g6tl 1/1 Running 0 24m
Analysis and fix:
Add the --enable-aggregator-routing=true flag under spec.containers.command in kube-apiserver.yaml, then restart kubelet:
# add the flag --enable-aggregator-routing=true
$ vi /etc/kubernetes/manifests/kube-apiserver.yaml
# restart
$ systemctl restart kubelet
4 metrics-server is running, yet resource metrics are still unavailable:
$ kubectl describe apiservice v1beta1.metrics.k8s.io
Status:
Conditions:
Last Transition Time: 2021-02-22T09:16:47Z
Message: failing or missing response from https://10.244.1.58:4443/apis/metrics.k8s.io/v1beta1: Get https://10.244.1.58:4443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Reason: FailedDiscoveryCheck
Status: False
Type: Available
Events: <none>
Fix:
Add hostNetwork: true under spec.template.spec in the metrics-server deployment, then restart the metrics-server pod.
5 failed to get cgroup stats for “/system.slice/kubelet.service”
Symptom:
$ systemctl status kubelet.service
Dec 01 14:19:13 localhost.localdomain kubelet[17633]: E1201 14:19:13.827700 17633 summary_sys_containers.go:47] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"
Analysis and fix:
This is a compatibility issue between the Kubernetes and Docker versions;
# edit the file and append the flags --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
$ vi /var/lib/kubelet/kubeadm-flags.env
$ systemctl restart kubelet
6 kubectl cannot be used on worker nodes
Symptom:
$ kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Analysis and fix:
kubectl needs the kubernetes-admin kubeconfig to talk to the cluster;
# 1. On the master: copy admin.conf to the worker node
$ scp /etc/kubernetes/admin.conf root@192.168.99.104:/etc/kubernetes/admin.conf
# 2. On the worker node:
$ echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
$ source ~/.bash_profile
2.2.2 Installing with yum
2.2.3 Installing from source
2.3 Creating Pods
2.3.1 Create a Pod
apiVersion: v1
kind: Pod
metadata:
name: static-web
labels:
# custom label, can be anything
custom-role: myrole
spec:
containers:
- name: web
image: nginx
ports:
- name: web
containerPort: 80
protocol: TCP
# list the supported apiVersions
$ kubectl api-versions
# two ways to create the pod;
# difference: apply can also update the configuration later;
$ kubectl create -f nginx-pod.yaml
$ kubectl apply -f nginx-pod.yaml
# delete the pod
$ kubectl delete -f nginx-pod.yaml
# exec into the Pod's default container (the first one when there are several)
$ kubectl exec -it static-web -- sh
# exec into the Pod's web container
$ kubectl exec -it static-web -c web -- sh
Generating a yaml skeleton:
$ kubectl run nginx --image=nginx --dry-run=client -o yaml
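The generated skeleton can be redirected to a file, adjusted, and then applied; for example:
$ kubectl run nginx --image=nginx --dry-run=client -o yaml > nginx-pod.yaml
$ kubectl apply -f nginx-pod.yaml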
2.3.2 Running commands in a Pod and the Pod lifecycle
apiVersion: v1
kind: Pod
metadata:
name: myapp
labels:
custom-lable: busybox
spec:
containers:
- name: app
image: busybox
command: ['sh', '-c', 'echo OK && sleep 60']
Other ways to write command:
spec:
containers:
- name: app
image: busybox
command:
- sh
- -c
- echo OK && sleep 60
spec:
containers:
- name: app
image: busybox
args:
- sh
- -c
- echo OK && sleep 60
2.3.3 Image pull policy
- Always: always pull a fresh image;
- Never: never pull, only use a local image;
- IfNotPresent: pull only if the image is not present locally;
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
run: nginx
name: nginx
spec:
containers:
- image: nginx
imagePullPolicy: Always
name: nginx
2.3.4 Environment variables
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
run: demo
name: demo-busybox
spec:
containers:
- name: app
image: busybox
env:
- name: ENV_1
value: "hello!"
command: ['/bin/echo']
args: ["$(ENV_1)"]
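The container only echoes the variable and exits, so the result can be checked from its log (under the default restart policy the Pod will then restart repeatedly); assuming the manifest above was saved as demo-busybox.yaml:
$ kubectl apply -f demo-busybox.yaml
$ kubectl logs demo-busybox
hello!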
2.3.5 Restart policy
- Always: always restart;
- OnFailure: restart only on failure;
- Never: never restart;
apiVersion: v1
kind: Pod
metadata:
creationTimestamp: null
labels:
run: nginx
name: nginx
spec:
containers:
- image: nginx
name: nginx
restartPolicy: Always
2.3.6 Container states
- Pending: the Pod has been created, but its containers have not all been created yet;
- Running: the Pod has been scheduled to a node and its containers are running normally;
- Completed: all containers in the Pod exited normally;
- Failed:
If one container in a Pod is Failed, the Pod status shows Error; if one container is Completed and another is Running, the Pod status is Running.
2.3.7 Init containers
- Multiple init containers can be defined; they start in order, and only after they have all finished do the main containers start;
- If an init container fails, it is restarted over and over and the Pod's main containers are never started;
- Init containers support all the fields and features of application containers except readinessProbe, because they must run to completion before the Pod can become ready;
- Use activeDeadlineSeconds on the Pod and livenessProbe on the containers to keep an init container from failing forever; this puts a deadline on init container activity;
- Every container name within a Pod must be unique;
- Changes to an init container spec are limited to the image field; changing an init container's image does not restart the Pod.
apiVersion: v1
kind: Pod
metadata:
labels:
run: nginx
name: nginx
spec:
volumes:
- name: workdir
emptyDir: {}
containers:
- image: nginx
name: nginx
volumeMounts:
- name: workdir
mountPath: '/app/config'
initContainers:
- image: busybox
name: busybox
command: ['sh', '-c', 'touch /opt/config.yml']
volumeMounts:
- name: workdir
mountPath: '/opt'
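Once the Pod is Running, the file created by the init container should be visible through the shared volume in the main container; a quick check, assuming the manifest above was applied as-is:
$ kubectl get pod nginx
$ kubectl exec nginx -- ls /app/config
config.yml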
2.3.8 Static Pods
A static Pod is not created through the master: you place a yaml file in a directory on the node (e.g. /etc/kubelet.d/), kubelet creates the Pod from that file, and the Pod is not managed by the master.
Creation steps:
- On the node, run systemctl status kubelet -l and find the drop-in configuration directory /usr/lib/systemd/system/kubelet.service.d;
- Edit the drop-in file and add the --pod-manifest-path startup argument:
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --pod-manifest-path=/etc/kubernetes/static-yaml"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
Tip: /etc/kubernetes/static-yaml here is a custom path and can be changed;
- Create the Pod yaml under the custom directory /etc/kubernetes/static-yaml (see the sketch after these steps);
- Restart kubelet:
$ systemctl daemon-reload
$ systemctl restart kubelet
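A minimal sketch of a static Pod manifest dropped into the custom directory (the file name and Pod name here are just examples):
$ mkdir -p /etc/kubernetes/static-yaml
$ cat > /etc/kubernetes/static-yaml/static-nginx.yaml << EOF
apiVersion: v1
kind: Pod
metadata:
  name: static-nginx
spec:
  containers:
  - name: nginx
    image: nginx
EOF
kubelet creates the Pod shortly after the file appears; on the master it shows up as a mirror Pod with the node name appended to its name.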
Deletion steps:
- Delete the yaml file created under the custom directory /etc/kubernetes/static-yaml;
- (Optional) restart kubelet;
2.4 Pod Scheduling
2.4.1 The three elements of scheduling
Pods waiting to be scheduled
Available nodes
The scheduling algorithm
- node filtering
- node scoring
- LeastRequestPriority
- BalanceResourceAllocation
- CalculateSpreadPriority
Scheduling policy
2.4.2 Manually choosing where a Pod runs
- Label the nodes:
# show node labels
$ kubectl get nodes --show-labels
# filter with a selector
$ kubectl get nodes --selector=kubernetes.io/hostname=vm104
# add a label to a node
$ kubectl label node vm104 disktype=ssd
# remove the label from the node
$ kubectl label node vm104 disktype-
- Pin the Pod to the labeled node with nodeSelector:
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx
  name: nginx
spec:
  nodeSelector:
    disktype: ssd
  containers:
  - image: nginx
    name: nginx
2.4.3 Node maintenance
- Cordon
$ kubectl cordon vm104
$ kubectl uncordon vm104
After a node is cordoned, Pods already running on it are not moved; delete them if you want them rescheduled elsewhere;
If a Pod is explicitly assigned to a cordoned node, it stays in the Pending state.
- Drain
Drain is cordon plus eviction of the Pods already on the node.
$ kubectl drain vm104 --ignore-daemonsets
- Node taints and Pod tolerations
# add a taint
$ kubectl taint nodes vm104 dedicated=special-user:NoSchedule
# remove the taint
$ kubectl taint nodes vm104 dedicated-
To run on the tainted node, give the Pod a matching toleration:
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx
  name: nginx
spec:
  tolerations:
  - key: 'dedicated'
    value: 'special-user'
    effect: 'NoSchedule'
    operator: 'Equal'
  containers:
  - image: nginx
    name: nginx
2.5 Storage Management
2.5.1 Storage types
- Local storage
  - emptyDir
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: busybox
  name: busybox
spec:
  volumes:
  - name: workdir
    emptyDir: {}
  containers:
  - image: busybox
    name: busybox1
    command: ['sh', '-c', 'sleep 5000']
    volumeMounts:
    - name: workdir
      mountPath: '/app'
  - image: busybox
    name: busybox2
    command: ['sh', '-c', 'sleep 5000']
    volumeMounts:
    - name: workdir
      mountPath: '/opt'
  - hostPath
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: busybox
    name: test-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    hostPath:
      # directory location on host
      path: /data
      # this field is optional
      type: Directory
- Network storage
  - NFS
    NAS-style storage (becomes a bottleneck when there are many clients)
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: busybox
    name: test-container
    volumeMounts:
    - mountPath: '/test-pd'
      name: nfs
  volumes:
  - name: nfs
    nfs:
      path: '/data'
      server: 1.2.3.4
  - iscsi
    An IP-SAN type of storage.
apiVersion: v1
kind: Pod
metadata:
  name: iscsipd
spec:
  containers:
  - name: iscsipd-rw
    image: kubernetes/pause
    volumeMounts:
    - mountPath: "/mnt/iscsipd"
      name: iscsipd-rw
  volumes:
  - name: iscsipd-rw
    iscsi:
      targetPortal: 10.0.2.15:3260
      portals: ['10.0.2.16:3260', '10.0.2.17:3260']
      iqn: iqn.2001-04.com.example:storage.kube.sys1.xyz
      lun: 0
      fsType: ext4
      readOnly: true
  - ceph
  - gluster
  - …
- Persistent storage (PV / PVC)
Create an NFS server with docker-compose:
version: '3'
services:
  nfs_01:
    image: itsthenetwork/nfs-server-alpine
    ports:
    - 2049:2049
    cap_add:
    - SYS_ADMIN
    environment:
      PERMITTED: '*'
      SHARED_DIRECTORY: '/opt/share/pv01'
    volumes:
    - /opt/docker-nfs:/opt
Once created, a PV is visible cluster-wide (PVs are not namespaced).
This is static provisioning: the PV is created first and then consumed by users through claims:
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv01
spec:
capacity:
storage: 10Mi
volumeMode: Filesystem
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Recycle
storageClassName: slow
nfs:
path: /opt/share/pv01
server: 192.168.99.106
Create the PVC (PVCs are namespaced):
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc01
spec:
accessModes:
- ReadWriteOnce
volumeMode: Filesystem
resources:
requests:
storage: 10Mi
storageClassName: slow
Use the claim in a Pod:
apiVersion: v1
kind: Pod
metadata:
name: mypod
spec:
containers:
- name: myfrontend
image: nginx
volumeMounts:
- mountPath: "/var/www/html"
name: mypd
volumes:
- name: mypd
persistentVolumeClaim:
claimName: pvc01
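Whether the claim was satisfied can be checked before the Pod uses it; both objects should report a Bound status:
$ kubectl get pv pv01
$ kubectl get pvc pvc01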
2.6 Secrets Management
2.6.1 Storing secrets with Secret
1 Creation
# method 1
$ kubectl create secret generic my-secret --from-literal=key1=supersecret --from-literal=key2=topsecret
$ kubectl get secret my-secret
$ echo '<base64-encoded value>' | base64 -d
# method 2
$ echo -n jiangjiang > passphrase
$ kubectl create secret generic my-secret --from-file=ssh-privatekey=path/to/id_rsa --from-literal=passphrase=./passphrase
# method 3
$ kubectl create secret generic my-secret --from-env-file=path/to/bar.env
Create from yaml:
apiVersion: v1
kind: Secret
metadata:
name: mysecret
type: Opaque
data:
USER_NAME: YWRtaW4=
PASSWORD: MWYyZDFlMmU2N2Rm
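The stored values are only base64-encoded, not encrypted, so they can be read back directly; for example:
$ kubectl get secret mysecret -o jsonpath='{.data.USER_NAME}' | base64 -d
admin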
2 Consumption
- Volume reference
When referenced as a volume, the secret supports dynamic updates, and the values are read as plain-text files.
apiVersion: v1
kind: Pod
metadata:
  name: secret-test-pod
  labels:
    name: secret-test
spec:
  volumes:
  - name: secret-volume
    secret:
      secretName: ssh-key-secret
      #items:
      #- key: ssh-publickey
      #  path: .
  containers:
  - name: ssh-test-container
    image: mySshImage
    volumeMounts:
    - name: secret-volume
      readOnly: true
      mountPath: "/etc/secret-volume"
When commands run in the container, the key fragments can be found at:
/etc/secret-volume/ssh-publickey
/etc/secret-volume/ssh-privatekey
- Environment variable reference
apiVersion: v1
kind: Pod
metadata:
  name: secret-env-pod
spec:
  containers:
  - name: mycontainer
    image: redis
    env:
    - name: SECRET_USERNAME
      valueFrom:
        secretKeyRef:
          name: mysecret
          key: username
    - name: SECRET_PASSWORD
      valueFrom:
        secretKeyRef:
          name: mysecret
          key: password
2.6.2 Storing configuration with ConfigMap
1 Creation
The stored configuration is plain text.
# the three creation methods mirror those for Secret
$ kubectl create configmap my-config --from-literal=key1=config1 --from-literal=key2=config2
Create from yaml:
apiVersion: v1
kind: ConfigMap
metadata:
name: game-demo
data:
# property-like keys; each key maps to a simple value
player_initial_lives: "3"
ui_properties_file_name: "user-interface.properties"
# file-like keys
game.properties: |
enemy.types=aliens,monsters
player.maximum-lives=5
user-interface.properties: |
color.good=purple
color.bad=yellow
allow.textmode=true
2 Consumption
apiVersion: v1
kind: Pod
metadata:
name: configmap-demo-pod
spec:
containers:
- name: demo
image: alpine
command: ["sleep", "3600"]
env:
# define environment variables
- name: PLAYER_INITIAL_LIVES # 请注意这里和 ConfigMap 中的键名是不一样的
valueFrom:
configMapKeyRef:
name: game-demo # 这个值来自 ConfigMap
key: player_initial_lives # 需要取值的键
- name: UI_PROPERTIES_FILE_NAME
valueFrom:
configMapKeyRef:
name: game-demo
key: ui_properties_file_name
volumeMounts:
- name: config
mountPath: "/config"
readOnly: true
volumes:
# you can define volumes at the Pod level and mount them into the Pod's containers
- name: config
configMap:
# the name of the ConfigMap you want to mount
name: game-demo
# the set of keys from the ConfigMap to expose as files
items:
- key: "game.properties"
path: "game.properties"
- key: "user-interface.properties"
path: "user-interface.properties"
2.7 Deployment
2.7.1 Purpose
ReplicationController
ReplicaSets
2.7.2 Creating a Deployment
1 Create from the command line
$ kubectl create deployment my-nginx --image=nginx
2 Create from a yaml file
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: my-nginx
name: my-nginx
spec:
replicas: 2
selector:
matchLabels:
app: my-nginx
template:
metadata:
labels:
app: my-nginx
spec:
containers:
- image: nginx
name: nginx-name
2.7.3 Changing the replica count
# method 1
$ kubectl scale deployment nginx --replicas=20
# method 2
$ kubectl edit deployment nginx
# method 3: update the deployment yaml and apply it
2.7.4 Rolling updates
# in the my-nginx deployment, set the image of the container named nginx-container-name to nginx:1.9
$ kubectl set image deployment/my-nginx nginx-container-name=nginx:1.9
# check the deployment's image version
$ kubectl get deployment my-nginx -o wide
# roll back
$ kubectl rollout undo deployment/my-nginx --record
# view rollout history
$ kubectl rollout history deployment/my-nginx
# roll back to a specific revision
$ kubectl rollout undo deployment/my-nginx --to-revision=2
maxUnavailable: defaults to 25%; can also be given as an absolute number of Pods;
maxSurge: defaults to 25%; can also be given as an absolute number of Pods;
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: nginx
name: nginx
spec:
replicas: 1
selector:
matchLabels:
app: nginx
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
labels:
app: nginx
spec:
containers:
- image: nginx
name: nginx
2.7.5 Horizontal Pod Autoscaler (HPA)
By watching the CPU load of the Pods, HPA scales the number of Pods in a deployment up or down so that overloaded Pods are relieved and the load is balanced.
$ kubectl autoscale deployment my-nginx --min=2 --max=10
$ kubectl autoscale deployment my-nginx --max=5 --cpu-percent=80
# list HPAs
$ kubectl get hpa
# delete the HPA
$ kubectl delete hpa my-nginx
Fixing the current CPU usage showing as unknown:
# add resource requests
$ kubectl edit deployment my-nginx
containers:
- image: nginx:1.7.9
imagePullPolicy: Always
name: nginx
resources:
requests:
cpu: 100m
# edit kube-controller-manager.yaml
$ vi /etc/kubernetes/manifests/kube-controller-manager.yaml
- command:
- kube-controller-manager
- --horizontal-pod-autoscaler-use-rest-clients=true
- --horizontal-pod-autoscaler-sync-period=10s
Testing the HPA:
# exec into one of the pods and generate CPU load
$ kubectl exec -it my-nginx-665bc6f67f-5jd8m -- sh
> cat /dev/zero > /dev/null &
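While the load generator runs, the autoscaler's view can be watched; once average CPU crosses the target, the replica count should grow (assuming metrics-server is working):
$ kubectl get hpa my-nginx -w
$ kubectl top pods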
2.8 Other Controllers
2.9 Health Checks
A probe mechanism: when a probe detects a problem, the container or Pod is acted upon;
liveness: liveness probe; if it fails, the container is restarted;
readiness: readiness probe; if it fails, the Pod is removed from the Service endpoints;
startup: startup probe; if it fails, the container is restarted (new in 1.16, beta in 1.18)
Three probe methods: command / httpGet / TCP
Note: each probe runs only one handler; if several handlers are defined, one is chosen in the priority order Exec, HTTPGet, TCPSocket.
1 Exec Handler
apiVersion: v1
kind: Pod
metadata:
labels:
test: liveness
name: liveness-exec
spec:
containers:
- name: liveness
image: busybox
args:
- /bin/sh
- -c
- touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 30
livenessProbe:
exec:
command:
- cat
- /tmp/healthy
initialDelaySeconds: 5 #容器启动的5s内不监测
periodSeconds: 5 #每5s钟检测一次
2 HTTPGet Handler
apiVersion: v1
kind: Pod
metadata:
labels:
test: liveness
name: liveness-exec
spec:
containers:
- name: liveness
image: nginx
livenessProbe:
failureThreshold: 3
httpGet:
path: /index.html
port: 80
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
3 TCPSocket Handler
apiVersion: v1
kind: Pod
metadata:
labels:
test: liveness
name: liveness-exec
spec:
containers:
- name: liveness
image: nginx
livenessProbe:
failureThreshold: 3
tcpSocket:
port: 80
initialDelaySeconds: 10
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 10
initialDelaySeconds: how many seconds to wait after the container starts before running the first probe.
periodSeconds: how often to probe; default 10 seconds, minimum 1 second.
timeoutSeconds: probe timeout; default 1 second, minimum 1 second.
successThreshold: after a failure, the minimum number of consecutive successes for the probe to count as successful again; default 1; must be 1 for liveness; minimum 1.
failureThreshold: after a success, the minimum number of consecutive failures for the probe to count as failed; default 3; minimum 1.
2.10 Jobs
1 Job
Suited to one-off computation tasks;
apiVersion: batch/v1
kind: Job
metadata:
name: job1
spec:
backoffLimit: 6
completions: 1
parallelism: 1
template:
metadata:
name: pi
spec:
containers:
- name: hello
image: busybox
command: ["echo","hello world!"]
restartPolicy: Never
A Job's restart policy can only be:
Never: as long as the task has not completed, a new pod is created and run until the job finishes, so multiple pods may be produced;
OnFailure: as long as the pod has not completed, the pod is restarted until the job finishes;
parallelism: 1 — how many pods run at the same time
completions: 1 — how many pods must finish successfully
backoffLimit: 6 — the limit on failed pods (retries)
2 CronJob
Enabling CronJob (batch/v2alpha1) requires adding - --runtime-config=batch/v2alpha1=true to kube-apiserver:
# add - --runtime-config=batch/v2alpha1=true
$ vim /etc/kubernetes/manifests/kube-apiserver.yaml
# restart the service
$ systemctl restart kubelet.service
# verify
$ kubectl api-versions
apiVersion: batch/v2alpha1
kind: CronJob
metadata:
name: job2
spec:
schedule: "*/1 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: hello
image: busybox
command: ["echo","hello world!"]
restartPolicy: OnFailure
2.11 Service
1 Creating a Service
apiVersion: v1
kind: Service
metadata:
labels:
name: test
name: svc1
spec:
ports:
- port: 80
targetPort: 80
selector:
app: my-nginx
type: NodePort
2 Service discovery
Discovery through environment variables
env:
- name: WORDPRESS_DB_USER
value: root
- name: WORDPRESS_DB_PASSWORD
value: redhat
- name: WORDPRESS_DB_NAME
value: blog
- name: WORDPRESS_DB_HOST
value: $(MYSQL_SERVICE_HOST)
- Only variables from Services in the same namespace are available;
- The variables are injected in creation order, so the referenced Service must be created before the Pod that uses it.
Discovery through DNS
kube-system runs a DNS service that automatically records the clusterIP of every Service in every namespace, so within the same namespace one Service can reach another simply by its service name.
Every Service that is created is automatically registered with the DNS in kube-system.
For a Service in a different namespace, use service-name.namespace-name.
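A quick way to confirm DNS-based discovery is to resolve a Service name from a temporary Pod; a sketch using the svc1 Service created above (busybox:1.28 is chosen because its nslookup behaves well with cluster DNS):
$ kubectl run -it --rm dns-test --image=busybox:1.28 --restart=Never -- nslookup svc1
# a Service in another namespace resolves as <service-name>.<namespace>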
3 Publishing a Service
Publishing means making the Service reachable from hosts outside the cluster.
- NodePort
- LoadBalancer
- ExternalName
- ClusterIP
- ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
name: myingress
spec:
rules:
- host: www.rhce.cc
http:
paths:
- path: /
backend:
serviceName: nginx2
servicePort: 80
- path: /rhce
backend:
serviceName: nginx2
servicePort: 80