CKA

Posted by jiangydev on June 9, 2019

[TOC]

1 Docker Prerequisites

1.1 SELinux

Check the current SELinux mode: getenforce

The three SELinux modes:

  • enforcing: enforcing mode; SELinux is running and actively restricting domain/type access;

  • permissive: permissive mode; SELinux is running but only logs warnings and does not actually restrict domain/type access. Useful for debugging SELinux;

  • disabled: SELinux is turned off and not running

Change the SELinux mode (a reboot is required afterwards)

vi /etc/selinux/config 
SELINUX=enforcing <== set to enforcing|disabled|permissive 
SELINUXTYPE=targeted
sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

2 Kubernetes

2.1 K8s Architecture

  • kubectl: the client command-line tool; it formats the commands it receives and sends them to kube-apiserver, acting as the operational entry point of the whole system.
  • kube-apiserver: the control entry point of the whole system, exposing its interface as a REST API.
  • kube-controller-manager: runs the background tasks of the system, including tracking node status, Pod counts, the association between Pods and Services, and so on.
  • kube-scheduler: responsible for node resource management; it receives Pod-creation tasks from kube-apiserver and assigns them to a node.
  • etcd: responsible for service discovery and configuration sharing between nodes.
  • kubelet: runs on every worker node as the agent; it receives the Pod tasks assigned to the node, manages the containers, periodically collects container status, and reports back to kube-apiserver.
  • kube-proxy: runs on every worker node and provides the network proxy for Pods; it periodically fetches service information from etcd and applies the corresponding rules.

How does K8s implement cross-host communication?

Through CNI (Container Network Interface) plugins, such as flannel and calico.
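
A quick way to check which CNI plugin is in use (a sketch; the path assumes a kubeadm-style install where the plugin drops its configuration on every node):

# CNI configuration installed by the network plugin
$ ls /etc/cni/net.d/
# The plugin's pods (e.g. kube-flannel-ds-*) run in the kube-system namespace
$ kubectl get pods -n kube-system -o wide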

2.2 Installing K8s

2.2.1 Installation with kubeadm

1 Installation steps

Requirements:

  • One or more machines running CentOS 7.x x86_64
  • Hardware: 2 GB of RAM or more, 2 CPUs or more, 30 GB of disk or more
  • Full network connectivity between all machines in the cluster
  • Internet access, needed to pull images
  • Swap disabled

Run on both the master and the nodes:

1) System configuration

Check the OS version:

$ cat /etc/redhat-release

Disable the firewall

$ systemctl stop firewalld.service
# Prevent the firewall from starting on boot
$ systemctl disable firewalld.service

Disable SELinux

# A reboot is required for the change to take effect
$ sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config

$ setenforce 0

Configure /etc/hosts

$ vi /etc/hosts

Disable swap

Note:

Q: What is a swap partition? Why must it be disabled?

A: A swap partition uses part of the disk as memory; it can temporarily relieve memory pressure, and Linux enables swap by default.

Personal view: although swap can paper over a temporary memory shortage, the disk I/O it causes hurts application performance and stability, so it is not a long-term solution. For quality of service, a provider should disable swap; when memory runs short, the customer can temporarily request more memory instead.

Current K8s versions do not support swap. After a long discussion the K8s community does intend to support it, but only as an experimental feature.

K8s community discussion on enabling swap: https://github.com/kubernetes/kubernetes/issues/53533

$ swapoff -a
# Comment out the swap entry so it is not mounted on boot
$ sed -ri 's/.*swap.*/#&/' /etc/fstab  # permanent
$ hostnamectl set-hostname <hostname>

Enable netfilter on bridges and kernel IP forwarding (routing):

Note: kube-proxy requires net.bridge.bridge-nf-call-iptables to be enabled

# Pass bridged IPv4 traffic to iptables chains
$ cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
$ sysctl --system
2) Install Docker
$ yum install -y wget && wget https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo -O /etc/yum.repos.d/docker-ce.repo

$ yum -y install docker-ce-18.06.1.ce-3.el7

Configure a registry mirror (accelerator)
$ mkdir -p /etc/docker
$ cat > /etc/docker/daemon.json << EOF
{
  "registry-mirrors": ["https://o7zhcmyv.mirror.aliyuncs.com"]
}
EOF
$ systemctl daemon-reload
$ systemctl restart docker

Enable Docker on boot

$ systemctl enable docker
3) Install the Kubernetes components

Configure the yum repositories

# Remove the existing repo files
$ rm -rf /etc/yum.repos.d/*
# Add the following three Aliyun repo files
# CentOS-Base.repo
#
# The mirror system uses the connecting IP address of the client and the
# update status of each mirror to pick mirrors that are updated to and
# geographically close to the client.  You should use this for CentOS updates
# unless you are manually picking other mirrors.
#
# If the mirrorlist= does not work for you, as a fall back you can try the 
# remarked out baseurl= line instead.
 
[base]
name=CentOS-$releasever - Base - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/$releasever/os/$basearch/
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
 
#released updates 
[updates]
name=CentOS-$releasever - Updates - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/$releasever/updates/$basearch/
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
 
#additional packages that may be useful
[extras]
name=CentOS-$releasever - Extras - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/$releasever/extras/$basearch/
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras
gpgcheck=1
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
 
#additional packages that extend functionality of existing packages
[centosplus]
name=CentOS-$releasever - Plus - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/$releasever/centosplus/$basearch/
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=centosplus
gpgcheck=1
enabled=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
 
#contrib - packages by Centos Users
[contrib]
name=CentOS-$releasever - Contrib - mirrors.aliyun.com
failovermethod=priority
baseurl=http://mirrors.aliyun.com/centos/$releasever/contrib/$basearch/
#mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=contrib
gpgcheck=1
enabled=0
gpgkey=http://mirrors.aliyun.com/centos/RPM-GPG-KEY-CentOS-7
[epel]
name=Extra Packages for Enterprise Linux 7 - $basearch
baseurl=http://mirrors.aliyun.com/epel/7/$basearch
#mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch
failovermethod=priority
enabled=1
gpgcheck=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
 
[epel-debuginfo]
name=Extra Packages for Enterprise Linux 7 - $basearch - Debug
baseurl=http://mirrors.aliyun.com/epel/7/$basearch/debug
#mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-debug-7&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=0
 
[epel-source]
name=Extra Packages for Enterprise Linux 7 - $basearch - Source
baseurl=http://mirrors.aliyun.com/epel/7/SRPMS
#mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-source-7&arch=$basearch
failovermethod=priority
enabled=0
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
gpgcheck=0
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

Install the Kubernetes packages
$ yum install -y kubelet-1.18.6 kubeadm-1.18.6 kubectl-1.18.6
$ systemctl restart kubelet && systemctl enable kubelet

$ systemctl is-active kubelet
# At this point the kubelet status is: activating
4) Deploy the Kubernetes master

Run on the master:
$ kubeadm init \
  --node-name=vm103 \
  --apiserver-advertise-address=192.168.99.103 \
  --kubernetes-version=v1.18.6 \
  --pod-network-cidr=10.244.0.0/16
# Use the Aliyun image repository mirror (for use inside China)
$ kubeadm init \
  --node-name=main \
  --apiserver-advertise-address=192.168.99.120 \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version v1.18.6 \
  --service-cidr=10.96.0.0/16 \
  --pod-network-cidr=10.244.0.0/16

After initialization succeeds, configure the kubectl tool as instructed in the output:
$ mkdir -p $HOME/.kube
$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
5) Install the flannel network plugin
$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Tip:

If you see: The connection to the server raw.githubusercontent.com was refused - did you specify the right host or port?

you can look up the real IP of raw.githubusercontent.com at https://www.ipaddress.com, add it to the hosts file, and run the command again:
$ sudo vi /etc/hosts
199.232.68.133 raw.githubusercontent.com
6) Join worker nodes to the cluster
# Pull the latest flannel image
$ docker pull quay.io/coreos/flannel:v0.14.0

$ kubeadm join 192.168.99.103:6443 --token lgrcmk.2hck482gsnrn6ykm \
    --discovery-token-ca-cert-hash sha256:0b4dc91d4c73029f654f1f361b87c05818140f09f8b0742d99fc56da47a0dfbf \
    --node-name vm104

# How to regenerate the join command above
$ kubeadm token create --print-join-command
$ kubeadm join 192.168.99.120:6443 --token fx1mw7.lj9dgimtk17160zf \
    --discovery-token-ca-cert-hash sha256:a8aded3d1e549fa3f81ce5ec819b3cc2c8242cfa67f9e48f39e37ad5ce5de6b0
# Test
$ kubectl create deployment nginx --image=nginx
$ kubectl expose deployment nginx --port=80 --type=NodePort

$ kubectl get pod,svc
$ kubectl scale deployment nginx --replicas=3

Common commands:

# View node status
$ kubectl get nodes

# Remove a node from the cluster
$ kubectl delete nodes <node-name>

# View detailed information for a node
$ kubectl describe node <node-name>

# View cluster information
$ kubectl cluster-info
$ kubectl version
$ kubectl api-versions

# View system pods
$ kubectl get pods -n kube-system
7) Install the Dashboard

Prepare the yaml:

# Copyright 2017 The Kubernetes Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# ------------------- Dashboard Secret ------------------- #

apiVersion: v1
kind: Secret
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard-certs
  namespace: kube-system
type: Opaque

---
# ------------------- Dashboard Service Account ------------------- #

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system

---
# ------------------- Dashboard Role & Role Binding ------------------- #

kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: kubernetes-dashboard-minimal
  namespace: kube-system
rules:
  # Allow Dashboard to create 'kubernetes-dashboard-key-holder' secret.
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["create"]
  # Allow Dashboard to create 'kubernetes-dashboard-settings' config map.
- apiGroups: [""]
  resources: ["configmaps"]
  verbs: ["create"]
  # Allow Dashboard to get, update and delete Dashboard exclusive secrets.
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["kubernetes-dashboard-key-holder", "kubernetes-dashboard-certs"]
  verbs: ["get", "update", "delete"]
  # Allow Dashboard to get and update 'kubernetes-dashboard-settings' config map.
- apiGroups: [""]
  resources: ["configmaps"]
  resourceNames: ["kubernetes-dashboard-settings"]
  verbs: ["get", "update"]
  # Allow Dashboard to get metrics from heapster.
- apiGroups: [""]
  resources: ["services"]
  resourceNames: ["heapster"]
  verbs: ["proxy"]
- apiGroups: [""]
  resources: ["services/proxy"]
  resourceNames: ["heapster", "http:heapster:", "https:heapster:"]
  verbs: ["get"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: kubernetes-dashboard-minimal
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: kubernetes-dashboard-minimal
subjects:
- kind: ServiceAccount
  name: kubernetes-dashboard
  namespace: kube-system

---
# ------------------- Dashboard Deployment ------------------- #

kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: kubernetes-dashboard
  template:
    metadata:
      labels:
        k8s-app: kubernetes-dashboard
    spec:
      containers:
      - name: kubernetes-dashboard
        image: lizhenliang/kubernetes-dashboard-amd64:v1.10.1
        ports:
        - containerPort: 8443
          protocol: TCP
        args:
          - --auto-generate-certificates
          # Uncomment the following line to manually specify Kubernetes API server Host
          # If not specified, Dashboard will attempt to auto discover the API server and connect
          # to it. Uncomment only if the default does not work.
          # - --apiserver-host=http://my-address:port
        volumeMounts:
        - name: kubernetes-dashboard-certs
          mountPath: /certs
          # Create on-disk volume to store exec logs
        - mountPath: /tmp
          name: tmp-volume
        livenessProbe:
          httpGet:
            scheme: HTTPS
            path: /
            port: 8443
          initialDelaySeconds: 30
          timeoutSeconds: 30
      volumes:
      - name: kubernetes-dashboard-certs
        secret:
          secretName: kubernetes-dashboard-certs
      - name: tmp-volume
        emptyDir: {}
      serviceAccountName: kubernetes-dashboard
      # Comment the following tolerations if Dashboard must not be deployed on master
      tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule

---
# ------------------- Dashboard Service ------------------- #

kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kube-system
spec:
  type: NodePort 
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30001
  selector:
    k8s-app: kubernetes-dashboard

Visit https://nodeIP:30001

If the browser refuses access because of the self-signed certificate, type thisisunsafe.

Create a service account and bind it to the built-in cluster-admin cluster role:
$ kubectl create serviceaccount dashboard-admin -n kube-system

$ kubectl create clusterrolebinding dashboard-admin --clusterrole=cluster-admin --serviceaccount=kube-system:dashboard-admin

$ kubectl describe secrets -n kube-system $(kubectl -n kube-system get secret | awk '/dashboard-admin/{print $1}')

The login token is printed at the end of the output:
eyJhbGciOiJSUzI1NiIsImtpZCI6Ilp6bVlPanhONjl1UkhJRWpMdlVzNWQ0bEV2d2FIQm40c1RBcHFsWE5SUXMifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4temp0aHgiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiODk5NGJiYmQtZWZiYi00YjE0LWFkMjQtOWRiZTdiYTU3NDQ0Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmRhc2hib2FyZC1hZG1pbiJ9.wR45kBOcyq0bDNwWVbngT77K9n_OW-8zzSsArzlNd4WqNpQPFd0ukIkFxhQXV14eQvJefBtqotIT1hDcuIHUghbwZAga-3ISE5cNyg0A3H40Gj69g3wk7BwmkTADLPszrm0M1wYwI-pIj8xl9C5ymcZgyH1xDkEeGQJaWLfFYV2-EbwkNic8iuoZeP5l4q0LeRmi-Zpv1T5MKJrDPDvEXz4X3ZesVxHe4f7E1czgjIbaAPhkbkceiQjmLvB4zotr5JaCx7Fd7u7xSICotevTgUzrMa611cHSFgC3tz2Zwi9N-nQ51Ol9mCg49zkTLIwbr06OQnHvloLiHYffsL4-bg
8) Other configuration

Configure shell command completion:
$ yum install -y bash-completion
$ echo "source <(kubectl completion bash)" >> /etc/profile
$ source /etc/profile

Set up monitoring:

  • Heapster

    The project has been retired; use metrics-server instead;

  • metrics-server

    # The metrics-server version used here is 0.4.2
    $ kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
    

Switching between clusters:

Edit the ~/.kube/config file:
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: (omitted)
    server: https://192.168.99.103:6443
  name: kubernetes1
# Added: a second cluster
- cluster:
    certificate-authority-data: (omitted)
    server: https://192.168.99.104:6443
  name: kubernetes2
contexts:
- context:
    cluster: kubernetes1
    user: kubernetes-admin1
  name: kubernetes-admin1@kubernetes
# Added: a second context
- context:
    cluster: kubernetes2
    user: kubernetes-admin2
  name: kubernetes-admin2@kubernetes
current-context: kubernetes-admin1@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin1
  user:
    client-certificate-data:
    client-key-data:
- name: kubernetes-admin2
  user:
    client-certificate-data:
    client-key-data:
$ kubectl config get-contexts
# Switch to another context
$ kubectl config use-context <context-name>

Namespaces:

# List all namespaces
$ kubectl get ns
# Create a new namespace
$ kubectl create ns <namespace-name>

# Switch the default namespace (not supported natively by kubectl; use a third-party tool such as kubens, or see the sketch below)
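
Without kubens, the same effect can be had with plain kubectl by updating the current context (a minimal sketch; the namespace name is only an example):

# Set the default namespace for the current context
$ kubectl config set-context --current --namespace=<namespace-name>
# Verify which namespace the current context now uses
$ kubectl config view --minify | grep namespace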

2 Troubleshooting Notes

1 Node join timeout

Problem:

When running kubeadm join, the following error appears:
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher

Analysis and fix:

Retry the command with --v=6 added; the following output appears:
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
I1021 13:29:27.684395    7236 loader.go:375] Config loaded from file:  /etc/kubernetes/kubelet.conf
I1021 13:29:27.702881    7236 cert_rotation.go:137] Starting client certificate rotation controller
I1021 13:29:27.703055    7236 loader.go:375] Config loaded from file:  /etc/kubernetes/kubelet.conf
I1021 13:29:27.706231    7236 kubelet.go:194] [kubelet-start] preserving the crisocket information for the node
I1021 13:29:27.706268    7236 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "vm104" as an annotation
I1021 13:29:28.213944    7236 round_trippers.go:443] GET https://192.168.99.103:6443/api/v1/nodes/vm104?timeout=10s 401 Unauthorized in 6 milliseconds
I1021 13:29:28.708158    7236 round_trippers.go:443] GET https://192.168.99.103:6443/api/v1/nodes/vm104?timeout=10s 401 Unauthorized in 1 milliseconds
I1021 13:29:29.208988    7236 round_trippers.go:443] GET https://192.168.99.103:6443/api/v1/nodes/vm104?timeout=10s 401 Unauthorized in 2 milliseconds

The master had been initialized several times and this node had already joined before; the 401 Unauthorized is most likely caused by stale state left over from the earlier join. Run the following command, then join again:

$ kubeadm reset
2 metrics-server unavailable after installation

Problem:

metrics-server version 0.4.1;

kubectl top nodes returns the following error:
$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

Check the pod status with kubectl get pods -n kube-system:
NAME                              READY   STATUS             RESTARTS   AGE
metrics-server-866b7d5b74-wc86x   0/1     CrashLoopBackOff   7          9m25s

Use kubectl describe pod metrics-server-866b7d5b74-wc86x -n kube-system to inspect the metrics-server pod (only the key events are shown):
Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  <unknown>             default-scheduler  Successfully assigned kube-system/metrics-server-866b7d5b74-wc86x to vm105
  Normal   Pulling    11m                   kubelet, vm105     Pulling image "k8s.gcr.io/metrics-server/metrics-server:v0.4.1"
  Normal   Pulled     11m                   kubelet, vm105     Successfully pulled image "k8s.gcr.io/metrics-server/metrics-server:v0.4.1"
  Normal   Created    10m (x3 over 11m)     kubelet, vm105     Created container metrics-server
  Normal   Started    10m (x3 over 11m)     kubelet, vm105     Started container metrics-server
  Warning  Unhealthy  10m (x6 over 10m)     kubelet, vm105     Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing    10m (x2 over 10m)     kubelet, vm105     Container metrics-server failed liveness probe, will be restarted
  Normal   Pulled     10m (x2 over 10m)     kubelet, vm105     Container image "k8s.gcr.io/metrics-server/metrics-server:v0.4.1" already present on machine
  Warning  Unhealthy  9m59s (x7 over 10m)   kubelet, vm105     Readiness probe failed: HTTP probe failed with statuscode: 500
  Warning  BackOff    69s (x33 over 8m38s)  kubelet, vm105     Back-off restarting failed container

Analysis and fix:

# Delete the deployment
$ kubectl delete deployment metrics-server -n kube-system

Edit the manifest:

Under spec.template.spec.containers.args, add the - --kubelet-insecure-tls flag and re-apply (see the sketch below);
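
A minimal sketch of the relevant part of the metrics-server Deployment in components.yaml after the change (the surrounding args may differ slightly between versions; only the last line is the addition):

      containers:
      - name: metrics-server
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-insecure-tls    # added: skip kubelet TLS certificate verification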

3 metrics-server is running, but kubectl top cannot retrieve resource metrics

Problem:
$ kubectl top node
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

$ kubectl get pods -n kube-system
NAME                             READY   STATUS    RESTARTS   AGE
metrics-server-84fc898bf-6g6tl   1/1     Running   0          24m

Analysis and fix:

Add the --enable-aggregator-routing=true flag under spec.containers.command in kube-apiserver.yaml, then restart kubelet:

# Add the flag --enable-aggregator-routing=true
$ vi /etc/kubernetes/manifests/kube-apiserver.yaml

# Restart
$ systemctl restart kubelet
4 metrics-server is running, but resource metrics are still unavailable:
$ kubectl describe apiservice v1beta1.metrics.k8s.io
Status:
  Conditions:
    Last Transition Time:  2021-02-22T09:16:47Z
    Message:               failing or missing response from https://10.244.1.58:4443/apis/metrics.k8s.io/v1beta1: Get https://10.244.1.58:4443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    Reason:                FailedDiscoveryCheck
    Status:                False
    Type:                  Available
Events:                    <none>

Fix:

Add hostNetwork: true under spec.template.spec in the metrics-server Deployment, then re-apply it (see the sketch below).
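
A minimal sketch of the change in the metrics-server Deployment (other fields omitted):

spec:
  template:
    spec:
      hostNetwork: true   # added: run the metrics-server pod on the host network
      containers:
      - name: metrics-server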

5 failed to get cgroup stats for “/system.slice/kubelet.service”

Problem:
$ systemctl status kubelet.service
Dec 01 14:19:13 localhost.localdomain kubelet[17633]: E1201 14:19:13.827700   17633 summary_sys_containers.go:47] Failed to get system container stats for "/system.slice/kubelet.service": failed to get cgroup stats for "/system.slice/kubelet.service": failed to get container info for "/system.slice/kubelet.service": unknown container "/system.slice/kubelet.service"

Analysis and fix:

This is a compatibility issue between the Kubernetes and Docker versions;

# Edit the file and append the flags --runtime-cgroups=/systemd/system.slice --kubelet-cgroups=/systemd/system.slice
$ vi /var/lib/kubelet/kubeadm-flags.env

$ systemctl restart kubelet
6 kubectl cannot be used on a worker node

Problem:

$ kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?

Analysis and fix:

kubectl needs the kubernetes-admin kubeconfig to run;

# 1. On the master: copy admin.conf to the worker node
$ scp /etc/kubernetes/admin.conf root@192.168.99.104:/etc/kubernetes/admin.conf
# 2. On the worker node:
$ echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
$ source ~/.bash_profile

2.2.2 Installing components with yum

2.2.3 Installing from source

2.3 Creating Pods

2.3.1 Create a Pod
apiVersion: v1
kind: Pod
metadata:
  name: static-web
  labels:
    # a custom, user-defined label
    custom-role: myrole
spec:
  containers:
    - name: web
      image: nginx
      ports:
        - name: web
          containerPort: 80
          protocol: TCP
# List the supported apiVersions
$ kubectl api-versions

# Two ways to create the pod;
# difference: apply can also update the configuration later
$ kubectl create -f nginx-pod.yaml
$ kubectl apply -f nginx-pod.yaml

# Delete the pod
$ kubectl delete -f nginx-pod.yaml

# Exec into the Pod's default container (the first one, when there are multiple)
$ kubectl exec -it static-web -- sh
# Exec into the Pod's web container
$ kubectl exec -it static-web -c web -- sh

How to generate the yaml:
$ kubectl run nginx --image=nginx --dry-run=client -o yaml

2.3.2 Running commands in a Pod and the Pod lifecycle
apiVersion: v1
kind: Pod
metadata:
  name: myapp
  labels:
    custom-label: busybox
spec:
  containers:
    - name: app
      image: busybox
      command: ['sh', '-c', 'echo OK && sleep 60']

Alternative ways to write command:
spec:
  containers:
    - name: app
      image: busybox
      command:
      - sh
      - -c
      - echo OK && sleep 60
spec:
  containers:
    - name: app
      image: busybox
      args:
      - sh
      - -c
      - echo OK && sleep 60

2.3.3 Image pull policy

  • Always: always pull a fresh image;
  • Never: only use the local image, never pull;
  • IfNotPresent: pull only when the image is not present locally;
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx

2.3.4 Environment variables
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: demo
  name: demo-busybox
spec:
  containers:
    - name: app
      image: busybox
      env:
      - name: ENV_1
        value: "hello!"
      command: ['/bin/echo']
      args: ["$(ENV_1)"]

2.3.5 Restart policy

  • Always: always restart;
  • OnFailure: restart only on failure;
  • Never: never restart;
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    run: nginx
  name: nginx
spec:
  containers:
  - image: nginx
    name: nginx
  restartPolicy: Always

2.3.6 Container states

  • Pending: the Pod has been created, but its containers have not finished being created;
  • Running: the Pod has been scheduled to a node and its containers are working normally;
  • Completed: all containers in the Pod exited normally;
  • Failed:

If one container in a Pod is Failed, the Pod status is Error; if one container is Completed and another is Running, the Pod status is Running.
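
A quick way to inspect these states (using the static-web Pod created earlier as an example):

# Overall Pod phase (Pending/Running/Succeeded/Failed)
$ kubectl get pod static-web -o jsonpath='{.status.phase}'
# Per-container state details
$ kubectl get pod static-web -o jsonpath='{.status.containerStatuses[*].state}'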

2.3.7 Init containers

  • Multiple init containers can be defined; they run in order, and only after they all complete do the main containers start;

  • If an init container fails, it is restarted repeatedly and the Pod's main containers are not started;
  • Init containers support all fields and features of application containers, but not readinessProbe, because they must run to completion before the Pod can become ready;
  • Use activeDeadlineSeconds on the Pod and livenessProbe on the containers to keep init containers from failing forever; this sets a deadline for init container activity;
  • Within a Pod, every container name must be unique;
  • Changes to an init container spec are limited to the image field; changing an init container's image field is equivalent to restarting the Pod.
apiVersion: v1
kind: Pod
metadata:
  labels:
    run: nginx
  name: nginx
spec:
  volumes:
  - name: workdir
    emptyDir: {}
  containers:
  - image: nginx
    name: nginx
    volumeMounts:
    - name: workdir
      mountPath: '/app/config'
  initContainers:
  - image: busybox
    name: busybox
    command: ['sh', '-c', 'touch /opt/config.yml']
    volumeMounts:
    - name: workdir
      mountPath: '/opt'

2.3.8 Static Pods

A static Pod is not created through the master: a yaml file is placed in a manifest directory on the node (e.g. /etc/kubelet.d/), the kubelet creates the Pod from that file, and the Pod is not managed by the master.

Creation steps:

  1. On the node, run systemctl status kubelet -l to find the drop-in configuration directory /usr/lib/systemd/system/kubelet.service.d

  2. Edit the file and add the --pod-manifest-path startup argument:
    
    # Note: This dropin only works with kubeadm and kubelet v1.11+
    [Service]
    Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --pod-manifest-path=/etc/kubernetes/static-yaml"
    Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
    # This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
    EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
    # This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
    # the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
    EnvironmentFile=-/etc/sysconfig/kubelet
    ExecStart=
    ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
    

    Tip:

    /etc/kubernetes/static-yaml here is a custom path and can be changed;

  3. Create the yaml under the custom directory /etc/kubernetes/static-yaml;

  4. Restart kubelet;
    
    $ systemctl daemon-reload
    $ systemctl restart kubelet
    

Deletion steps:

  1. Simply delete the yaml created under the custom directory /etc/kubernetes/static-yaml;
  2. (Optional) Restart kubelet;

2.4 Pod Scheduling

2.4.1 The three elements of scheduling

The Pod to be scheduled

The available nodes

The scheduling algorithm

  • Node filtering
  • Node scoring
    • LeastRequestedPriority
    • BalancedResourceAllocation
    • CalculateSpreadPriority

Scheduling policy

2.4.2 Manually choosing where a Pod runs

  1. Label the nodes

    # View node labels
    $ kubectl get nodes --show-labels
    # Selector
    $ kubectl get nodes --selector=kubernetes.io/hostname=vm104
    # Add a label to a node
    $ kubectl label node vm104 disktype=ssd
    # Remove a label from a node
    $ kubectl label node vm104 disktype-
    
  2. Pin the Pod to a specific node
    
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        run: nginx
      name: nginx
    spec:
      nodeSelector:
        disktype: ssd
      containers:
      - image: nginx
        name: nginx
    

2.4.3 Node maintenance

  1. Cordon
    
    $ kubectl cordon vm104
    $ kubectl uncordon vm104
    

    After a node is cordoned, Pods already running on it are not moved; delete them so that they get rescheduled onto other nodes;

    If a Pod is pinned to a cordoned node, it stays in the Pending state.

  2. Drain

    Drain = Cordon plus eviction of the Pods already on the node.
    
    $ kubectl drain vm104 --ignore-daemonsets
    
  3. Node taints and Pod tolerations
    
    # Set a taint
    $ kubectl taint nodes vm104 dedicated=special-user:NoSchedule
    # Remove the taint
    $ kubectl taint nodes vm104 dedicated-
    

    To run a Pod on the tainted node, add a toleration:
    
    apiVersion: v1
    kind: Pod
    metadata:
      labels:
        run: nginx
      name: nginx
    spec:
      tolerations:
      - key: 'dedicated'
        value: 'special-user'
        effect: 'NoSchedule'
        operator: 'Equal'
      containers:
      - image: nginx
        name: nginx
    

2.5 Storage Management

2.5.1 Storage types

  1. Local storage

    • emptyDir
      
      apiVersion: v1
      kind: Pod
      metadata:
        labels:
          run: busybox
        name: busybox
      spec:
        volumes:
        - name: workdir
          emptyDir: {}
        containers:
        - image: busybox
          name: busybox
          command: ['sh', '-c', 'sleep 5000']
          volumeMounts:
          - name: workdir
            mountPath: '/app'
        - image: busybox
          name: busybox2   # container names must be unique within a Pod
          command: ['sh', '-c', 'sleep 5000']
          volumeMounts:
          - name: workdir
            mountPath: '/opt'
      
    • hostPath

      
      apiVersion: v1
      kind: Pod
      metadata:
        name: test-pd
      spec:
        containers:
        - image: busybox
          name: test-container
          volumeMounts:
          - mountPath: /test-pd
            name: test-volume
        volumes:
        - name: test-volume
          hostPath:
            # directory location on host
            path: /data
            # this field is optional
            type: Directory
      
  2. Network storage

    • NFS

      NAS-style storage (can become a bottleneck when there are many clients)
      
      apiVersion: v1
      kind: Pod
      metadata:
        name: test-pd
      spec:
        containers:
        - image: busybox
          name: test-container
          volumeMounts:
          - mountPath: '/test-pd'
            name: nfs
        volumes:
        - name: nfs
          nfs:
            path: '/data'
            server: 1.2.3.4
      
    • iscsi

      An IP-SAN type of storage
      
      apiVersion: v1
      kind: Pod
      metadata:
        name: iscsipd
      spec:
        containers:
        - name: iscsipd-rw
          image: kubernetes/pause
          volumeMounts:
          - mountPath: "/mnt/iscsipd"
            name: iscsipd-rw
        volumes:
        - name: iscsipd-rw
          iscsi:
            targetPortal: 10.0.2.15:3260
            portals: ['10.0.2.16:3260', '10.0.2.17:3260']
            iqn: iqn.2001-04.com.example:storage.kube.sys1.xyz
            lun: 0
            fsType: ext4
            readOnly: true
      
    • ceph

    • gluster

  3. Persistent storage

    Create an NFS server with docker-compose:
    
    version: '3'
    services:
      nfs_01:
        image: itsthenetwork/nfs-server-alpine
        ports:
          - 2049:2049
        cap_add:
          - SYS_ADMIN
        environment:
          PERMITTED: '*'
          SHARED_DIRECTORY: '/opt/share/pv01'
        volumes:
          - /opt/docker-nfs:/opt
    

A PV, once created, is visible cluster-wide (it is not namespaced).

This is static provisioning: the PV is created first and users then consume it:
   apiVersion: v1
   kind: PersistentVolume
   metadata:
     name: pv01
   spec:
     capacity:
       storage: 10Mi
     volumeMode: Filesystem
     accessModes:
       - ReadWriteOnce
     persistentVolumeReclaimPolicy: Recycle
     storageClassName: slow
     nfs:
       path: /opt/share/pv01
       server: 192.168.99.106

Create the PVC (PVCs are namespaced):
   apiVersion: v1
   kind: PersistentVolumeClaim
   metadata:
     name: pvc01
   spec:
     accessModes:
       - ReadWriteOnce
     volumeMode: Filesystem
     resources:
       requests:
         storage: 10Mi
     storageClassName: slow

Use the claim in a Pod:
   apiVersion: v1
   kind: Pod
   metadata:
     name: mypod
   spec:
     containers:
       - name: myfrontend
         image: nginx
         volumeMounts:
         - mountPath: "/var/www/html"
           name: mypd
     volumes:
       - name: mypd
         persistentVolumeClaim:
           claimName: pvc01

2.6 Secrets Management

2.6.1 Storing passwords with Secret

1 How to create

# Method 1
$ kubectl create secret generic my-secret --from-literal=key1=supersecret --from-literal=key2=topsecret

$ kubectl get secret my-secret
$ echo '<base64 ciphertext>' | base64 -d

# Method 2
$ echo -n jiangjiang > passphrase
$ kubectl create secret generic my-secret --from-file=ssh-privatekey=path/to/id_rsa --from-file=passphrase=./passphrase

# Method 3
$ kubectl create secret generic my-secret --from-env-file=path/to/bar.env

Creating from yaml:
apiVersion: v1
kind: Secret
metadata:
  name: mysecret
type: Opaque
data:
  USER_NAME: YWRtaW4=
  PASSWORD: MWYyZDFlMmU2N2Rm

2 How to reference

  • Reference as a volume

    When referenced as a volume, updates to the secret propagate dynamically, and the values are read as plain-text files.
    
    apiVersion: v1
    kind: Pod
    metadata:
      name: secret-test-pod
      labels:
        name: secret-test
    spec:
      volumes:
      - name: secret-volume
        secret:
          secretName: ssh-key-secret
          #items:
          #- key: ssh-publickey
          #  path: .
      containers:
      - name: ssh-test-container
        image: mySshImage
        volumeMounts:
        - name: secret-volume
          readOnly: true
          mountPath: "/etc/secret-volume"
    

    When the container's command runs, the secret keys can be found at:
    
    /etc/secret-volume/ssh-publickey
    /etc/secret-volume/ssh-privatekey
    
  • Reference as environment variables
    
    apiVersion: v1
    kind: Pod
    metadata:
      name: secret-env-pod
    spec:
      containers:
      - name: mycontainer
        image: redis
        env:
          - name: SECRET_USERNAME
            valueFrom:
              secretKeyRef:
                name: mysecret
                key: username
          - name: SECRET_PASSWORD
            valueFrom:
              secretKeyRef:
                name: mysecret
                key: password
    

2.6.2 Storing configuration with ConfigMap

1 How to create

The stored configuration is plain text

# The three creation methods are analogous to Secret
$ kubectl create configmap my-config --from-literal=key1=config1 --from-literal=key2=config2
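
To confirm that the values are stored unencoded (unlike the base64 data of a Secret), inspect the object:

# The data section shows key1/key2 in plain text
$ kubectl get configmap my-config -o yaml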

Creating from yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: game-demo
data:
  # property-like keys; each key maps to a simple value
  player_initial_lives: "3"
  ui_properties_file_name: "user-interface.properties"

  # file-like keys
  game.properties: |
    enemy.types=aliens,monsters
    player.maximum-lives=5    
  user-interface.properties: |
    color.good=purple
    color.bad=yellow
    allow.textmode=true

2 How to reference
apiVersion: v1
kind: Pod
metadata:
  name: configmap-demo-pod
spec:
  containers:
    - name: demo
      image: alpine
      command: ["sleep", "3600"]
      env:
        # Define environment variables
        - name: PLAYER_INITIAL_LIVES # note: this env var name differs from the key name in the ConfigMap
          valueFrom:
            configMapKeyRef:
              name: game-demo           # the ConfigMap the value comes from
              key: player_initial_lives # the key to read
        - name: UI_PROPERTIES_FILE_NAME
          valueFrom:
            configMapKeyRef:
              name: game-demo
              key: ui_properties_file_name
      volumeMounts:
      - name: config
        mountPath: "/config"
        readOnly: true
  volumes:
    # Define the volume at the Pod level, then mount it into containers in the Pod
    - name: config
      configMap:
        # the name of the ConfigMap to mount
        name: game-demo
        # the keys from the ConfigMap to expose, each created as a file
        items:
        - key: "game.properties"
          path: "game.properties"
        - key: "user-interface.properties"
          path: "user-interface.properties"

2.7 Deployment

2.7.1 Purpose

A Deployment manages ReplicaSets (the successor to ReplicationController) to keep the desired number of Pod replicas running and to support rolling updates.

ReplicationController

ReplicaSets

2.7.2 Creating a Deployment

1 From the command line
$ kubectl create deployment my-nginx --image=nginx

2 From a yaml file
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-nginx
  name: my-nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-nginx
  template:
    metadata:
      labels:
        app: my-nginx
    spec:
      containers:
      - image: nginx
        name: nginx-name

2.7.3 Changing the replica count

# Method 1
$ kubectl scale deployment nginx --replicas=20

# Method 2
$ kubectl edit deployment nginx

# Method 3: update the deployment yaml and apply it again

2.7.4 Rolling updates

# In the my-nginx deployment, set the image of the container named nginx-container-name to nginx:1.9
$ kubectl set image deployment/my-nginx nginx-container-name=nginx:1.9

# Check the deployment's image version
$ kubectl get deployment my-nginx -o wide

# Roll back
$ kubectl rollout undo deployment/my-nginx --record
# View the revision history
$ kubectl rollout history deployment/my-nginx
# Roll back to a specific revision
$ kubectl rollout undo deployment/my-nginx --to-revision=2

maxUnavailable (maximum unavailable): defaults to 25%; can also be given as an absolute number of Pods;

maxSurge (maximum surge): defaults to 25%; can also be given as an absolute number of Pods;
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate: 
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx

2.7.5 Horizontal Pod Autoscaler (HPA)

By monitoring Pod CPU load, the HPA dynamically scales the number of Pods in a deployment to balance the load when Pods become overloaded.
$ kubectl autoscale deployment my-nginx --min=2 --max=10

$ kubectl autoscale deployment my-nginx --max=5 --cpu-percent=80

# View HPAs
$ kubectl get hpa
# Delete
$ kubectl delete hpa my-nginx

Fixing the current CPU usage showing as <unknown>:

# Edit resources
$ kubectl edit deployment my-nginx

containers:
  - image: nginx:1.7.9
    imagePullPolicy: Always
    name: nginx
    resources:
      requests:
        cpu: 100m

# Edit kube-controller-manager.yaml
$ vi /etc/kubernetes/manifests/kube-controller-manager.yaml

- command:
  - kube-controller-manager
  - --horizontal-pod-autoscaler-use-rest-clients=true
  - --horizontal-pod-autoscaler-sync-period=10s

Testing the HPA:

# Exec into a container
$ kubectl exec -it my-nginx-665bc6f67f-5jd8m -- sh

# Generate CPU load inside the container
> cat /dev/zero > /dev/null &

2.8 Other Controllers

2.9 Health Checks

A probe mechanism: when a probe detects that a container is unhealthy, action is taken (such as a restart);

liveness: liveness check; if it fails, the container is restarted;

readiness: readiness check; if it fails, the Pod is not added to the service endpoints;

startup: startup check; if it fails, the container is restarted (introduced in 1.16, beta in 1.18; see the sketch below)

Three probe handlers: command / HTTP GET / TCP

Note: each probe runs only one handler; if multiple handlers are defined, one is chosen in the priority order Exec, HTTPGet, TCPSocket.
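
Since the startup probe is mentioned above but not shown below, here is a minimal sketch (the values are examples): a slow-starting container gets up to failureThreshold × periodSeconds to come up before the liveness probe takes over.

apiVersion: v1
kind: Pod
metadata:
  name: startup-demo
spec:
  containers:
  - name: app
    image: nginx
    startupProbe:
      httpGet:
        path: /
        port: 80
      failureThreshold: 30   # allow up to 30 failed startup checks
      periodSeconds: 10      # i.e. up to 300s to start
    livenessProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 10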

1 Exec Handler

apiVersion: v1 
kind: Pod 
metadata: 
  labels: 
    test: liveness 
  name: liveness-exec
spec: 
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 30
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5 # do not probe during the first 5s after the container starts
      periodSeconds: 5 # probe every 5s

2 HTTPGet Handler

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: nginx
    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /index.html
        port: 80
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10

3 TCPSocket Handler

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: nginx
    livenessProbe:
      failureThreshold: 3
      tcpSocket:
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10

initialDelaySeconds: how many seconds to wait after the container starts before the first probe.

periodSeconds: how often to probe; default 10 seconds, minimum 1 second.

timeoutSeconds: probe timeout; default 1 second, minimum 1 second.

successThreshold: minimum consecutive successes after a failure for the probe to be considered successful; default 1, must be 1 for liveness, minimum 1.

failureThreshold: minimum consecutive failures after a success for the probe to be considered failed; default 3, minimum 1.

2.10 JOB

1 JOB

Suitable for one-off computation tasks;
apiVersion: batch/v1
kind: Job
metadata:
  name: job1
spec:
  backoffLimit: 6
  completions: 1
  parallelism: 1
  template:
    metadata:
      name: pi
    spec:
      containers:
      - name: hello
        image: busybox
        command: ["echo","hello world!"]
      restartPolicy: Never

A Job's restart policy can only be:

  • Never: as long as the task has not completed, new pods are created to run it until the job completes, which can produce multiple pods;

  • OnFailure: as long as the pod has not completed, the pod itself is restarted until the job completes;

parallelism: 1   the number of pods run at a time

completions: 1   the number of pods that must complete successfully

backoffLimit: 6  the maximum number of failed pods (retries) before the job is marked failed
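
After submitting the Job, its progress can be checked with standard commands (job1 is the name from the example above):

# COMPLETIONS shows 1/1 when the job is done
$ kubectl get jobs job1
# Read the output of the pod(s) the job created
$ kubectl logs job/job1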

2 CronJob

Enabling CronJob (the batch/v2alpha1 API) requires adding - --runtime-config=batch/v2alpha1=true to the apiserver:

# Add - --runtime-config=batch/v2alpha1=true
$ vim /etc/kubernetes/manifests/kube-apiserver.yaml

# Restart the service
$ systemctl restart kubelet.service
# Verify
$ kubectl api-versions
apiVersion: batch/v2alpha1
kind: CronJob
metadata:
  name: job2
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            command: ["echo","hello world!"]
          restartPolicy: OnFailure

2.11 Service

1 Creating a Service
apiVersion: v1
kind: Service
metadata:
  labels:
    name: test
  name: svc1
spec:
  ports:
  - port: 80
    targetPort: 80
  selector:
    app: my-nginx
  type: NodePort

2 Service discovery

Discovery via environment variables
env:
  - name: WORDPRESS_DB_USER
    value: root
  - name: WORDPRESS_DB_PASSWORD
    value: redhat
  - name: WORDPRESS_DB_NAME
    value: blog
  - name: WORDPRESS_DB_HOST
    value: $(MYSQL_SERVICE_HOST)
  1. Only variables for services in the same namespace are injected;
  2. Ordering matters: the referenced service must exist before the pod that consumes its variables is created.

Discovery via DNS

A DNS service runs in kube-system and automatically discovers the clusterIP of every service in every namespace, so within the same namespace one service can reach another simply by using the service name.

Every service that is created is automatically registered with the DNS in kube-system.

For a service in a different namespace, use <service-name>.<namespace-name>.
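
A quick way to verify DNS-based discovery (a sketch; svc1 is the Service created above and is assumed to live in the default namespace):

# Resolve the bare service name (works from the same namespace)
$ kubectl run dns-test --image=busybox --rm -it --restart=Never -- nslookup svc1
# From another namespace, the <service-name>.<namespace> form is used
$ kubectl run dns-test --image=busybox --rm -it --restart=Never -- nslookup svc1.default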

3 Publishing services

Publishing means making a service reachable from hosts outside the cluster.

  • NodePort
  • LoadBalancer
  • ExternalName
  • ClusterIP
  • ingress
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: myingress
spec:
  rules:
  - host: www.rhce.cc
    http:
      paths:
      - path: /
        backend:
          serviceName: nginx2
          servicePort: 80
      - path: /rhce
        backend:
          serviceName: nginx2
          servicePort: 80

2.12 Network Model