Table of Contents
- Labels and nodeSelector
- Label Selection
- Affinity and Anti-Affinity
- Taints and Tolerations
- Default System Taints
- Tolerations
- DaemonSet
In earlier articles, we learned about Deployments and ReplicaSets, but the Pods they create are scheduled to Worker nodes more or less arbitrarily. Even with 3 Workers and 3 replicas, there is no guarantee that each Node runs one Pod; one Node may well end up running all three replicas.
In this article, we will explore DaemonSets, tolerations, affinity, Labels, and selectors in Kubernetes, which let us control where Pods are deployed.
Labels and nodeSelector
Labels are key-value pairs attached to Kubernetes objects. When represented in JSON, the labels attached to metadata may look like this:
"metadata": {
"labels": {
"key1": "value1",
"key2": "value2"
}
}
In YAML:
metadata:
  labels:
    key1: "value1"
    key2: "value2"
Labels are primarily used to represent meaningful attributes of objects to users.
Nodes can also carry Labels. For example, every node already has a set of Labels assigned by Kubernetes, which we can inspect (nodes are cluster-scoped, so no namespace is needed):
kubectl get nodes --show-labels
beta.kubernetes.io/arch=amd64,
beta.kubernetes.io/os=linux,
kubernetes.io/arch=amd64,
...
We can also manually add a label to a Node.
kubectl label nodes <node-name> <label-key>=<label-value>
For instance, we can set a disksize label on a node to indicate whether it has sufficient disk space:
kubectl label nodes <node-name> disksize=big
Then, when writing a YAML file, if we want this Pod to run on a Node with sufficient capacity, we can write:
nodeSelector:
  disksize: big
Now, as an official example, let's set a Node's Label to indicate that the disk is SSD.
kubectl label nodes kubernetes-foo-node-1.c.a-robinson.internal disktype=ssd
Then, in the Pod's YAML, we add the corresponding nodeSelector:
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
Labels can be used in many places: adding Labels to a Node to identify it, using nodeSelector to pick suitable Nodes for running Pods, and attaching Labels to an object's metadata to describe it.
Labels attached to metadata can then be used to filter the results of queries.
Querying the labels of Pods:
kubectl get pods --show-labels
Finding Pods that match a condition (see the LABELS column above; we can select against any of those labels):
kubectl get pods -l app=nginx
Label Selection
Previously, we learned about nodeSelector, which helps us select suitable Nodes for running Pods. In fact, Kubernetes' label selection is rich and diverse, such as:
nodeSelector:
  disktype: ssd
  disksize: big
This node selector expresses equality-based selection; the expression is disktype=ssd && disksize=big.
Labels can be selected by equality or by set operations. Equality-based selection uses the operators =, ==, and != (= and == are interchangeable). When there are multiple requirements (multiple labels), they are written together and combined as a logical AND (&&); note that the selector does not support the logical OR (||) operator.
YAML only supports the {key}: {value} form, while on the command line we can use all three operators:
kubectl get nodes -l disktype=ssd,disksize!=big
# Multiple conditions are separated by a comma ",", not "&&".
Set-based selection supports three operators: in, notin, and exists. Despite the name, this does not mean picking objects out of a set; it means testing a label's value against a set of allowed values.
For example, suppose there are three Nodes with disksizes of big, medium, and small, and we want to deploy a Pod that can run on either big or medium, we can write:
... -l disksize in (big,medium)
... -l disksize notin (small)
# Not running on small
The exists operator takes only a key and no value: it matches as long as a label with that key exists, no matter what its value is.
-l disksize
# Equivalent to -l disksize in (big,medium,small)
We can also wrap the selection expression in single quotes:
kubectl get pods -l 'app=nginx'
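Quoting matters once the expression contains spaces or parentheses, as set-based expressions do, because the shell would otherwise try to interpret them. A quick sketch, reusing the disksize labels assumed earlier in this article:
kubectl get nodes -l 'disksize in (big,medium)'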
So far we have seen the YAML nodeSelector and command-line selection. Next, we introduce the YAML selector.
We mentioned earlier that Labels can be added in a Deployment's metadata, i.e., attached to Pods. Label selection can also be used to filter Pods when creating Services or when using ReplicationControllers.
If nginx has already been deployed, kubectl get pods --show-labels will show LABELS such as app=nginx, and we can select on them like this:
selector:
  app: nginx
Full version:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 6666
status:
  loadBalancer:
    ingress:
      - ip: 192.0.2.127
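After creating the Service, one way to check that the selector actually matched the nginx Pods is to inspect the Service's endpoints (using the Service name my-service from the example above):
kubectl get endpoints my-service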
A selector also supports the matchLabels and matchExpressions forms:
- matchLabels is a mapping of {key,value} pairs. A single {key,value} in the matchLabels mapping is equivalent to an element of matchExpressions whose key field is "key", whose operator is "In", and whose values array contains only "value".
- matchExpressions is a list of requirements built from Pod selection operators. Valid operators include In, NotIn, Exists, and DoesNotExist. For In and NotIn, the values set must be non-empty. All requirements from matchLabels and matchExpressions are combined with a logical AND — they must all match for an object to qualify.
Here is an example:
selector:
  matchLabels:
    component: redis
  matchExpressions:
    - {key: tier, operator: In, values: [cache]}
    - {key: environment, operator: NotIn, values: [dev]}
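For context, here is a minimal sketch of where such a selector lives in a Deployment (the name my-nginx is made up for illustration); the Pod template's labels must satisfy the selector, otherwise the API server rejects the object:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx           # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx         # must satisfy the selector above
    spec:
      containers:
      - name: nginx
        image: nginx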
We will not elaborate further on these selection rules. The aforementioned information should be sufficient, and readers can refer to the official documentation for more complex operations: https://kubernetes.io/zh/docs/concepts/overview/working-with-objects/labels/
Affinity and Anti-Affinity
Earlier we learned about nodeSelector, which selects suitable nodes by their Labels and lets us express simple constraints.
Affinity is similar to nodeSelector: it schedules Pods onto specific nodes based on labels, but it is more expressive.
Affinity rules come in two types:
- requiredDuringSchedulingIgnoredDuringExecution: a hard requirement that must be satisfied for the Pod to be scheduled onto a node.
- preferredDuringSchedulingIgnoredDuringExecution: a preference the scheduler tries to honor but does not guarantee.
Here is an official example:
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
The required constraint above is roughly equivalent to the following label selection:
... ... -l kubernetes.io/e2e-az-name in (e2e-az1,e2e-az2)
The affinity field declares the scheduling affinity, nodeAffinity scopes it to nodes, and the two rule types express the "must satisfy" and "try to satisfy" constraints respectively.
If multiple terms are listed under nodeSelectorTerms:
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  - matchExpressions:
      ...
  - matchExpressions:
      ...
then satisfying any one of them is enough for the Pod to be scheduled onto the node.
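As a concrete sketch (the disktype and disksize keys are just the example labels used earlier in this article), the following schedules the Pod onto nodes that have either an ssd disk or a big disk:
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  - matchExpressions:
    - key: disktype
      operator: In
      values: ["ssd"]
  - matchExpressions:
    - key: disksize
      operator: In
      values: ["big"]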
If both nodeSelector and nodeAffinity are specified, both must be satisfied for the Pod to be scheduled onto a candidate node.
The node affinity syntax supports the following operators: In, NotIn, Exists, DoesNotExist, Gt, and Lt.
The legal operators for Pod affinity and anti-affinity are In, NotIn, Exists, and DoesNotExist.
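As an illustrative sketch of Gt (the label key gpu-count and its numeric value are made up for this example), a required node-affinity term can demand a label value greater than a given number:
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  - matchExpressions:
    - key: gpu-count         # hypothetical numeric label
      operator: Gt
      values: ["2"]          # compared as an integer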
Affinity fields end in -Affinity, such as node affinity nodeAffinity and Pod affinity podAffinity; anti-affinity fields end in -AntiAffinity, such as podAntiAffinity. (There is no nodeAntiAffinity field; to keep Pods away from certain nodes, use the NotIn/DoesNotExist operators in nodeAffinity, or taints.)
Anti-affinity works like affinity, with the same requiredDuringSchedulingIgnoredDuringExecution hard constraints and preferredDuringSchedulingIgnoredDuringExecution soft constraints, except that the meaning is inverted: when the conditions are met, the Pod must not (or should preferably not) be scheduled there.
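As a brief sketch only (the app: nginx label is the example label used earlier), the following podAntiAffinity, placed inside a Pod's spec, prevents two nginx Pods from landing on the same node:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: nginx
      topologyKey: kubernetes.io/hostname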
This concludes our explanation of affinity and anti-affinity. The configuration for both is quite complex, and readers can refer to the official documentation for further learning; we won't delve into it here.
Taints and Tolerations
Earlier, we looked at affinity and anti-affinity, which select suitable nodes for Pods, or suitable Pods for Services; in both cases the label-bearing objects are the ones being selected.
Here we introduce taints and tolerations, which let a node escape that "being chosen" fate by actively repelling Pods.
Node taints can repel specific Pods, while Tolerations indicate the ability of Pods to tolerate these taints.
When a taint is added to a node, unless a Pod declares it can tolerate this taint, it will not be scheduled on that node.
Kubernetes processes multiple taints and tolerations like a filter: starting from all of a node's taints, it ignores those that match the Pod's tolerations; the remaining, un-ignored taints determine the effect on the Pod. In particular, for a remaining PreferNoSchedule taint the system only tries to avoid placing the Pod on the node; that case is not enforced.
Note that if you only have one worker and it carries a PreferNoSchedule taint, Pods can still end up on that node, since there is nowhere else for them to run; with an untolerated NoSchedule taint they would stay Pending instead.
Adding a taint has the format:
kubectl taint node [node] key=value:[effect]
To update or overwrite a taint:
kubectl taint node [node] key=value:[effect] --overwrite=true
Using kubectl taint
to add a taint to a node.
kubectl taint nodes node1 key1=value1:NoSchedule
Removing a taint:
kubectl taint nodes node1 key1=value1:NoSchedule-
A taint consists of a key=value pair (like a label) plus an effect; here the effect is set to NoSchedule.
A node taint can carry one of the following three effects:
- NoSchedule: Pods that cannot tolerate this taint will not be scheduled onto the node; Pods already running there are unaffected.
- PreferNoSchedule: Kubernetes tries to avoid scheduling Pods that cannot tolerate this taint onto the node.
- NoExecute: a Pod already running on the node is evicted; a Pod not yet running there will not be scheduled onto it.
However, some system-created Pods tolerate all NoExecute and NoSchedule taints, so they are not evicted. For example, user Pods normally cannot be scheduled onto the master node, yet many system Pods run there in the kube-system namespace. Of course, by removing or modifying the taint, user Pods can be deployed onto the master node as well.
To query the taints of a node:
kubectl describe nodes | grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
Taints: key1=value1:NoSchedule
Default System Taints
We can remove the master node's taint:
kubectl taint node instance-1 node-role.kubernetes.io/master:NoSchedule-
Then deploy an nginx Pod.
kubectl create deployment nginxtaint --image=nginx:latest --replicas=3
Check the Pods:
kubectl get pods -o wide
As a result, all three replicas are now running on the master node.
To ensure cluster security, we need to restore the master taint.
kubectl taint node instance-1 node-role.kubernetes.io/master:NoSchedule
When certain conditions hold, the node controller automatically adds a taint to the node. The currently built-in taints include:
- node.kubernetes.io/not-ready: the node is not ready. This corresponds to the node's Ready condition being "False".
- node.kubernetes.io/unreachable: the node controller cannot reach the node. This corresponds to the node's Ready condition being "Unknown".
- node.kubernetes.io/out-of-disk: the node has run out of disk space.
- node.kubernetes.io/memory-pressure: the node is under memory pressure.
- node.kubernetes.io/disk-pressure: the node is under disk pressure.
- node.kubernetes.io/network-unavailable: the node's network is unavailable.
- node.kubernetes.io/unschedulable: the node is unschedulable.
- node.cloudprovider.kubernetes.io/uninitialized: when the kubelet is started with an external cloud provider, this taint marks the node as unusable. Once a controller in cloud-controller-manager initializes the node, the kubelet removes this taint.
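As a brief preview of the toleration syntax covered in the next section: a Pod can declare how long it tolerates one of these NoExecute taints before being evicted, via tolerationSeconds. The 300-second value below is just an illustrative choice; Kubernetes typically injects similar default tolerations automatically:
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300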
Tolerations
A node can set taints to repel Pods, but a Pod can also set tolerations to tolerate the node's taints.
tolerations:
- key: "key1"
  operator: "Exists"
  effect: "NoSchedule"
Value can also be set.
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
The default value for operator is Equal.
A toleration "matches" a taint if they have the same key and effect, and:
- if operator is Exists, the toleration must not specify a value; any taint with key key1 and effect NoSchedule is tolerated;
- if operator is Equal, the value must also be equal.
If effect is left empty in the toleration, it matches any taint with key key1, whatever its effect.
If the toleration is:
tolerations:
- operator: "Exists"
then this Pod tolerates any taint: no matter what key, value, or effect the node sets, this Pod will not mind.
If we want to also deploy Pods on the master, we can modify the Pod's toleration:
spec:
  tolerations:
  # this toleration is to have the daemonset runnable on master nodes
  # remove it if your masters can't run pods
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
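Note that newer Kubernetes releases taint control-plane nodes with node-role.kubernetes.io/control-plane instead of (or in addition to) node-role.kubernetes.io/master, so a sketch that covers both might look like the following; check your node's actual taints with kubectl describe nodes before relying on it:
tolerations:
- key: node-role.kubernetes.io/master
  effect: NoSchedule
- key: node-role.kubernetes.io/control-plane
  effect: NoSchedule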
DaemonSet
Kubernetes has three -Set controllers: ReplicaSet, DaemonSet, and StatefulSet. Together with Deployment, these make up the common workload types (Deployments, ReplicaSets, DaemonSets, StatefulSets, and so on).
Deployments were introduced earlier; a standalone kind: ReplicaSet is generally unnecessary, because you can simply set replicas: in a kind: Deployment.
A kind: DaemonSet also needs a YAML description, and overall it is similar to a Deployment.
A DaemonSet ensures that each node runs exactly one replica of a Pod. For example, with an nginx DaemonSet: when a new Node joins the cluster, a Pod is automatically deployed on it; when a Node is removed from the cluster, the Pod on that Node is reclaimed; and when the DaemonSet itself is deleted, all the Pods it created are removed as well.
Some typical use cases for DaemonSet:
- Run a cluster daemon on each node
- Run a log collection daemon on each node
- Run a monitoring daemon on each node
In a DaemonSet's YAML you can configure tolerations, for example:
kind: DaemonSet
... ...
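For a fuller picture, here is a minimal sketch of a log-collection DaemonSet (the name logging-agent and the fluentd image are illustrative choices, not from the original article), combining the toleration and label ideas covered above:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: logging-agent            # hypothetical name
spec:
  selector:
    matchLabels:
      name: logging-agent
  template:
    metadata:
      labels:
        name: logging-agent      # must satisfy the selector above
    spec:
      tolerations:
      # allow the daemon to run on master nodes as well
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: agent
        image: fluentd:latest    # illustrative image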
Other configurations are consistent with Deployment.