Table of Contents
- Labels and nodeSelector
- Label Selection
- Affinity and Anti-Affinity
- Taints and Tolerations
- Default System Taints
- Tolerations
- DaemonSet
In earlier articles, we learned about Deployments and ReplicaSets, but the Pods they create are scheduled to Worker nodes more or less arbitrarily. Even with 3 Workers and 3 replicas, there is no guarantee that each Node runs one Pod; one Node may well end up running all three replicas.
In this article, we will explore DaemonSets, tolerations, affinity, Labels, and selectors in Kubernetes, which let us control where Pods are deployed.
Labels and nodeSelector
Labels are key-value pairs attached to Kubernetes objects. When represented in JSON, the labels attached to metadata may look like this:
"metadata": {
"labels": {
"key1": "value1",
"key2": "value2"
}
}
In YAML:
metadata:
  labels:
    key1: "value1"
    key2: "value2"
Labels are primarily used to represent meaningful attributes of objects to users.
Nodes can also carry Labels. For example, every node already has a set of Labels assigned by Kubernetes, which we can inspect (nodes are cluster-scoped, so no namespace is needed):
kubectl get nodes --show-labels
beta.kubernetes.io/arch=amd64,
beta.kubernetes.io/os=linux,
kubernetes.io/arch=amd64,
...
We can also manually add a label to a Node.
kubectl label nodes <node-name> <label-key>=<label-value>
For instance, we can set a disksize label on a node to indicate whether it has sufficient disk space:
kubectl label nodes <node-name> disksize=big
Then, when writing a YAML file, if we want this Pod to run on a Node with sufficient capacity, we can write:
nodeSelector:
  disksize: big
Now, as an official example, let's set a Node's Label to indicate that the disk is SSD.
kubectl label nodes kubernetes-foo-node-1.c.a-robinson.internal disktype=ssd
Then, in the Pod's YAML, we add the corresponding nodeSelector:
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
Labels can be used in many places: adding Labels to a Node to identify it, using nodeSelector to pick suitable Nodes for running Pods, and attaching Labels to an object's metadata to describe it.
Labels attached to metadata can then be used to filter the results of queries.
Querying the labels of Pods:
kubectl get pods --show-labels
Finding Pods that match a condition (see the LABELS column above; we can select against any of those labels):
kubectl get pods -l app=nginx
Label Selection
Previously, we learned about nodeSelector, which helps us select suitable Nodes for running Pods. In fact, Kubernetes' label selection is rich and diverse, such as:
nodeSelector:
  disktype: ssd
  disksize: big
This node selector expresses equality-based selection; the expression is disktype=ssd && disksize=big.
Labels can be selected by equality or by set operations. Equality-based selection uses the operators =, ==, and != (= and == are interchangeable). When there are multiple requirements (multiple labels), they are written together and combined as a logical AND (&&); note that the selector does not support the logical OR (||) operator.
YAML only supports the {key}: {value} form, while on the command line we can use all three operators:
kubectl get nodes -l disktype=ssd,disksize!=big
# Multiple conditions are separated by a comma ",", not "&&".
Set-based selection supports three operators: in, notin, and exists. Despite the name, this does not mean picking objects out of a set; it means testing a label's value against a set of allowed values.
For example, suppose there are three Nodes with disksizes of big, medium, and small, and we want to deploy a Pod that can run on either big or medium, we can write:
... -l disksize in (big,medium)
... -l disksize notin (small)
# Not running on small
The exists operator takes only a key and no value: it matches as long as a label with that key exists, no matter what its value is.
-l disksize
# Equivalent to -l disksize in (big,medium,small)
We can also wrap the selection expression in single quotes:
kubectl get pods -l 'app=nginx'
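Quoting matters once the expression contains spaces or parentheses, as set-based expressions do, because the shell would otherwise try to interpret them. A quick sketch, reusing the disksize labels assumed earlier in this article:
kubectl get nodes -l 'disksize in (big,medium)'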
So far we have seen the YAML nodeSelector and command-line selection. Next, we introduce the YAML selector.
We mentioned earlier that Labels can be added in a Deployment's metadata, i.e., attached to Pods. Label selection can also be used to filter Pods when creating Services or when using ReplicationControllers.
If nginx has already been deployed, kubectl get pods --show-labels will show LABELS such as app=nginx, and we can select on them like this:
selector:
  app: nginx
Full version:
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      targetPort: 6666
status:
  loadBalancer:
    ingress:
      - ip: 192.0.2.127
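After creating the Service, one way to check that the selector actually matched the nginx Pods is to inspect the Service's endpoints (using the Service name my-service from the example above):
kubectl get endpoints my-service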
A selector also supports the matchLabels and matchExpressions forms:
- matchLabels is a mapping of {key,value} pairs. A single {key,value} in the matchLabels mapping is equivalent to an element of matchExpressions whose key field is "key", whose operator is "In", and whose values array contains only "value".
- matchExpressions is a list of requirements built from Pod selection operators. Valid operators include In, NotIn, Exists, and DoesNotExist. For In and NotIn, the values set must be non-empty. All requirements from matchLabels and matchExpressions are combined with a logical AND — they must all match for an object to qualify.
Here is an example:
selector:
  matchLabels:
    component: redis
  matchExpressions:
    - {key: tier, operator: In, values: [cache]}
    - {key: environment, operator: NotIn, values: [dev]}
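For context, here is a minimal sketch of where such a selector lives in a Deployment (the name my-nginx is made up for illustration); the Pod template's labels must satisfy the selector, otherwise the API server rejects the object:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-nginx           # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx         # must satisfy the selector above
    spec:
      containers:
      - name: nginx
        image: nginx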
We will not elaborate further on these selection rules. The aforementioned information should be sufficient, and readers can refer to the official documentation for more complex operations: https://kubernetes.io/zh/docs/concepts/overview/working-with-objects/labels/
Affinity and Anti-Affinity
Earlier we learned about nodeSelector, which selects suitable nodes by their Labels and lets us express simple constraints.
Affinity is similar to nodeSelector: it schedules Pods onto specific nodes based on labels, but it is more expressive.
Affinity rules come in two types:
- requiredDuringSchedulingIgnoredDuringExecution: a hard requirement that must be satisfied for the Pod to be scheduled onto a node.
- preferredDuringSchedulingIgnoredDuringExecution: a preference the scheduler tries to honor but does not guarantee.
Here is an official example:
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
The required constraint above is roughly equivalent to the following label selection:
... ... -l kubernetes.io/e2e-az-name in (e2e-az1,e2e-az2)
The affinity field declares the scheduling affinity, nodeAffinity scopes it to nodes, and the two rule types express the "must satisfy" and "try to satisfy" constraints respectively.
If multiple terms are listed under nodeSelectorTerms:
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  - matchExpressions:
      ...
  - matchExpressions:
      ...
then satisfying any one of them is enough for the Pod to be scheduled onto the node.
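As a concrete sketch (the disktype and disksize keys are just the example labels used earlier in this article), the following schedules the Pod onto nodes that have either an ssd disk or a big disk:
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  - matchExpressions:
    - key: disktype
      operator: In
      values: ["ssd"]
  - matchExpressions:
    - key: disksize
      operator: In
      values: ["big"]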
If both nodeSelector and nodeAffinity are specified, both must be satisfied for the Pod to be scheduled onto a candidate node.
The node affinity syntax supports the following operators: In, NotIn, Exists, DoesNotExist, Gt, and Lt.
The legal operators for Pod affinity and anti-affinity are In, NotIn, Exists, and DoesNotExist.
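As an illustrative sketch of Gt (the label key gpu-count and its numeric value are made up for this example), a required node-affinity term can demand a label value greater than a given number:
requiredDuringSchedulingIgnoredDuringExecution:
  nodeSelectorTerms:
  - matchExpressions:
    - key: gpu-count         # hypothetical numeric label
      operator: Gt
      values: ["2"]          # compared as an integer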
Affinity fields end in -Affinity, such as node affinity nodeAffinity and Pod affinity podAffinity; anti-affinity fields end in -AntiAffinity, such as podAntiAffinity. (There is no nodeAntiAffinity field; to keep Pods away from certain nodes, use the NotIn/DoesNotExist operators in nodeAffinity, or taints.)
Anti-affinity works like affinity, with the same requiredDuringSchedulingIgnoredDuringExecution hard constraints and preferredDuringSchedulingIgnoredDuringExecution soft constraints, except that the meaning is inverted: when the conditions are met, the Pod must not (or should preferably not) be scheduled there.
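As a brief sketch only (the app: nginx label is the example label used earlier), the following podAntiAffinity, placed inside a Pod's spec, prevents two nginx Pods from landing on the same node:
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: nginx
      topologyKey: kubernetes.io/hostname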
This concludes our explanation of affinity and anti-affinity. The configuration for both is quite complex, and readers can refer to the official documentation for further learning; we won't delve into it here.
Taints and Tolerations
Earlier, we looked at affinity and anti-affinity, which select suitable nodes for Pods, or suitable Pods for Services; in both cases the label-bearing objects are the ones being selected.
Here we introduce taints and tolerations, which let a node escape that "being chosen" fate by actively repelling Pods.
Node taints can repel specific Pods, while Tolerations indicate the ability of Pods to tolerate these taints.
When a taint is added to a node, unless a Pod declares it can tolerate this taint, it will not be scheduled on that node.
Kubernetes processes multiple taints and tolerations like a filter: starting from all of a node's taints, it ignores those that match the Pod's tolerations; the remaining, un-ignored taints determine the effect on the Pod. In particular, for a remaining PreferNoSchedule taint the system only tries to avoid placing the Pod on the node; that case is not enforced.
Note that if you only have one worker and it carries a PreferNoSchedule taint, Pods can still end up on that node, since there is nowhere else for them to run; with an untolerated NoSchedule taint they would stay Pending instead.
Adding a taint has the format:
kubectl taint node [node] key=value:[effect]
To update or overwrite a taint:
kubectl taint node [node] key=value:[effect] --overwrite=true
Using kubectl taint
to add a taint to a node.
kubectl taint nodes node1 key1=value1:NoSchedule
Removing a taint:
kubectl taint nodes node1 key1=value1:NoSchedule-
A taint consists of a key=value pair (like a label) plus an effect; here the effect is set to NoSchedule.
A node taint can carry one of the following three effects:
- NoSchedule: Pods that cannot tolerate this taint will not be scheduled onto the node; Pods already running there are unaffected.
- PreferNoSchedule: Kubernetes tries to avoid scheduling Pods that cannot tolerate this taint onto the node.
- NoExecute: a Pod already running on the node is evicted; a Pod not yet running there will not be scheduled onto it.
However, some system-created Pods tolerate all NoExecute and NoSchedule taints, so they are not evicted. For example, user Pods normally cannot be scheduled onto the master node, yet many system Pods run there in the kube-system namespace. Of course, by removing or modifying the taint, user Pods can be deployed onto the master node as well.
To query the taints of a node:
kubectl describe nodes | grep Taints
Taints: node-role.kubernetes.io/master:NoSchedule
Taints: key1=value1:NoSchedule
Default System Taints
We can remove the master node's taint:
kubectl taint node instance-1 node-role.kubernetes.io/master:NoSchedule-
Then deploy an nginx Pod.
kubectl create deployment nginxtaint --image=nginx:latest --replicas=3
Check the Pods:
kubectl get pods -o wide
As a result, all three replicas are now running on the master node.
To ensure cluster security, we need to restore the master taint.
kubectl taint node instance-1 node-role.kubernetes.io/master:NoSchedule
When certain conditions hold, the node controller automatically adds a taint to the node. The currently built-in taints include:
- node.kubernetes.io/not-ready: the node is not ready. This corresponds to the node's Ready condition being "False".
- node.kubernetes.io/unreachable: the node controller cannot reach the node. This corresponds to the node's Ready condition being "Unknown".
- node.kubernetes.io/out-of-disk: the node has run out of disk space.
- node.kubernetes.io/memory-pressure: the node is under memory pressure.
- node.kubernetes.io/disk-pressure: the node is under disk pressure.
- node.kubernetes.io/network-unavailable: the node's network is unavailable.
- node.kubernetes.io/unschedulable: the node is unschedulable.
- node.cloudprovider.kubernetes.io/uninitialized: when the kubelet is started with an external cloud provider, this taint marks the node as unusable. Once a controller in cloud-controller-manager initializes the node, the kubelet removes this taint.
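As a brief preview of the toleration syntax covered in the next section: a Pod can declare how long it tolerates one of these NoExecute taints before being evicted, via tolerationSeconds. The 300-second value below is just an illustrative choice; Kubernetes typically injects similar default tolerations automatically:
tolerations:
- key: "node.kubernetes.io/unreachable"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 300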
Tolerations
A node can set taints to repel Pods, but a Pod can also set tolerations to tolerate the node's taints.
tolerations:
- key: "key1"
  operator: "Exists"
  effect: "NoSchedule"
Value can also be set.
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
The default value for operator is Equal.
A toleration "matches" a taint if they have the same key and effect, and:
- if operator is Exists, the toleration must not specify a value; any taint with key key1 and effect NoSchedule is tolerated;
- if operator is Equal, the value must also be equal.
If effect is left empty in the toleration, it matches any taint with key key1, whatever its effect.
If the toleration is:
tolerations:
- operator: "Exists"
then this Pod tolerates any taint: no matter what key, value, or effect the node sets, this Pod will not mind.
If we want to also deploy Pods on the master, we can modify the Pod's toleration:
spec:
  tolerations:
  # this toleration is to have the daemonset runnable on master nodes
  # remove it if your masters can't run pods
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
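Note that newer Kubernetes releases taint control-plane nodes with node-role.kubernetes.io/control-plane instead of (or in addition to) node-role.kubernetes.io/master, so a sketch that covers both might look like the following; check your node's actual taints with kubectl describe nodes before relying on it:
tolerations:
- key: node-role.kubernetes.io/master
  effect: NoSchedule
- key: node-role.kubernetes.io/control-plane
  effect: NoSchedule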
DaemonSet
Kubernetes has three -Set controllers: ReplicaSet, DaemonSet, and StatefulSet. Together with Deployment, these make up the common workload types (Deployments, ReplicaSets, DaemonSets, StatefulSets, and so on).
Deployments were introduced earlier; a standalone kind: ReplicaSet is generally unnecessary, because you can simply set replicas: in a kind: Deployment.
A kind: DaemonSet also needs a YAML description, and overall it is similar to a Deployment.
A DaemonSet ensures that each node runs exactly one replica of a Pod. For example, with an nginx DaemonSet: when a new Node joins the cluster, a Pod is automatically deployed on it; when a Node is removed from the cluster, the Pod on that Node is reclaimed; and when the DaemonSet itself is deleted, all the Pods it created are removed as well.
Some typical use cases for DaemonSet:
- Run a cluster daemon on each node
- Run a log collection daemon on each node
- Run a monitoring daemon on each node
In a DaemonSet's YAML you can configure tolerations, for example:
kind: DaemonSet
... ...
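For a fuller picture, here is a minimal sketch of a log-collection DaemonSet (the name logging-agent and the fluentd image are illustrative choices, not from the original article), combining the toleration and label ideas covered above:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: logging-agent            # hypothetical name
spec:
  selector:
    matchLabels:
      name: logging-agent
  template:
    metadata:
      labels:
        name: logging-agent      # must satisfy the selector above
    spec:
      tolerations:
      # allow the daemon to run on master nodes as well
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: agent
        image: fluentd:latest    # illustrative image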
Other configurations are consistent with Deployment.