Pod Scheduling with the CockroachDB Operator


This page describes how to configure pod scheduling settings. These settings control how CockroachDB pods are identified and scheduled onto worker nodes; the operator passes them through to the Kubernetes scheduler.

Note:

The CockroachDB operator is in Preview.

Node selectors

A pod with a node selector will be scheduled onto a worker node that has matching labels, or key-value pairs.

Specify the labels in cockroachdb.crdbCluster.nodeSelector in the values file used to deploy the cluster. If you specify multiple nodeSelector labels, the node must match all of them.

The following configuration causes CockroachDB pods to be scheduled onto worker nodes that have both the labels worker-pool-name=crdb-workers and kubernetes.io/arch=amd64:

cockroachdb:
  crdbCluster:
    nodeSelector:
      worker-pool-name: crdb-workers
      kubernetes.io/arch: amd64
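
Before deploying, you can check which worker nodes carry these labels by filtering with a label selector (the label values here mirror the configuration above; substitute your own):

kubectl get nodes -l worker-pool-name=crdb-workers,kubernetes.io/arch=amd64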

For an example of labeling nodes, see Scheduling CockroachDB onto labeled nodes.

Affinities and anti-affinities

A pod with a node affinity seeks out worker nodes that have matching labels. A pod with a pod affinity seeks out pods that have matching labels. A pod with a pod anti-affinity avoids pods that have matching labels.

Affinities and anti-affinities, combined with operator fields such as In and NotIn, can be used to:

  • Require CockroachDB pods to be scheduled onto a labeled worker node.
  • Require CockroachDB pods to be co-located with labeled pods (e.g., on a node or region).
  • Prevent CockroachDB pods from being scheduled onto a labeled worker node.
  • Prevent CockroachDB pods from being co-located with labeled pods (e.g., on a node or region).

For an example, see Scheduling CockroachDB onto labeled nodes.

Add a node affinity

Specify node affinities in cockroachdb.crdbCluster.affinity.nodeAffinity in the values file used to deploy the cluster. If you specify multiple matchExpressions labels, the node must match all of them. If you specify multiple values for a label, the node can match any of the values.

The following configuration requires that CockroachDB pods are scheduled onto worker nodes running a Linux operating system, with a preference against worker nodes in the us-east4-b availability zone:

cockroachdb:
  crdbCluster:
    affinity:
      nodeAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
          nodeSelectorTerms:
          - matchExpressions:
            - key: kubernetes.io/os
              operator: In
              values: 
              - linux
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: NotIn
              values:
              - us-east4-b

The requiredDuringSchedulingIgnoredDuringExecution node affinity rule, using the In operator, requires CockroachDB pods to be scheduled onto nodes with the matching label kubernetes.io/os=linux. It will not evict pods that are already running on nodes that do not match the affinity requirements.

The preferredDuringSchedulingIgnoredDuringExecution node affinity rule, using the NotIn operator and specified weight, discourages (but does not disallow) CockroachDB pods from being scheduled onto nodes with the label topology.kubernetes.io/zone=us-east4-b. This achieves a similar effect as a PreferNoSchedule taint.
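
To see the node labels that these rules evaluate, you can list the worker nodes with the relevant label columns (an optional check; the label keys match the example above):

kubectl get nodes -L kubernetes.io/os,topology.kubernetes.io/zone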

For more context on how these rules work, see the Kubernetes documentation. The custom resource definition details the fields supported by the operator.

Add a pod affinity or anti-affinity

Specify pod affinities and pod anti-affinities in cockroachdb.crdbCluster.affinity.podAffinity and cockroachdb.crdbCluster.affinity.podAntiAffinity in the values file used to deploy the cluster. If you specify multiple matchExpressions labels, the pod must match all of them. If you specify multiple values for a label, the pod can match any of the values.

The CockroachDB operator's default pod template allows only one CockroachDB pod per Kubernetes node. If you need to change this behavior, you can override the pod template.

The following configuration attempts to schedule CockroachDB pods in the same zones as the pods that run our example load generator app, and prevents CockroachDB pods from being co-located on the same worker node:

cockroachdb:
  crdbCluster:
    affinity:
      podAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - loadgen
            topologyKey: topology.kubernetes.io/zone
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/instance
              operator: In
              values:
              - cockroachdb
          topologyKey: kubernetes.io/hostname

The preferredDuringSchedulingIgnoredDuringExecution pod affinity rule, using the In operator and specified weight, encourages (but does not require) CockroachDB pods to be co-located with pods labeled app=loadgen already running in the same zone, as specified with topologyKey.

The requiredDuringSchedulingIgnoredDuringExecution pod anti-affinity rule, using the In operator, prevents CockroachDB pods from being co-located on the same worker node, as specified with topologyKey.
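
Before applying the pod affinity, you can check which zones the load generator pods currently occupy by listing them along with their assigned nodes (app=loadgen is the example label used above):

kubectl get pods -l app=loadgen -o wide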

For more context on how these rules work, see the Kubernetes documentation. The custom resource definition details the fields supported by the operator.

Example: Scheduling CockroachDB onto labeled nodes

In this example, CockroachDB has not yet been deployed to a running Kubernetes cluster. Use a combination of node affinity and pod anti-affinity rules to schedule three CockroachDB pods onto three labeled worker nodes.

  1. List the worker nodes on the running Kubernetes cluster:

    kubectl get nodes
    
    NAME                                         STATUS   ROLES    AGE   VERSION
    gke-cockroachdb-default-pool-263138a5-kp3v   Ready    <none>   3m56s   v1.20.10-gke.301
    gke-cockroachdb-default-pool-263138a5-nn62   Ready    <none>   3m56s   v1.20.10-gke.301
    gke-cockroachdb-default-pool-41796213-75c9   Ready    <none>   3m56s   v1.20.10-gke.301
    gke-cockroachdb-default-pool-41796213-bw3z   Ready    <none>   3m54s   v1.20.10-gke.301
    gke-cockroachdb-default-pool-ccd74623-dghs   Ready    <none>   3m54s   v1.20.10-gke.301
    gke-cockroachdb-default-pool-ccd74623-p5mf   Ready    <none>   3m55s   v1.20.10-gke.301
    
  2. Add a node=crdb label to three of the running worker nodes:

    kubectl label nodes gke-cockroachdb-default-pool-263138a5-kp3v gke-cockroachdb-default-pool-41796213-75c9 gke-cockroachdb-default-pool-ccd74623-dghs node=crdb
    
    node/gke-cockroachdb-default-pool-263138a5-kp3v labeled
    node/gke-cockroachdb-default-pool-41796213-75c9 labeled
    node/gke-cockroachdb-default-pool-ccd74623-dghs labeled
    

    In this example, 6 GKE nodes are deployed in 3 node pools, and each node pool resides in a separate availability zone. To maintain an even distribution of CockroachDB pods as specified in our topology recommendations, each of the 3 labeled worker nodes must belong to a different node pool.

    Note:

    This also ensures that the CockroachDB pods, which will be bound to persistent volumes in the same three availability zones, can be scheduled onto worker nodes in their respective zones.

  3. Add the following rules to the values file used to deploy the cluster:

    cockroachdb:
      crdbCluster:
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: node
                  operator: In
                  values:
                  - crdb
          podAntiAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: app.kubernetes.io/instance
                  operator: In
                  values:
                  - cockroachdb
              topologyKey: kubernetes.io/hostname
    

    The nodeAffinity rule requires CockroachDB pods to be scheduled onto worker nodes with the label node=crdb. The podAntiAffinity rule prevents CockroachDB pods from being co-located on the same worker node, as specified with topologyKey.

  4. Apply the settings to the cluster:

    helm upgrade --reuse-values $CRDBCLUSTER ./cockroachdb-parent/charts/cockroachdb --values ./cockroachdb-parent/charts/cockroachdb/values.yaml -n $NAMESPACE
    
  5. The CockroachDB pods will be deployed to the 3 labeled nodes. To observe this, run:

    kubectl get pods -o wide
    
    NAME                                 READY   STATUS    RESTARTS   AGE    IP           NODE                                         NOMINATED NODE   READINESS GATES
    cockroach-operator-bfdbfc9c7-tbpsw   1/1     Running   0          171m   10.32.2.4    gke-cockroachdb-default-pool-263138a5-kp3v   <none>           <none>
    cockroachdb-0                        1/1     Running   0          100s   10.32.4.10   gke-cockroachdb-default-pool-ccd74623-dghs   <none>           <none>
    cockroachdb-1                        1/1     Running   0          100s   10.32.2.6    gke-cockroachdb-default-pool-263138a5-kp3v   <none>           <none>
    cockroachdb-2                        1/1     Running   0          100s   10.32.0.5    gke-cockroachdb-default-pool-41796213-75c9   <none>           <none>
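
    The output above shows each CockroachDB pod on a different labeled node. As an optional check, you can also confirm the node=crdb labels directly by filtering the nodes:

    kubectl get nodes -l node=crdb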
    

Taints and tolerations

When a taint is added to a Kubernetes worker node, pods are prevented from being scheduled onto that node. A pod can override this effect by specifying a toleration that matches the taint.

Taints and tolerations are useful if you want to:

  • Prevent CockroachDB pods from being scheduled onto a labeled worker node.
  • Evict CockroachDB pods from a labeled worker node on which they are currently running.

For an example, see Evicting CockroachDB from a running worker node.
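
To see which taints are already set on your worker nodes, one option is to print them as a custom column:

kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints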

Add a toleration

Specify pod tolerations in the cockroachdb.crdbCluster.tolerations object in the values file used to deploy the cluster.

The following toleration matches a taint with the specified key, value, and NoSchedule effect, using the Equal operator. A toleration that uses the Equal operator must include a value field:

cockroachdb:
  crdbCluster:
    tolerations:
      - key: "test"
        operator: "Equal"
        value: "example"
        effect: "NoSchedule"

A NoSchedule taint on a node prevents pods from being scheduled onto the node. The matching toleration allows a pod to be scheduled onto the node. A NoSchedule toleration is therefore best included before deploying the cluster.
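
For reference, a taint that the toleration above would match can be added to a node with kubectl; the node name below is a placeholder:

kubectl taint nodes <node-name> test=example:NoSchedule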

Note:

A PreferNoSchedule taint discourages, but does not disallow, pods from being scheduled onto the node.

The following toleration matches every taint with the specified key and NoExecute effect, using the Exists operator. A toleration that uses the Exists operator must exclude a value field:

cockroachdb:
  crdbCluster:
    tolerations:
      - key: "test"
        operator: "Exists"
        effect: "NoExecute"
        tolerationSeconds: 3600

A NoExecute taint on a node prevents pods from being scheduled onto the node, and evicts pods from the node if they are already running on the node. The matching toleration allows a pod to be scheduled onto the node, and to continue running on the node if tolerationSeconds is not specified. If tolerationSeconds is specified, the pod is evicted after this number of seconds.

For more information on using taints and tolerations, see the Kubernetes documentation. The custom resource definition details the fields supported by the operator.

Example: Evicting CockroachDB from a running worker node

In this example, CockroachDB has already been deployed on a Kubernetes cluster. Use the NoExecute effect to evict one of the CockroachDB pods from its worker node.

  1. List the worker nodes on the running Kubernetes cluster:

    kubectl get nodes
    
    NAME                                         STATUS   ROLES    AGE   VERSION
    gke-cockroachdb-default-pool-4e5ce539-68p5   Ready    <none>   56m   v1.20.9-gke.1001
    gke-cockroachdb-default-pool-4e5ce539-j1h1   Ready    <none>   56m   v1.20.9-gke.1001
    gke-cockroachdb-default-pool-95fde00d-173d   Ready    <none>   56m   v1.20.9-gke.1001
    gke-cockroachdb-default-pool-95fde00d-hw04   Ready    <none>   56m   v1.20.9-gke.1001
    gke-cockroachdb-default-pool-eb2b2889-q15v   Ready    <none>   56m   v1.20.9-gke.1001
    gke-cockroachdb-default-pool-eb2b2889-q704   Ready    <none>   56m   v1.20.9-gke.1001
    
  2. Add a taint to a running worker node:

    kubectl taint nodes gke-cockroachdb-default-pool-4e5ce539-j1h1 test=example:NoExecute
    
    node/gke-cockroachdb-default-pool-4e5ce539-j1h1 tainted
    
  3. Add a matching tolerations object to the values file used to deploy the cluster:

    cockroachdb:
      crdbCluster:
        tolerations:
          - key: "test"
            operator: "Exists"
            effect: "NoExecute"
    

    Because no tolerationSeconds is specified, CockroachDB will be evicted immediately from the tainted worker node.

  4. Apply the new settings to the cluster:

    helm upgrade --reuse-values $CRDBCLUSTER ./cockroachdb-parent/charts/cockroachdb --values ./cockroachdb-parent/charts/cockroachdb/values.yaml -n $NAMESPACE
    
  5. The CockroachDB pod running on the tainted node (in this case, cockroachdb-2) will be evicted and started on a different worker node. To observe this:

    kubectl get pods -o wide
    
    NAME                                 READY   STATUS    RESTARTS   AGE     IP           NODE                                         NOMINATED NODE   READINESS GATES
    cockroach-operator-c9fc6cb5c-bl6rs   1/1     Running   0          44m     10.32.2.4    gke-cockroachdb-default-pool-4e5ce539-68p5   <none>           <none>
    cockroachdb-0                        1/1     Running   0          9m21s   10.32.4.10   gke-cockroachdb-default-pool-95fde00d-173d   <none>           <none>
    cockroachdb-1                        1/1     Running   0          9m21s   10.32.2.6    gke-cockroachdb-default-pool-eb2b2889-q15v   <none>           <none>
    cockroachdb-2                        0/1     Running   0          6s      10.32.0.5    gke-cockroachdb-default-pool-4e5ce539-68p5   <none>           <none>
    

    cockroachdb-2 is now scheduled onto the gke-cockroachdb-default-pool-4e5ce539-68p5 node.
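
If you later want to allow pods back onto the tainted node, you can remove the taint by appending a hyphen to the taint specification (an optional follow-up):

kubectl taint nodes gke-cockroachdb-default-pool-4e5ce539-j1h1 test=example:NoExecute-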

Topology spread constraints

A pod with a topology spread constraint must satisfy its conditions when being scheduled onto a given topology. Topology spread constraints control the degree to which pods may be unevenly distributed across failure domains.

Add a topology spread constraint

Specify pod topology spread constraints in the cockroachdb.crdbCluster.topologySpreadConstraints object of the values file used to deploy the cluster. If you specify multiple topologySpreadConstraints objects, the matching pods must satisfy all of the constraints.

The following topology spread constraint ensures that CockroachDB pods with the label environment=production are distributed across zones with a maximum skew of one pod:

cockroachdb:
  crdbCluster:
    topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
      labelSelector:
        matchLabels:
          environment: production

The DoNotSchedule condition prevents labeled pods from being scheduled onto Kubernetes worker nodes when doing so would fail to meet the spread and topology constraints specified with maxSkew and topologyKey, respectively.
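
To check the resulting distribution, you can list the labeled pods with their assigned nodes and view each node's zone label (environment=production matches the example selector above):

kubectl get pods -l environment=production -o wide
kubectl get nodes -L topology.kubernetes.io/zone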

For more context on how these rules work, see the Kubernetes documentation. The custom resource definition details the fields supported by the operator.

Resource labels and annotations

To assist in working with your cluster, you can add labels and annotations to your resources.

Specify labels in cockroachdb.crdbCluster.podLabels and annotations in cockroachdb.crdbCluster.podAnnotations in the values file used to deploy the cluster:

cockroachdb:
  crdbCluster:
    podLabels:
      app.kubernetes.io/version: v25.1.4
    podAnnotations:
      operator: https://raw.githubusercontent.com/cockroachdb/helm-charts/refs/heads/master/cockroachdb-parent/charts/cockroachdb/values.yaml

To verify that the labels and annotations were applied to a pod, for example, run kubectl describe pod {pod-name}.
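
You can also filter pods by a label added this way, for example using the label from the configuration above:

kubectl get pods -l app.kubernetes.io/version=v25.1.4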

For more information about labels and annotations, see the Kubernetes documentation.
