This guide describes how to migrate an existing CockroachDB cluster managed via StatefulSet to the CockroachDB operator.
The CockroachDB operator is in Preview.
These instructions assume that you are migrating from a StatefulSet cluster that was configured using the Helm chart with the following command:
helm upgrade --install --set operator.enabled=false crdb-test --debug ./cockroachdb
If your existing cluster was created using the Public operator, refer to the Public operator migration guide.
This migration can be completed without affecting cluster availability, and it preserves the existing disks so that data doesn't need to be replicated into empty volumes. The process scales the StatefulSet down by one node before adding each operator-managed pod, so maximum cluster capacity is reduced by one node at points throughout the migration.
Commands that use RPCs (such as cockroach node drain and cockroach node decommission) will be unavailable until the public service is updated in Step 4. The CockroachDB operator uses a different port than StatefulSets for RPC services, causing these commands to fail for a limited time.
Step 1. Prepare the migration helper
In the root of the cockroachdb/helm-charts repository, build the migration helper and add the ./bin directory to your PATH:
make bin/migration-helper
export PATH=$PATH:$(pwd)/bin
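To confirm the helper was built and is discoverable on your PATH, you can run a quick check (this verification step is a suggestion, not part of the original procedure):
command -v migration-helper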
Export environment variables for the existing deployment:
Set STS_NAME to the name of the CockroachDB StatefulSet deployed via the Helm chart:
export STS_NAME="crdb-example-cockroachdb"
Set NAMESPACE to the namespace where the StatefulSet is installed:
export NAMESPACE="default"
Set CLOUD_PROVIDER to the cloud vendor where the Kubernetes cluster resides. All major cloud providers are supported (gcp, aws, azure):
export CLOUD_PROVIDER=gcp
Set REGION to the cloud provider's identifier for this region. This value must match the "topology.kubernetes.io/region" label on the Kubernetes nodes in this cluster:
export REGION=us-central1
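Before proceeding, it can help to verify these values against the live cluster. For example, the following standard kubectl commands confirm that the StatefulSet exists in the target namespace and that REGION matches the node labels:
kubectl get statefulset $STS_NAME --namespace $NAMESPACE
kubectl get nodes --label-columns=topology.kubernetes.io/region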
Step 2. Generate manifests with the migration helper
The operator uses slightly different certificates than the CockroachDB Helm chart, and mounts them in ConfigMaps and Secrets with different names. Use the migration helper utility with the migrate-certs option to re-map and generate TLS certificates:
bin/migration-helper migrate-certs --statefulset-name $STS_NAME --namespace $NAMESPACE
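To inspect the result, list the Secrets and ConfigMaps in the namespace; the exact names created by the helper may vary by deployment:
kubectl get secrets,configmaps --namespace $NAMESPACE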
Generate a manifest for each crdbnode and the crdbcluster based on the state of the StatefulSet. The new pods and their associated PVCs must have the same names as the original StatefulSet-managed pods and PVCs. The new operator-managed pods will then use the original PVCs, rather than replicate data into empty nodes.
mkdir -p manifests
bin/migration-helper build-manifest helm --statefulset-name $STS_NAME --namespace $NAMESPACE --cloud-provider $CLOUD_PROVIDER --cloud-region $REGION --output-dir ./manifests
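You can review the generated output before applying anything. Based on the manifests referenced later in this guide, expect one crdbnode-X.yaml per pod plus supporting files such as public-service.yaml, rbac.yaml, and values.yaml:
ls manifests/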
Step 3. Replace StatefulSet pods with operator nodes
To migrate seamlessly from the CockroachDB Helm chart to the operator, scale down the StatefulSet-managed pods and replace them with crdbnode objects one by one. Then create the crdbcluster object that manages the crdbnodes.
Create objects with kubectl that will eventually be owned by the crdbcluster:
kubectl create priorityclass crdb-critical --value 500000000
Install the crdb-operator with Helm:
helm upgrade --install crdb-operator ./cockroachdb-parent/charts/operator
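Before continuing, confirm that the operator pod is running. The pod name depends on the release, and if you installed the operator into a different namespace, adjust accordingly:
kubectl get pods --namespace $NAMESPACE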
For each pod in the StatefulSet, perform the following steps:
Scale the StatefulSet down by one replica. For example, for a five-node cluster, scale the StatefulSet down to four replicas:
kubectl scale statefulset/$STS_NAME --replicas=4
Create the crdbnode resource that corresponds to the StatefulSet pod you just scaled down. Each manifest is labeled with the pattern crdbnode-X.yaml, where X corresponds to a StatefulSet pod named {STS_NAME}-X. Note the pod that was scaled down and specify its manifest in a command like the following:
kubectl apply -f manifests/crdbnode-4.yaml
Wait for the new pod to become ready. If it doesn’t, check the operator logs for errors.
Before migrating the next replica, verify that there are no under-replicated ranges:
Set up port forwarding to access the CockroachDB node’s HTTP interface. Note that the DB Console runs on port 8080 by default:
kubectl port-forward pod/cockroachdb-4 8080:8080
Check that there are zero under-replicated ranges. The following command outputs the number of under-replicated ranges on this CockroachDB node:
curl --insecure -s https://localhost:8080/_status/vars | grep "ranges_underreplicated{" | awk '{print $2}'
Repeat these steps until the StatefulSet has zero replicas.
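Optionally, the per-pod steps above can be scripted. The following is a minimal sketch of one iteration, not part of the official procedure: POD_ORDINAL is a hypothetical variable, the operator-managed pod is assumed to keep the StatefulSet pod's name (as noted in Step 2), and each node is assumed to report a single store. Adjust names and timeouts for your deployment:
POD_ORDINAL=4
kubectl scale statefulset/$STS_NAME --replicas=$POD_ORDINAL
kubectl apply -f manifests/crdbnode-$POD_ORDINAL.yaml
kubectl wait --for=condition=Ready pod/$STS_NAME-$POD_ORDINAL --namespace $NAMESPACE --timeout=10m
# Port-forward in the background, then poll until this node reports zero
# under-replicated ranges. Assumes one store per node.
kubectl port-forward pod/$STS_NAME-$POD_ORDINAL 8080:8080 &
PF_PID=$!
sleep 2
until [ "$(curl --insecure -s https://localhost:8080/_status/vars | grep 'ranges_underreplicated{' | awk '{print $2}')" = "0" ]; do
  sleep 10
done
kill $PF_PID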
If there are issues with the migration and you need to revert back to the previous deployment, follow the rollback process.
Step 4. Update the public service
The Helm chart creates a public Service that exposes both SQL and gRPC connections over a single port. However, the operator uses a different port for gRPC communication. To ensure compatibility, update the public Service to reflect the correct gRPC port used by the operator.
Apply the updated Service manifest:
kubectl apply -f manifests/public-service.yaml
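To confirm that the Service now exposes the operator's gRPC port, inspect its spec; this assumes the conventional public Service name of $STS_NAME-public:
kubectl get service $STS_NAME-public --namespace $NAMESPACE --output=yaml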
The existing StatefulSet creates a PodDisruptionBudget (PDB) that conflicts with the one managed by the operator. To avoid this conflict, delete the existing PDB:
kubectl delete poddisruptionbudget $STS_NAME-budget
Step 5. Deploy the crdbcluster object
Delete the StatefulSet that was scaled down to zero, as the Helm upgrade can only proceed if no StatefulSet is present:
kubectl delete statefulset $STS_NAME
Apply the crdbcluster manifest using Helm, where $RELEASE_NAME is the release name used when the chart was originally installed:
helm upgrade $RELEASE_NAME ./cockroachdb-parent/charts/cockroachdb -f manifests/values.yaml
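As a sanity check, you can confirm that the crdbcluster resource was created and that the operator is managing the crdbnodes. This assumes the crdbcluster CRD follows the crdb.cockroachlabs.com group naming used elsewhere in this guide:
kubectl get crdbclusters --namespace $NAMESPACE
kubectl get crdbnodes --namespace $NAMESPACE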
Roll back a migration in progress
If the migration to the CockroachDB operator fails during the stage where you are applying the generated crdbnode manifests, follow the steps below to safely restore the original state using the previously backed-up resources and preserved volumes. This assumes the StatefulSet and PVCs are not deleted.
Delete the applied crdbnode resources and scale the StatefulSet back up, one node at a time. Delete the individual crdbnode manifests in the reverse order of their creation (starting with the last one created, e.g., crdbnode-1.yaml) and scale the StatefulSet back to its original replica count (e.g., 2). For example, assuming you have applied two crdbnode YAML files (crdbnode-2.yaml and crdbnode-1.yaml):
Delete a crdbnode manifest in reverse order, starting with crdbnode-1.yaml:
kubectl delete -f manifests/crdbnode-1.yaml
Scale the StatefulSet replica count up by one (to 2):
kubectl scale statefulset $STS_NAME --replicas=2
Verify that data has propagated by waiting for there to be zero under-replicated ranges:
Set up port forwarding to access the CockroachDB node's HTTP interface, replacing cockroachdb-X with the node name:
kubectl port-forward pod/cockroachdb-X 8080:8080
The DB Console runs on port 8080 by default.
Check the ranges_underreplicated metric:
curl --insecure -s https://localhost:8080/_status/vars | grep "ranges_underreplicated{" | awk '{print $2}'
This command outputs the number of under-replicated ranges on the node, which should be zero before proceeding with the next node. This may take some time depending on the deployment, but is necessary to ensure that there is no downtime in data availability.
Repeat the preceding steps for each node: delete crdbnode-2.yaml, scale the replica count to 3, and so on. Repeat the kubectl delete -f ... command for each crdbnode manifest you applied during migration, and verify that there are no under-replicated ranges after rolling back each node.
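For reference, a single rollback iteration can be expressed as the following sketch, using a hypothetical POD_ORDINAL variable and the ordinal-to-replica relationship from the example above:
POD_ORDINAL=1
kubectl delete -f manifests/crdbnode-$POD_ORDINAL.yaml
# Restore the replica that owns this ordinal: replicas = ordinal + 1.
kubectl scale statefulset $STS_NAME --replicas=$((POD_ORDINAL + 1))
# Re-run the under-replicated ranges check before the next iteration.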
Delete the PriorityClass and RBAC resources created for the CockroachDB operator:
kubectl delete priorityclass crdb-critical
kubectl delete -f manifests/rbac.yaml
Uninstall the CockroachDB operator:
helm uninstall crdb-operator
Clean up CockroachDB operator resources and custom resource definitions:
kubectl delete crds crdbnodes.crdb.cockroachlabs.com
kubectl delete crds crdbtenants.crdb.cockroachlabs.com
kubectl delete serviceaccount cockroachdb-sa
kubectl delete service cockroach-webhook-service
kubectl delete validatingwebhookconfiguration cockroach-webhook-config
Confirm that all CockroachDB pods are Running and Ready with the following command:
kubectl get pods
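Alternatively, to block until every pod reports Ready rather than inspecting the listing by eye, kubectl wait can be used; this assumes no unrelated pods are running in the namespace:
kubectl wait --for=condition=Ready pods --all --namespace $NAMESPACE --timeout=10m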