This guide describes how to migrate an existing CockroachDB cluster managed via the Public operator to the CockroachDB operator.
The CockroachDB operator is in Preview.
These instructions assume that you are migrating from a Public operator cluster that is managed with kubectl via the following YAML manifests:
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/operator.yaml
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/examples/example.yaml
If your existing cluster was created as a StatefulSet using Helm, refer to the Helm migration guide.
This migration process can be completed without affecting cluster availability, and it preserves the existing disks so that data does not need to be replicated into empty volumes. The process scales down the StatefulSet by one node before adding each operator-managed pod, so the maximum cluster capacity is temporarily reduced by one node at points throughout the migration.
Step 1. Prepare the migration helper
In the root of the cockroachdb/helm-charts repository, build the migration helper and add the ./bin directory to your PATH:
make bin/migration-helper
export PATH=$PATH:$(pwd)/bin
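To confirm that the migration-helper binary is now on your PATH (the binary name follows the make target above):
which migration-helper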
Export environment variables for the existing deployment:
Set CRDBCLUSTER to the crdbcluster custom resource name in the Public operator:
export CRDBCLUSTER="cockroachdb"
Set NAMESPACE to the namespace where the StatefulSet is installed:
export NAMESPACE="default"
Set CLOUD_PROVIDER to the cloud provider where the Kubernetes cluster is running. All major cloud providers are supported (gcp, aws, azure):
export CLOUD_PROVIDER=gcp
Set STS_NAME to the name of the StatefulSet created by the Public operator. By default, this matches the crdbcluster resource name:
export STS_NAME=$CRDBCLUSTER
Set REGION to the cloud provider's identifier for this region. This value must match the topology.kubernetes.io/region label on the Kubernetes nodes for this cluster:
export REGION=us-central1
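To confirm the value, you can inspect the region label on the nodes:
kubectl get nodes --label-columns=topology.kubernetes.io/region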
Back up the crdbcluster resource in case there is a need to revert:
mkdir -p backup
kubectl get crdbcluster $CRDBCLUSTER -o yaml > backup/crdbcluster-$CRDBCLUSTER.yaml
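Optionally, inspect the backup file to confirm the resource was captured:
head backup/crdbcluster-$CRDBCLUSTER.yaml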
Step 2. Generate manifests with the migration helper
The CockroachDB operator uses slightly different certificates than the Public operator, and mounts them in ConfigMaps and Secrets with different names. Use the migration helper utility with the migrate-certs option to re-map and generate TLS certificates:
bin/migration-helper migrate-certs --statefulset-name $STS_NAME --namespace $NAMESPACE
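To confirm that the re-mapped certificate objects were created, list the Secrets and ConfigMaps in the namespace; the exact names depend on your deployment:
kubectl get secrets,configmaps -n $NAMESPACE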
Generate a manifest for each crdbnode and the crdbcluster based on the state of the StatefulSet. The new pods and their associated PVCs must have the same names as the original StatefulSet-managed pods and PVCs. The new CockroachDB operator-managed pods will then use the original PVCs, rather than replicate data into empty nodes.
mkdir -p manifests
bin/migration-helper build-manifest helm --statefulset-name $STS_NAME --namespace $NAMESPACE --cloud-provider $CLOUD_PROVIDER --cloud-region $REGION --output-dir ./manifests
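List the generated files to confirm the helper produced the manifests referenced later in this guide: one crdbnode-X.yaml per pod, plus the rbac.yaml and values.yaml files:
ls manifests/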
Step 3. Uninstall and replace the Public operator
The Public operator and the CockroachDB operator use custom resource definitions with the same names, so you must remove the Public operator before installing the CockroachDB operator. Run the following commands to uninstall the Public operator, without deleting its managed resources:
Ensure that the operator can't accidentally delete managed Kubernetes objects:
kubectl delete clusterrolebinding cockroach-operator-rolebinding
Delete the Public operator custom resource:
kubectl delete crdbcluster $CRDBCLUSTER --cascade=orphan
The --cascade=orphan flag tells Kubernetes not to delete the dependent resources (StatefulSets, Services, PVCs, and so on) created by the CrdbCluster custom resource. This ensures that only the parent custom resource is deleted, while child resources are left intact in the cluster, allowing the CockroachDB cluster to continue running as a StatefulSet until the migration is complete.
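To confirm that the StatefulSet and its pods were orphaned rather than deleted, list them; everything should still be present and running:
kubectl get statefulset,pods -n $NAMESPACE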
Delete the Public operator resources and custom resource definitions:
kubectl delete -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/crds.yaml
kubectl delete serviceaccount cockroach-operator-sa -n cockroach-operator-system
kubectl delete clusterrole cockroach-operator-role
kubectl delete clusterrolebinding cockroach-operator-rolebinding
kubectl delete service cockroach-operator-webhook-service -n cockroach-operator-system
kubectl delete deployment cockroach-operator-manager -n cockroach-operator-system
kubectl delete mutatingwebhookconfigurations cockroach-operator-mutating-webhook-configuration
kubectl delete validatingwebhookconfigurations cockroach-operator-validating-webhook-configuration
Run helm upgrade to install the CockroachDB operator and wait for it to become ready:
helm upgrade --install crdb-operator ./cockroachdb-parent/charts/operator
kubectl rollout status deployment/cockroach-operator --timeout=60s
Step 4. Replace StatefulSet pods with operator-managed nodes
To migrate seamlessly from the Public operator to the CockroachDB operator, scale down the StatefulSet-managed pods and replace them with crdbnode objects one by one. Then create the crdbcluster object that manages the crdbnodes.
Create objects with kubectl that will eventually be owned by the crdbcluster:
kubectl create priorityclass crdb-critical --value 500000000
kubectl apply -f manifests/rbac.yaml
For each pod in the StatefulSet, perform the following steps:
Scale the StatefulSet down by one replica. For example, for a five-node cluster, scale the StatefulSet down to four replicas:
kubectl scale statefulset/$STS_NAME --replicas=4
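When a StatefulSet is scaled down, Kubernetes removes the pod with the highest ordinal first. Before creating the replacement, you can confirm that this pod (cockroachdb-4 in this example) has terminated:
kubectl get pods -n $NAMESPACE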
Create the crdbnode resource that corresponds to the StatefulSet pod you just scaled down. Each manifest is named with the pattern crdbnode-X.yaml, where X corresponds to a StatefulSet pod named {STS_NAME}-X. Note the pod that was scaled down and specify its manifest in a command like the following:
kubectl apply -f manifests/crdbnode-4.yaml
Wait for the new pod to become ready. If it doesn’t, check the operator logs for errors.
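For example, you can block until the replacement pod reports Ready, and tail the operator logs if it does not (the deployment name matches the rollout check in Step 3):
kubectl wait --for=condition=Ready pod/cockroachdb-4 --timeout=300s
kubectl logs deployment/cockroach-operator --tail=100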
Before moving on to the next replica migration, verify that there are no underreplicated ranges:
Set up port forwarding to access the CockroachDB node’s HTTP interface. Note that the DB Console runs on port 8080 by default:
kubectl port-forward pod/cockroachdb-4 8080:8080
Check that there are zero underreplicated ranges. The following command outputs the number of under-replicated ranges on this CockroachDB node:
curl --insecure -s https://localhost:8080/_status/vars | grep "ranges_underreplicated{" | awk '{print $2}'
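To poll until the node reports zero under-replicated ranges rather than re-running the check by hand, a small shell loop works (a sketch; it assumes the port-forward above is still active):
until [ "$(curl --insecure -s https://localhost:8080/_status/vars | grep 'ranges_underreplicated{' | awk '{print $2}')" = "0" ]; do sleep 10; done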
Repeat these steps until the StatefulSet has zero replicas.
If there are issues with the migration and you need to revert back to the previous deployment, follow the rollback process.
Step 5. Update the crdbcluster manifest
The Public operator creates a pod disruption budget that conflicts with a pod disruption budget managed by the CockroachDB operator. Before applying the crdbcluster manifest, delete the existing pod disruption budget:
kubectl delete poddisruptionbudget $CRDBCLUSTER
Annotate the existing Kubernetes objects so they can be managed by the Helm chart:
kubectl annotate service $CRDBCLUSTER-public meta.helm.sh/release-name="$CRDBCLUSTER"
kubectl annotate service $CRDBCLUSTER-public meta.helm.sh/release-namespace="$NAMESPACE"
kubectl label service $CRDBCLUSTER-public app.kubernetes.io/managed-by=Helm --overwrite=true
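To verify that the annotations and label were applied (a quick check using kubectl output options):
kubectl get service $CRDBCLUSTER-public -o jsonpath='{.metadata.annotations}'
kubectl get service $CRDBCLUSTER-public --show-labels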
Apply the crdbcluster manifest:
helm install $CRDBCLUSTER ./cockroachdb-parent/charts/cockroachdb -f manifests/values.yaml
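To confirm that the release deployed and the new custom resource exists (assuming the chart names the crdbcluster after the Helm release):
helm status $CRDBCLUSTER
kubectl get crdbcluster $CRDBCLUSTER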
Once the migration is successful, delete the StatefulSet that was created by the Public operator, along with its pod disruption budget:
kubectl delete statefulset $STS_NAME
kubectl delete poddisruptionbudget $STS_NAME-budget
Roll back a migration in progress
If the migration to the CockroachDB operator fails while you are applying the generated crdbnode manifests, follow the steps below to safely restore the original state using the previously backed-up resources and preserved volumes. This process assumes that the StatefulSet and PVCs have not been deleted.
Delete the applied crdbnode resources and scale the StatefulSet back up, one node at a time. Delete the individual crdbnode manifests in the reverse order of their creation (starting with the last one applied), and increase the StatefulSet replica count by one after each deletion until it reaches its original value. For example, assuming a three-node cluster where you have applied two crdbnode manifests (crdbnode-2.yaml and then crdbnode-1.yaml):
Delete the most recently applied crdbnode manifest, in this case crdbnode-1.yaml:
kubectl delete -f manifests/crdbnode-1.yaml
Scale the StatefulSet replica count up by one (to 2).
kubectl scale statefulset $STS_NAME --replicas=2
Verify that data has propagated by waiting for there to be zero under-replicated ranges:
Set up port forwarding to access the CockroachDB node's HTTP interface, replacing cockroachdb-X with the node name:
kubectl port-forward pod/cockroachdb-X 8080:8080
The DB Console runs on port 8080 by default.
Check the ranges_underreplicated metric:
curl --insecure -s https://localhost:8080/_status/vars | grep "ranges_underreplicated{" | awk '{print $2}'
This command outputs the number of under-replicated ranges on the node, which should be zero before proceeding with the next node. This may take some time depending on the deployment, but is necessary to ensure that there is no downtime in data availability.
Repeat these steps for each remaining node: delete crdbnode-2.yaml, scale the replica count to 3, and so on, running kubectl delete -f for each crdbnode manifest that you applied during the migration. Verify that there are no under-replicated ranges after rolling back each node.
Delete the PriorityClass and RBAC resources created for the CockroachDB operator:
kubectl delete priorityclass crdb-critical
kubectl delete -f manifests/rbac.yaml
Uninstall the CockroachDB operator:
helm uninstall crdb-operator
Clean up CockroachDB operator resources and custom resource definitions:
kubectl delete crds crdbnodes.crdb.cockroachlabs.com
kubectl delete crds crdbtenants.crdb.cockroachlabs.com
kubectl delete serviceaccount cockroachdb-sa
kubectl delete service cockroach-webhook-service
kubectl delete validatingwebhookconfiguration cockroach-webhook-config
Restore the Public operator:
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/crds.yaml
kubectl apply -f https://raw.githubusercontent.com/cockroachdb/cockroach-operator/v2.17.0/install/operator.yaml
Wait for the operator pod to reach the Running state, as shown in the output of the following command:
kubectl get pods -n cockroach-operator-system
Restore the original crdbcluster custom resource:
kubectl apply -f backup/crdbcluster-$CRDBCLUSTER.yaml
Confirm that all CockroachDB pods are Running and Ready, as shown in the output of the following command:
kubectl get pods