Concepts

Bootstrapping

As part of cluster management, the CrateDB Kubernetes Operator requires a CrateDB user named system with ALL privileges. The operator creates this user automatically when a new cluster is created. To do so, it uses Kubernetes’ exec feature to run the bundled crash command inside one of the eligible master nodes. User creation fails gracefully if the system user already exists (for whatever reason). The user is then granted the ALL privileges, so it can do anything in the cluster.

A user may specify a license in spec.cluster.license, which the operator loads into the cluster. Since a cluster may be started with more nodes than the free license allows, the CrateDB Kubernetes Operator attempts to set the license before trying to create the system user. License setting works the same way as described above: by exec'ing into a master node’s crate container and running crash.

Finally, using the newly created system user, the CrateDB Kubernetes Operator creates all users specified under spec.users with their corresponding passwords and the ALL privileges. It does this by connecting to the cluster over the PostgreSQL protocol, not by exec'ing into a Kubernetes container.

By default, the entire bootstrapping process must complete within 1800 seconds; otherwise it is considered failed. The timeout can be configured with the BOOTSTRAP_TIMEOUT environment variable.
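The ordering and the timeout described above can be sketched as follows. This is only an illustration, not the operator's implementation: the helper names bootstrap_timeout and run_bootstrap are hypothetical, and parsing the variable as a number of seconds is an assumption based on the default of 1800 seconds.

```python
import os
import time

DEFAULT_BOOTSTRAP_TIMEOUT = 1800  # seconds, the default named in the text


def bootstrap_timeout(environ=os.environ):
    """Return the timeout in seconds (hypothetical helper).

    Assumes BOOTSTRAP_TIMEOUT holds a plain number of seconds.
    """
    raw = environ.get("BOOTSTRAP_TIMEOUT", "")
    return float(raw) if raw.strip() else float(DEFAULT_BOOTSTRAP_TIMEOUT)


def run_bootstrap(steps, timeout, clock=time.monotonic):
    """Run (name, callable) steps in order under one shared deadline."""
    deadline = clock() + timeout
    done = []
    for name, step in steps:
        if clock() > deadline:
            raise TimeoutError(f"bootstrap timed out before step {name!r}")
        step()
        done.append(name)
    return done


# Order taken from the text: license first, then the system user,
# then the users listed under spec.users. The callables are placeholders.
BOOTSTRAP_STEPS = [
    ("license", lambda: None),
    ("system_user", lambda: None),
    ("spec_users", lambda: None),
]
```

Calling run_bootstrap(BOOTSTRAP_STEPS, bootstrap_timeout()) runs the three placeholder steps in the documented order and raises TimeoutError once the shared deadline has passed.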

User Passwords

When creating the CrateDB users specified under .spec.users, the operator will add the operator.cloud.crate.io/user-password label to each of the Kubernetes Secrets assigned to one of the users. To keep backward compatibility, it also adds the label to Kubernetes Secrets referenced in existing CrateDB resources on resume.

The operator.cloud.crate.io/user-password label is used to filter the events when watching for changes on one of the Kubernetes Secrets. If one of the Kubernetes Secrets is updated, the operator will update all CrateDB users that use that secret by iterating over all CrateDB resources. The operator updates the password in a CrateDB cluster by logging in to one of the CrateDB nodes with the corresponding username and old password. It will then use the ALTER USER query to update the password.

Note

If a CrateDB user’s password is changed directly in CrateDB, the operator can no longer update that user, because it logs in with the password it last knew.
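The lookup "which CrateDB users use this secret" can be sketched as a plain iteration over resource bodies. The field layout used here (spec.users[].password.secretKeyRef with name and key) is an assumption for illustration, and users_for_secret is a hypothetical helper, not part of the operator.

```python
def users_for_secret(resources, namespace, secret_name):
    """Return (cluster name, user name, secret key) for each matching user.

    `resources` is a list of CrateDB resource bodies as plain dicts.
    The secretKeyRef layout is assumed, not taken from the text above.
    """
    matches = []
    for resource in resources:
        meta = resource.get("metadata", {})
        # Secrets are namespaced, so only look at resources next to them.
        if meta.get("namespace") != namespace:
            continue
        for user in resource.get("spec", {}).get("users", []):
            ref = user.get("password", {}).get("secretKeyRef", {})
            if ref.get("name") == secret_name:
                matches.append((meta.get("name"), user.get("name"), ref.get("key")))
    return matches
```

For every match, the operator would then log in as that user with the old password and issue the ALTER USER query described above.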

Cluster Restart

Some situations, such as version upgrades, require a cluster restart. Usually a rolling restart is sufficient. When instructed to do so, the CrateDB Kubernetes Operator will perform a rolling cluster restart, following this process:

The rolling restart process for a CrateDB cluster.

Whenever the process calls for “Wait for some time”, a kopf.TemporaryError is raised, causing the Kopf sub-handler to be rescheduled and re-executed. This allows the sub-handler to be called multiple times and allows the operator to be restarted in the meantime, as it continues where it left off.
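The retry pattern can be illustrated without Kopf. In the sketch below, TemporaryError is a local stand-in for kopf.TemporaryError, run_until_done is a hypothetical driver that mimics a framework re-invoking the handler, and the state dict stands in for whatever progress the operator persists between calls. None of these names come from the operator itself.

```python
class TemporaryError(Exception):
    """Local stand-in for kopf.TemporaryError so the sketch runs without Kopf."""

    def __init__(self, message, delay=0):
        super().__init__(message)
        self.delay = delay


def restart_step(state, pods):
    """Restart at most one pod per call, then ask to be called again."""
    pending = [pod for pod in pods if pod not in state["restarted"]]
    if not pending:
        return "done"
    state["restarted"].append(pending[0])
    # "Wait for some time": give up for now and let the caller reschedule.
    raise TemporaryError(f"restarted {pending[0]}, waiting for it to rejoin", delay=15)


def run_until_done(handler, state, pods, max_calls=100):
    """Hypothetical driver: re-invoke the handler until it stops raising."""
    for calls in range(1, max_calls + 1):
        try:
            handler(state, pods)
            return calls
        except TemporaryError:
            continue  # a real framework would wait `delay` seconds here
    raise RuntimeError("handler never finished")
```

Because progress lives in state rather than in local variables, a fresh process that starts with a partially filled state picks up at the next pending pod instead of starting over.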

Cluster Scaling

From time to time, it can be necessary to scale a cluster. The operator allows scaling existing node definitions up and down. For example, when a cluster contains 3 master nodes, 4 data nodes named “hot”, and 10 data nodes named “cold”, each of these groups can be scaled up or down independently (with a minimum of 3 master nodes).

The scaling operation follows this process, which is explained in depth below:

The scaling process for a CrateDB cluster.
  1. First, all master nodes are scaled, whether up or down.

    Since the master nodes do not have any data, scaling them is done fairly quickly. Getting this change out of the way first makes the rest of the process less complex.

  2. Next, all node definitions that are scaled up will be handled.

    This ensures that, even when the cluster contains tables with many replicas, there is less risk of ending up with too few nodes if other node definitions are scaled down at the same time. It also means that data starts moving to the new nodes right away, so when nodes are removed in step 3, some of their data will already have been transferred.

  3. Then the remaining node definitions that will be scaled down are taken care of.

    This step is a bit more complex than the previous ones: first the operator checks that none of the tables in the cluster use more replicas than there are nodes available in the cluster. This is to avoid underreplicated tables.

    Next, it will deallocate all shards from the nodes that will be turned off. Once the data has been moved, the corresponding nodes are turned off by decreasing the number of replicas in the Kubernetes StatefulSet.

  4. The last step for the operator is to reset all allocations and then acknowledge the gateway.expected_data_nodes node checks (ID 1).

Whenever the process calls for “Wait for some time”, a kopf.TemporaryError is raised, causing the Kopf sub-handler to be rescheduled and re-executed. This allows the sub-handler to be called multiple times and allows the operator to be restarted in the meantime, as it continues where it left off.

By default, the entire scaling operation must complete within 3600 seconds, or within the number of seconds configured in the SCALING_TIMEOUT environment variable.
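The ordering of steps 1 to 3 and the replica check from step 3 can be sketched as two small functions. Both plan_scaling and tables_with_too_many_replicas are hypothetical helpers operating on plain dicts; they illustrate the described order only and are not the operator's code.

```python
def plan_scaling(old_master, new_master, old_data, new_data):
    """Return (name, old, new) changes in the order described above.

    `old_data` and `new_data` map a data node name to its replica count.
    """
    plan = []
    # Step 1: master nodes first, regardless of direction.
    if new_master != old_master:
        plan.append(("master", old_master, new_master))
    changes = [
        (name, old_data[name], replicas)
        for name, replicas in new_data.items()
        if name in old_data and replicas != old_data[name]
    ]
    # Step 2: node definitions that grow.
    plan += [change for change in changes if change[2] > change[1]]
    # Step 3: node definitions that shrink.
    plan += [change for change in changes if change[2] < change[1]]
    return plan


def tables_with_too_many_replicas(table_replicas, nodes_after_scaling):
    """Mirror the check in step 3: flag tables using more replicas than nodes."""
    return sorted(
        table
        for table, replicas in table_replicas.items()
        if replicas > nodes_after_scaling
    )
```

For the example cluster above, plan_scaling(3, 5, {"hot": 4, "cold": 10}, {"hot": 2, "cold": 12}) puts the master change first, then the "cold" scale-up, then the "hot" scale-down.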

Webhooks

Kubernetes follows an event-driven architecture. Depending on your use of the operator, it can be beneficial to receive notifications when certain events occur, such as a successful or failed cluster upgrade or scaling operation. When the environment variables WEBHOOK_PASSWORD, WEBHOOK_URL, and WEBHOOK_USERNAME are set to non-empty values, the operator sends HTTP POST requests to the provided URL. An example JSON payload is shown and documented below.

Important

The operator will not retry failed webhook notifications!

{
  "cluster": "my-new-crate-cluster",
  "event": "upgrade",
  "namespace": "my-crate-namespace",
  "scale_data": null,
  "status": "success",
  "upgrade_data": {
    "new_registry": "crate",
    "new_version": "4.1.6",
    "old_registry": "crate",
    "old_version": "4.1.5"
  }
}
cluster

The Kubernetes name (.metadata.name) of the cratedbs.cloud.crate.io resource.

event

Either 'scale' or 'upgrade'.

namespace

The Kubernetes namespace (.metadata.namespace) that the cratedbs.cloud.crate.io resource is deployed in.

scale_data

Present when event is 'scale', otherwise null. It has the following keys:

new_data_replicas

An array of objects, where each object has a name and a replicas key. The name corresponds to a node name (.spec.nodes.data.*.name), the replicas to the number of new replicas (.spec.nodes.data.*.replicas).

new_master_replicas

The new number of master node replicas (optional).

old_data_replicas

An array of objects, where each object has a name and a replicas key. The name corresponds to a node name (.spec.nodes.data.*.name), the replicas to the number of old replicas (.spec.nodes.data.*.replicas).

old_master_replicas

The old number of master node replicas (optional).

status

Either 'failure' or 'success'.

upgrade_data

Present when event is 'upgrade', otherwise null. It has the following keys:

old_registry

The old Docker image registry as defined in .spec.cluster.imageRegistry.

new_registry

The new Docker image registry as defined in .spec.cluster.imageRegistry.

old_version

The old image version (Docker tag) as defined in .spec.cluster.version.

new_version

The new image version (Docker tag) as defined in .spec.cluster.version.
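For testing a webhook receiver, a request resembling the operator's notification can be built with the Python standard library. This is a sketch under assumptions: the text only says that a username and password are configured, so sending them as HTTP Basic authentication is a guess, and build_webhook_request is a hypothetical helper.

```python
import base64
import json
import urllib.request


def build_webhook_request(url, username, password, payload):
    """Build (but do not send) a JSON POST request for the given payload."""
    body = json.dumps(payload).encode("utf-8")
    # Assumption: the credentials are sent as HTTP Basic authentication.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Payload copied from the example above.
EXAMPLE_PAYLOAD = {
    "cluster": "my-new-crate-cluster",
    "event": "upgrade",
    "namespace": "my-crate-namespace",
    "scale_data": None,
    "status": "success",
    "upgrade_data": {
        "new_registry": "crate",
        "new_version": "4.1.6",
        "old_registry": "crate",
        "old_version": "4.1.5",
    },
}
```

Passing the resulting request to urllib.request.urlopen would send it. Since the operator does not retry failed notifications, a receiver should be prepared to miss events.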