Due to the upstream MetalLB issue,
a race condition occurs when assigning an IP address after the MetalLB
Controller restarts. If a new service of the LoadBalancer type is created
during the MetalLB Controller restart, this service can be assigned an IP
address that was already assigned to another service before the restart.
To verify that the cluster is affected:
Verify whether LoadBalancer (LB) IP addresses are duplicated across services
that are not supposed to share them:
kubectl get svc -A | grep LoadBalancer
Note
Some services use shared IP addresses on purpose. In the example
system response below, these are services using the IP address 10.0.1.141.
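A truncated illustration of such a system response, in which all names, namespaces, and values other than iam-keycloak-http, kaas-kaas-ui, 10.0.1.141, and 10.100.91.101 are placeholders:

NAMESPACE     NAME                TYPE           EXTERNAL-IP     PORT(S)
<namespace>   <service1>          LoadBalancer   10.0.1.141      <ports>
<namespace>   <service2>          LoadBalancer   10.0.1.141      <ports>
<namespace>   iam-keycloak-http   LoadBalancer   10.100.91.101   443:<nodePort>/TCP
<namespace>   kaas-kaas-ui        LoadBalancer   10.100.91.101   443:<nodePort>/TCP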
In the above example, the iam-keycloak-http and kaas-kaas-ui
services erroneously use the same IP address 10.100.91.101. They both use the
same port 443 producing a collision when an application tries to access the
10.100.91.101:443 endpoint.
Workaround:
Unassign the current LB IP address from the selected service, for example, by
temporarily switching the service to the NodePort type, as no LB IP
address can be used for a NodePort service:
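A minimal sketch of such a change (the namespace and service names are placeholders):

kubectl -n <namespaceName> patch svc <serviceName> -p '{"spec":{"type":"NodePort"}}'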
The second affected service will continue using its current LB IP address.
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state¶
During deletion of a manager machine running the ironic Pod from a bare
metal management cluster, the following problems occur:
All Pods are stuck in the Terminating state
A new ironic Pod fails to start
The related bare metal host is stuck in the deprovisioning state
As a workaround, before deletion of the node running the ironic Pod,
cordon and drain the node using the kubectl cordon <nodeName> and
kubectl drain <nodeName> commands.
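For example:

kubectl cordon <nodeName>
kubectl drain <nodeName>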
[20736] Region deletion failure after regional deployment failure¶
If a baremetal-based regional cluster deployment fails before pivoting is
done, the corresponding region deletion fails.
Workaround:
Using the command below, manually delete all possible traces of the failed
regional cluster deployment, including but not limited to the following
objects that contain the kaas.mirantis.com/region label of the affected
region:
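A sketch of how such objects can be listed before deletion, assuming that Cluster, Machine, and BareMetalHost are among the affected resource kinds (the full set of kinds may differ per deployment):

kubectl get cluster,machine,baremetalhost -A -l kaas.mirantis.com/region=<regionName>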
Upgrade of a cluster with more than 120 nodes gets stuck with errors about
IP address exhaustion in the Docker logs.
Note
If you plan to scale your cluster to more than 120 nodes, the cluster
will be affected by the issue. Therefore, you will have to perform the
workaround below.
Workaround:
Caution
If you have not run the cluster upgrade yet, simply recreate the
mke-overlay network as described in step 6 and skip all other steps.
Note
If you successfully upgraded the cluster with less than 120 nodes
but plan to scale it to more than 120 nodes, proceed with steps 2-9.
Verify that MKE nodes are upgraded:
On any master node, run the following command to identify
the ucp-worker-agent service that has a newer version:
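A minimal sketch of such a check, assuming the MKE worker agents run as Docker Swarm services whose image tags reflect the agent version:

docker service ls --filter name=ucp-worker-agent --format '{{.Name}} {{.Image}}'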
Recreate the mke-overlay network with a correct CIDR that is
at least /20 and does not overlap with other subnets in the
cluster network. For example:
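A minimal sketch, assuming the 10.1.0.0/20 subnet is free in your environment and no additional driver options are required:

docker network create -d overlay --subnet 10.1.0.0/20 mke-overlay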
During the unsafe or forced deletion of a manager machine running the
calico-kube-controllers Pod in the kube-system namespace,
the following issues occur:
The calico-kube-controllers Pod fails to clean up resources associated
with the deleted node
The calico-node Pod may fail to start up on a newly created node if the
machine is provisioned with the same IP address as the deleted machine had
As a workaround, before deletion of the node running the
calico-kube-controllers Pod, cordon and drain the node:
kubectl cordon <nodeName>
kubectl drain <nodeName>
[30294] Replacement of a master node is stuck on the calico-node Pod start¶
During replacement of a master node on a cluster of any type, the
calico-node Pod fails to start on a new node that has the same IP address
as the node being replaced.
Workaround:
Log in to any master node.
From a CLI with an MKE client bundle, create a shell alias to start
calicoctl using the mirantis/ucp-dsinfo image:
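A sketch of such an alias, assuming the default Calico and Docker socket locations; the exact volume mounts, TLS-related environment variables, and entrypoint may differ for your MKE version:

alias calicoctl="docker run -i --rm \
  --net host --pid host \
  -e ETCD_ENDPOINTS=<etcdEndpoint> \
  -v /var/run/calico:/var/run/calico \
  -v /var/run/docker.sock:/var/run/docker.sock \
  mirantis/ucp-dsinfo:<mkeVersion> \
  calicoctl"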
In the above command, replace the following values with the corresponding
settings of the affected cluster:
<etcdEndpoint> is the etcd endpoint defined in the Calico
configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378
<mkeVersion> is the MKE version installed on your cluster.
For example, mirantis/ucp-dsinfo:3.5.7.
Verify the node list on the cluster:
kubectl get node
Compare this list with the node list in Calico to identify the old node:
calicoctl get node -o wide
Remove the old node from Calico:
calicoctl delete node kaas-node-<nodeID>
[27797] A cluster ‘kubeconfig’ stops working during MKE minor version update¶
During update of a Container Cloud cluster of any type, if the MKE minor
version is updated from 3.4.x to 3.5.x, access to the cluster using the
existing kubeconfig fails with the You must be logged in to the server
(Unauthorized) error due to OIDC settings being reconfigured.
As a workaround, during the cluster update process, use the admin
kubeconfig instead of the existing one. Once the update completes, you can
use the existing cluster kubeconfig again.
When setting a new Transport Layer Security (TLS) certificate for a cluster,
the false positive failed to get kubeconfig error may occur on the
Waiting for TLS settings to be applied stage. No actions are required;
disregard the error.
To verify the status of the TLS configuration being applied:
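A sketch of such a check, assuming the TLS status, including the expirationTime and hostname fields, is exposed in the Cluster object status (the exact status path may differ between releases):

kubectl -n <projectName> get cluster <clusterName> -o yaml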
In this example, expirationTime equals the NotAfter field of the
server certificate, and the hostname value contains the configured
application name.
The deployment of Ceph OSDs fails with the following messages in the status
section of the KaaSCephCluster custom resource:
shortClusterInfo:
  messages:
  - Not all osds are deployed
  - Not all osds are in
  - Not all osds are up
To find out if your cluster is affected, verify if the devices on
the AMD hosts you use for the Ceph OSDs deployment are removable.
For example, if the sdb device name is specified in
spec.cephClusterSpec.nodes.storageDevices of the KaaSCephCluster
custom resource for the affected host, run:
# cat /sys/block/sdb/removable
1
The system output shows that the reason for the above messages in status
is the enabled hotplug functionality on the AMD nodes, which marks all drives
as removable. The hotplug functionality is not supported by Ceph in
Container Cloud.
As a workaround, disable the hotplug functionality in the BIOS settings
for disks that are configured to be used as Ceph OSD data devices.
[30635] Ceph ‘pg_autoscaler’ is stuck with the ‘overlapping roots’ error¶
Due to the upstream Ceph issue
occurring since Ceph Pacific, the pg_autoscaler module of Ceph Manager
fails with the pool <poolNumber> has overlapping roots error if a Ceph
cluster contains a mix of pools with deviceClass either explicitly
specified or not specified.
The deviceClass parameter is required for a pool definition in the
spec section of the KaaSCephCluster object, but not required for Ceph
RADOS Gateway (RGW) and Ceph File System (CephFS).
Therefore, if sections for Ceph RGW or CephFS data or metadata pools are
defined without deviceClass, then autoscaling of placement groups is
disabled on a cluster due to overlapping roots. Overlapping roots imply that
Ceph RGW and/or CephFS pools obtained the default crush rule and have no
demarcation on a specific class to store data.
Note
If pools for Ceph RGW and CephFS already have deviceClass
specified, skip the corresponding steps of the below procedure.
Note
Perform the below procedure on the affected managed cluster using
its kubeconfig.
Workaround:
Obtain failureDomain and required replicas for Ceph RGW and/or CephFS
pools:
Note
If the KaaSCephCluster spec section does not contain
failureDomain, failureDomain equals host by default to store
one replica per node.
Note
The types of pool crush rules include:
An erasureCoded pool requires the codingChunks+dataChunks
number of available units of failureDomain.
A replicated pool requires the replicated.size number of
available units of failureDomain.
To obtain Ceph RGW pools, use the
spec.cephClusterSpec.objectStorage.rgw section of the
KaaSCephCluster object. For example:
The dataPool pool requires the sum of codingChunks and
dataChunks values representing the number of available units of
failureDomain. In the example above, for failureDomain:host,
dataPool requires 3 available nodes to store its objects.
The metadataPool pool requires the replicated.size number
of available units of failureDomain. For failureDomain:host,
metadataPool requires 3 available nodes to store its objects.
To obtain CephFS pools, use the
spec.cephClusterSpec.sharedFilesystem.cephFS section of the
KaaSCephCluster object. For example:
The default-pool and metadataPool pools require the
replicated.size number of available units of failureDomain.
For failureDomain:host, default-pool requires 3 available
nodes to store its objects.
The second-pool pool requires the sum of codingChunks and
dataChunks representing the number of available units of
failureDomain. For failureDomain:host, second-pool requires
3 available nodes to store its objects.
Obtain the device class that meets the desired number of required replicas
for the defined failureDomain.
Summarize the USED size of all <rgwName>.rgw.* pools and
compare it with the AVAIL size of each applicable device class
selected in the previous step.
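For example, the following general Ceph command, run from inside the ceph-tools Pod, shows the USED size of each pool and the AVAIL size of each device class:

ceph df detail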
Note
As Ceph RGW pools lack explicit specification of
deviceClass, they may store objects on all device classes.
The resulting size on the target device class can be smaller than the
calculated USED size because part of the data may already be stored in the
desired class.
Therefore, limiting pools to a single device class may result in a
smaller occupied data size than the total USED size.
Nonetheless, calculating the USED size of all pools remains a valid
conservative estimate because the pool data may not yet be stored on the
selected device class.
For CephFS data or metadata pools, use the previous step to calculate
the USED size of pools and compare it with the AVAIL size.
From the device classes that satisfy the required replicas and
available size, decide which one is preferable for storing Ceph RGW and CephFS data.
In the example output above, hdd and ssd are both applicable.
Therefore, select any of them.
Note
You can select different device classes for Ceph RGW and
CephFS. For example, hdd for Ceph RGW and ssd for CephFS.
Select a device class based on performance expectations, if any.
Create the rule-helper script that helps switch Ceph RGW or CephFS pools to
the selected device class.
Create the /tmp/rule-helper.py file with the following content:
cat > /tmp/rule-helper.py << EOF
import argparse
import json
import subprocess
from sys import argv, exit


def get_cmd(cmd_args):
    # Run a Ceph CLI command with JSON output and return its stdout
    output_args = ['--format', 'json']
    _cmd = subprocess.Popen(cmd_args + output_args,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    stdout, stderr = _cmd.communicate()
    if stderr:
        print("[ERROR] Failed to get '{0}': {1}".format(' '.join(cmd_args), stderr))
        return
    return stdout


def format_step(action, cmd_args):
    return "{0}:\n\t{1}".format(action, ' '.join(cmd_args))


def process_rule(rule):
    # Print the manual steps required to recreate the crush rule with a device class
    steps = []
    new_rule_name = rule['rule_name'] + '_v2'
    if rule['type'] == "replicated":
        rule_create_args = ['ceph', 'osd', 'crush', 'create-replicated',
                            new_rule_name, rule['root'], rule['failure_domain'], rule['device_class']]
        steps.append(format_step("create a new replicated rule for pool", rule_create_args))
    else:
        new_profile_name = rule['profile_name'] + '_' + rule['device_class']
        profile_create_args = ['ceph', 'osd', 'erasure-code-profile', 'set', new_profile_name]
        for k, v in rule['profile'].items():
            profile_create_args.append("{0}={1}".format(k, v))
        rule_create_args = ['ceph', 'osd', 'crush', 'create-erasure', new_rule_name, new_profile_name]
        steps.append(format_step("create a new erasure-coded profile", profile_create_args))
        steps.append(format_step("create a new erasure-coded rule for pool", rule_create_args))
    set_rule_args = ['ceph', 'osd', 'pool', 'set', 'crush_rule', rule['pool_name'], new_rule_name]
    revert_rule_args = ['ceph', 'osd', 'pool', 'set', 'crush_rule', new_rule_name, rule['pool_name']]
    rm_old_rule_args = ['ceph', 'osd', 'crush', 'rule', 'rm', rule['rule_name']]
    rename_rule_args = ['ceph', 'osd', 'crush', 'rule', 'rename', new_rule_name, rule['rule_name']]
    steps.append(format_step("set pool crush rule to new one", set_rule_args))
    steps.append("check that replication is finished and status healthy: ceph -s")
    steps.append(format_step("in case of any problems revert step 2 and stop procedure", revert_rule_args))
    steps.append(format_step("remove standard (old) pool crush rule", rm_old_rule_args))
    steps.append(format_step("rename new pool crush rule to standard name", rename_rule_args))
    if rule['type'] != "replicated":
        rm_old_profile_args = ['ceph', 'osd', 'erasure-code-profile', 'rm', rule['profile_name']]
        steps.append(format_step("remove standard (old) erasure-coded profile", rm_old_profile_args))
    for idx, step in enumerate(steps):
        print(" {0}) {1}".format(idx+1, step))


def check_rules(args):
    # Find pools matching the prefix whose crush rules lack a device class
    extra_pools_lookup = []
    if args.type == "rgw":
        extra_pools_lookup.append(".rgw.root")
    pools_str = get_cmd(['ceph', 'osd', 'pool', 'ls', 'detail'])
    if not pools_str:
        return
    rules_str = get_cmd(['ceph', 'osd', 'crush', 'rule', 'dump'])
    if not rules_str:
        return
    try:
        pools_dump = json.loads(pools_str)
        rules_dump = json.loads(rules_str)
        if len(pools_dump) == 0:
            print("[ERROR] No pools found")
            return
        if len(rules_dump) == 0:
            print("[ERROR] No crush rules found")
            return
        crush_rules_recreate = []
        for pool in pools_dump:
            if pool['pool_name'].startswith(args.prefix) or pool['pool_name'] in extra_pools_lookup:
                rule_id = pool['crush_rule']
                for rule in rules_dump:
                    if rule['rule_id'] == rule_id:
                        new_rule = {'rule_name': rule['rule_name'], 'pool_name': pool['pool_name']}
                        for step in rule.get('steps', []):
                            root = step.get('item_name', '').split('~')
                            if root[0] != '' and len(root) == 1:
                                new_rule['root'] = root[0]
                                continue
                            failure_domain = step.get('type', '')
                            if failure_domain != '':
                                new_rule['failure_domain'] = failure_domain
                        if new_rule.get('root', '') == '':
                            continue
                        new_rule['device_class'] = args.device_class
                        if pool['erasure_code_profile'] == "":
                            new_rule['type'] = "replicated"
                        else:
                            new_rule['type'] = "erasure"
                            profile_str = get_cmd(['ceph', 'osd', 'erasure-code-profile', 'get',
                                                   pool['erasure_code_profile']])
                            if not profile_str:
                                return
                            profile_dump = json.loads(profile_str)
                            profile_dump['crush-device-class'] = args.device_class
                            new_rule['profile_name'] = pool['erasure_code_profile']
                            new_rule['profile'] = profile_dump
                        crush_rules_recreate.append(new_rule)
                        break
        print("Found {0} pools with crush rules require device class set".format(len(crush_rules_recreate)))
        for new_rule in crush_rules_recreate:
            print("- Pool {0} requires crush rule update, device class is not set".format(new_rule['pool_name']))
            process_rule(new_rule)
    except Exception as err:
        print("[ERROR] Failed to get info from Ceph: {0}".format(err))
        return


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Ceph crush rules checker. Specify device class and service name.',
        prog=argv[0], usage='%(prog)s [options]')
    parser.add_argument('--type', type=str, help='Type of pool: rgw, cephfs', default='', required=True)
    parser.add_argument('--prefix', type=str,
                        help='Pool prefix. If objectstore - use objectstore name, if CephFS - CephFS name.',
                        default='', required=True)
    parser.add_argument('--device-class', type=str, help='Device class to switch on.', required=True)
    args = parser.parse_args()
    if len(argv) < 3:
        parser.print_help()
        exit(0)
    check_rules(args)
EOF
Exit the ceph-tools Pod.
For Ceph RGW, execute the rule-helper script to output step-by-step
instructions, and run each step provided in the output manually.
Note
The following steps include creation of crush rules with the same
parameters as before but with the device class specification and switching
of pools to new crush rules.
Execution of the rule-helper script steps for Ceph RGW
Substitute <rgwName> with the Ceph RGW name from
spec.cephClusterSpec.objectStorage.rgw.name in the
KaaSCephCluster object. In the example above, the name is
openstack-store.
<deviceClass> with the device class selected in the previous
steps.
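A sketch of the script invocation, run from inside the ceph-tools Pod using the placeholders described above and assuming a Python 3 interpreter is available in the Pod:

python3 /tmp/rule-helper.py --type rgw --prefix <rgwName> --device-class <deviceClass>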
Using the output of the command from the previous step, run manual
commands step-by-step.
Substitute <cephfsName> with the CephFS name from
spec.cephClusterSpec.sharedFilesystem.cephFS[0].name in the
KaaSCephCluster object. In the example above, the name is
cephfs-store.
<deviceClass> with the device class selected in the previous
steps.
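Similarly for CephFS, a sketch of the invocation from inside the ceph-tools Pod, using the placeholders described above:

python3 /tmp/rule-helper.py --type cephfs --prefix <cephfsName> --device-class <deviceClass>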
Using the output of the command from the previous step, run manual
commands step-by-step.
Verify that the Ceph cluster has rebalanced and has the HEALTH_OK
status:
ceph -s
Exit the ceph-tools Pod.
Verify the pg_autoscaler module after switching deviceClass for
all required pools:
ceph osd pool autoscale-status
The system response must contain all Ceph RGW and CephFS pools.
On the management cluster, edit the KaaSCephCluster object of the
corresponding managed cluster by adding the selected device class to the
deviceClass parameter of the updated Ceph RGW and CephFS pools:
Substitute <rgwDeviceClass> with the device class applied to Ceph RGW
pools and <cephfsDeviceClass> with the device class applied to CephFS
pools.
You can use this configuration step for further management of Ceph RGW
and/or CephFS. It does not impact the existing Ceph cluster configuration.
[26441] Cluster update fails with the MountDevice failed for volume warning¶
Update of a managed cluster based on bare metal with Ceph enabled fails with
a PersistentVolumeClaim getting stuck in the Pending state for the
prometheus-server StatefulSet and the
MountVolume.MountDevice failed for volume warning in the StackLight event
logs.
Workaround:
Verify that the description of the Pods that failed to run contains
FailedMount events:
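A sketch of the command, using the placeholders described below:

kubectl -n <affectedProjectName> describe pod <affectedPodName>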
In the command above, replace the following values:
<affectedProjectName> is the Container Cloud project name where
the Pods failed to run
<affectedPodName> is a Pod name that failed to run in the specified project
In the Pod description, identify the node name where the Pod failed to run.
Verify that the csi-rbdplugin logs of the affected node contain the
rbd volume mount failed: <csi-vol-uuid> is being used error.
The <csi-vol-uuid> is a unique RBD volume name.
Identify csiPodName of the corresponding csi-rbdplugin:
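A sketch of such a lookup, assuming the CSI RBD plugin Pods run in the rook-ceph namespace and carry the app=csi-rbdplugin label (names may differ in your deployment):

kubectl -n rook-ceph get pod -l app=csi-rbdplugin -o wide | grep <nodeName>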
This CronJob is removed automatically during upgrade to the
major Container Cloud release 2.24.0 or to the patch Container Cloud release
2.23.3 if you obtain patch releases.