Known issues¶
This section lists known issues with workarounds for the Mirantis Container Cloud release 2.23.0 including the Cluster release 11.7.0.
For other issues that can occur while deploying and operating a Container Cloud cluster, see Deployment Guide: Troubleshooting and Operations Guide: Troubleshooting.
Note
This section also outlines still valid known issues from previous Container Cloud releases.
Bare metal¶
[29762] Wrong IP address is assigned after the MetalLB controller restart¶
Due to the upstream MetalLB issue,
a race condition occurs when assigning an IP address after the MetalLB
controller restart. If a new service of the LoadBalancer type is created
during the MetalLB controller restart, this service can be assigned an IP
address that was already assigned to another service before the restart.
To verify that the cluster is affected:
Verify whether IP addresses of services of the LoadBalancer (LB) type are
duplicated where they are not supposed to be:
kubectl get svc -A | grep LoadBalancer
Note
Some services use shared IP addresses on purpose. In the example system response below, these are services using the IP address 10.0.1.141.
Example system response:
kaas dhcp-lb LoadBalancer 10.233.4.192 10.0.1.141 53:32594/UDP,67:30048/UDP,68:30464/UDP,69:31898/UDP,123:32450/UDP 13h
kaas dhcp-lb-tcp LoadBalancer 10.233.6.79 10.0.1.141 8080:31796/TCP,53:32012/TCP 11h
kaas httpd-http LoadBalancer 10.233.0.92 10.0.1.141 80:30115/TCP 13h
kaas iam-keycloak-http LoadBalancer 10.233.55.2 10.100.91.101 443:30858/TCP,9990:32301/TCP 2h
kaas ironic-kaas-bm LoadBalancer 10.233.26.176 10.0.1.141 6385:31748/TCP,8089:30604/TCP,5050:32200/TCP,9797:31988/TCP,601:31888/TCP 13h
kaas ironic-syslog LoadBalancer 10.233.59.199 10.0.1.141 514:32098/UDP 13h
kaas kaas-kaas-ui LoadBalancer 10.233.51.167 10.100.91.101 443:30976/TCP 13h
kaas mcc-cache LoadBalancer 10.233.40.68 10.100.91.102 80:32278/TCP,443:32462/TCP 12h
kaas mcc-cache-pxe LoadBalancer 10.233.10.75 10.0.1.142 80:30112/TCP,443:31559/TCP 12h
stacklight iam-proxy-alerta LoadBalancer 10.233.4.102 10.100.91.104 443:30101/TCP 12h
stacklight iam-proxy-alertmanager LoadBalancer 10.233.46.45 10.100.91.105 443:30944/TCP 12h
stacklight iam-proxy-grafana LoadBalancer 10.233.39.24 10.100.91.106 443:30953/TCP 12h
stacklight iam-proxy-prometheus LoadBalancer 10.233.12.174 10.100.91.107 443:31300/TCP 12h
stacklight telemeter-server-external LoadBalancer 10.233.56.63 10.100.91.103 443:30582/TCP 12h
In the above example, the iam-keycloak-http
and kaas-kaas-ui
services erroneously use the same IP address 10.100.91.101. They both use the
same port 443 producing a collision when an application tries to access the
10.100.91.101:443 endpoint.
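If the service list is long, you can also count duplicated external IP addresses directly. The following one-liner is a minimal sketch that assumes the standard kubectl get svc -A column layout; keep in mind that some services share IP addresses on purpose, as noted above:

# Print external IP addresses used by more than one LoadBalancer service, with their count
kubectl get svc -A | awk '$3 == "LoadBalancer" {print $5}' | sort | uniq -cd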
Workaround:
Unassign the current LB IP address from the affected service by temporarily changing its type to NodePort, as no LB IP address can be used for a NodePort service:

kubectl -n kaas patch svc <serviceName> -p '{"spec":{"type":"NodePort"}}'

Assign a new LB IP address to the selected service by changing its type back to LoadBalancer:

kubectl -n kaas patch svc <serviceName> -p '{"spec":{"type":"LoadBalancer"}}'

The second affected service continues using its current LB IP address.
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state¶
During deletion of a manager machine running the ironic
Pod from a bare
metal management cluster, the following problems occur:
- All Pods are stuck in the Terminating state
- A new ironic Pod fails to start
- The related bare metal host is stuck in the deprovisioning state
As a workaround, before deletion of the node running the ironic
Pod,
cordon and drain the node using the kubectl cordon <nodeName> and
kubectl drain <nodeName> commands.
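For example, assuming <nodeName> is the name of the machine to be deleted, a typical sequence looks as follows; the drain flags are commonly used defaults and may need adjustment for your kubectl version and workloads:

kubectl cordon <nodeName>
# --ignore-daemonsets and --delete-emptydir-data are usually required on nodes
# that run DaemonSets and Pods with emptyDir volumes
kubectl drain <nodeName> --ignore-daemonsets --delete-emptydir-data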
[20736] Region deletion failure after regional deployment failure¶
If a baremetal-based regional cluster deployment fails before pivoting is done, the corresponding region deletion fails.
Workaround:
Using the command below, manually delete all possible traces of the failed
regional cluster deployment, including but not limited to the following
objects that contain the kaas.mirantis.com/region
label of the affected
region:
cluster
machine
baremetalhost
baremetalhostprofile
l2template
subnet
ipamhost
ipaddr
kubectl delete <objectName> -l kaas.mirantis.com/region=<regionName>
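For example, a minimal sketch that iterates over the object kinds listed above; run it against the management cluster and add -n <projectName> if the objects of the affected region reside in a specific project namespace:

for kind in cluster machine baremetalhost baremetalhostprofile l2template subnet ipamhost ipaddr; do
  kubectl delete ${kind} -l kaas.mirantis.com/region=<regionName>
done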
Warning
Do not use the same region name again after the regional cluster deployment failure since some objects that reference the region name may still exist.
LCM¶
[5981] Upgrade gets stuck on the cluster with more than 120 nodes¶
Upgrade of a cluster with more than 120 nodes gets stuck with errors about
IP address exhaustion in the docker logs.
Note
If you plan to scale your cluster to more than 120 nodes, the cluster will be affected by the issue. Therefore, you will have to perform the workaround below.
Workaround:
Caution
If you have not run the cluster upgrade yet, simply recreate the mke-overlay network as described in step 6 and skip all other steps.
Note
If you successfully upgraded the cluster with fewer than 120 nodes but plan to scale it to more than 120 nodes, proceed with steps 2-9.
1. Verify that MKE nodes are upgraded:

On any master node, run the following command to identify the ucp-worker-agent that has a newer version:

docker service ls

Example of system response:

ID             NAME                     MODE         REPLICAS   IMAGE                          PORTS
7jdl9m0giuso   ucp-3-5-7                global       0/0        mirantis/ucp:3.5.7
uloi2ixrd0br   ucp-auth-api             global       3/3        mirantis/ucp-auth:3.5.7
pfub4xa17nkb   ucp-auth-worker          global       3/3        mirantis/ucp-auth:3.5.7
00w1kqn0x69w   ucp-cluster-agent        replicated   1/1        mirantis/ucp-agent:3.5.7
xjhwv1vrw9k5   ucp-kube-proxy-win       global       0/0        mirantis/ucp-agent-win:3.5.7
oz28q8a7swmo   ucp-kubelet-win          global       0/0        mirantis/ucp-agent-win:3.5.7
ssjwonmnvk3s   ucp-manager-agent        global       3/3        mirantis/ucp-agent:3.5.7
ks0ttzydkxmh   ucp-pod-cleaner-win      global       0/0        mirantis/ucp-agent-win:3.5.7
w5d25qgneibv   ucp-tigera-felix-win     global       0/0        mirantis/ucp-agent-win:3.5.7
ni86z33o10n3   ucp-tigera-node-win      global       0/0        mirantis/ucp-agent-win:3.5.7
iyyh1f0z6ejc   ucp-worker-agent-win-x   global       0/0        mirantis/ucp-agent-win:3.5.5
5z6ew4fmf2mm   ucp-worker-agent-win-y   global       0/0        mirantis/ucp-agent-win:3.5.7
gr52h05hcwwn   ucp-worker-agent-x       global       56/56      mirantis/ucp-agent:3.5.5
e8coi9bx2j7j   ucp-worker-agent-y       global       121/121    mirantis/ucp-agent:3.5.7

In the above example, it is ucp-worker-agent-y.

Obtain the node list:

docker service ps ucp-worker-agent-y | awk -F ' ' '$4 ~ /^kaas/ {print $4}' > upgraded_nodes.txt
2. Identify the cluster ID. For example, run the following command on the management cluster:
kubectl -n <clusterNamespace> get cluster <clusterName> -o json | jq '.status.providerStatus.mke.clusterID'
3. Create a backup of MKE as described in the MKE documentation: Backup procedure.
4. Remove MKE services:
docker service rm ucp-cluster-agent ucp-manager-agent ucp-worker-agent-win-y ucp-worker-agent-y ucp-worker-agent-win-x ucp-worker-agent-x
5. Remove the mke-overlay network:

docker network rm mke-overlay
6. Recreate the mke-overlay network with a correct CIDR that must be at least /20 and must not overlap with other subnets in the cluster network. For example:

docker network create -d overlay --subnet 10.1.0.0/20 mke-overlay
7. Create placeholder worker services:

docker service create --name ucp-worker-agent-x --mode global --constraint node.labels.foo==bar --detach busybox sleep 3d
docker service create --name ucp-worker-agent-win-x --mode global --constraint node.labels.foo==bar --detach busybox sleep 3d
8. Recreate all MKE services using the previously obtained cluster ID. Use the target version for your cluster, for example, 3.5.7:

docker container run --rm -it --name ucp -v /var/run/docker.sock:/var/run/docker.sock mirantis/ucp:3.5.7 upgrade --debug --manual-worker-upgrade --force-minimums --id <cluster ID> --interactive --force-port-check
Note

Because of interactive mode, you may need to use Ctrl+C when the command execution completes.

9. Verify that all services are recreated:
docker service ls
The example ucp-worker-agent-y service must have 1 replica running with a node that was previously stuck.

10. Using the node list obtained in the first step, remove the upgrade-hold labels from the nodes that were previously upgraded:

for i in $(cat upgraded_nodes.txt); do docker node update --label-rm com.docker.ucp.upgrade-hold $i; done
11. Verify that all nodes from the list obtained in the first step are present in the ucp-worker-agent-y service. For example:

docker service ps ucp-worker-agent-y
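Optionally, confirm that the upgrade-hold label is no longer present on the nodes from the list. This is a minimal check that assumes the upgraded_nodes.txt file from the first step is still available:

# The com.docker.ucp.upgrade-hold label must be absent from the printed labels
for i in $(cat upgraded_nodes.txt); do
  echo "${i}: $(docker node inspect ${i} --format '{{ .Spec.Labels }}')"
done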
[5782] Manager machine fails to be deployed during node replacement¶
During replacement of a manager machine, the following problems may occur:
- The system adds the node to Docker swarm but not to Kubernetes
- The node Deployment gets stuck with failed RethinkDB health checks
Workaround:
Delete the failed node.
Wait for the MKE cluster to become healthy. To monitor the cluster status:
Log in to the MKE web UI as described in Connect to the Mirantis Kubernetes Engine web UI.
Monitor the cluster status as described in MKE Operations Guide: Monitor an MKE cluster with the MKE web UI.
Deploy a new node.
[5568] The ‘calico-kube-controllers’ Pod fails to clean up resources¶
During the unsafe
or forced
deletion of a manager machine running the
calico-kube-controllers
Pod in the kube-system
namespace,
the following issues occur:
- The calico-kube-controllers Pod fails to clean up resources associated with the deleted node
- The calico-node Pod may fail to start up on a newly created node if the machine is provisioned with the same IP address as the deleted machine had
As a workaround, before deletion of the node running the
calico-kube-controllers
Pod, cordon and drain the node:
kubectl cordon <nodeName>
kubectl drain <nodeName>
[30294] Replacement of a ‘master’ node is stuck on the ‘calico-node’ Pod start¶
During replacement of a master
node on a cluster of any type, the
calico-node
Pod fails to start on a new node that has the same IP address
as the node being replaced.
Workaround:
Log in to any master node.

From a CLI with an MKE client bundle, create a shell alias to start calicoctl using the mirantis/ucp-dsinfo image:

alias calicoctl="\
docker run -i --rm \
--pid host \
--net host \
-e constraint:ostype==linux \
-e ETCD_ENDPOINTS=<etcdEndpoint> \
-e ETCD_KEY_FILE=/ucp-node-certs/key.pem \
-e ETCD_CA_CERT_FILE=/ucp-node-certs/ca.pem \
-e ETCD_CERT_FILE=/ucp-node-certs/cert.pem \
-v /var/run/calico:/var/run/calico \
-v ucp-node-certs:/ucp-node-certs:ro \
mirantis/ucp-dsinfo:<mkeVersion> \
calicoctl --allow-version-mismatch \
"

In the above command, replace the following values with the corresponding settings of the affected cluster:

- <etcdEndpoint> is the etcd endpoint defined in the Calico configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378
- <mkeVersion> is the MKE version installed on your cluster. For example, mirantis/ucp-dsinfo:3.5.7
Verify the node list on the cluster:
kubectl get node
Compare this list with the node list in Calico to identify the old node:
calicoctl get node -o wide
Remove the old node from Calico:
calicoctl delete node kaas-node-<nodeID>
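After the stale node is removed from Calico, the calico-node Pod on the replacement node should start. You can monitor it, for example:

kubectl -n kube-system get pods -o wide | grep calico-node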
[27797] A cluster ‘kubeconfig’ stops working during MKE minor version update¶
During update of a Container Cloud cluster of any type, if the MKE minor
version is updated from 3.4.x to 3.5.x, access to the cluster using the
existing kubeconfig
fails with the You must be logged in to the server
(Unauthorized) error due to OIDC settings being reconfigured.
As a workaround, during the cluster update process, use the admin
kubeconfig
instead of the existing one. Once the update completes, you can
use the existing cluster kubeconfig
again.
To obtain the admin
kubeconfig
:
kubectl --kubeconfig <pathToMgmtKubeconfig> get secret -n <affectedClusterNamespace> \
-o yaml <affectedClusterName>-kubeconfig | awk '/admin.conf/ {print $2}' | \
head -1 | base64 -d > clusterKubeconfig.yaml
If the related cluster is regional, replace <pathToMgmtKubeconfig>
with
<pathToRegionalKubeconfig>
.
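You can then use the extracted file directly until the update completes. For example:

kubectl --kubeconfig clusterKubeconfig.yaml get nodes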
TLS configuration¶
[29604] The ‘failed to get kubeconfig’ error during TLS configuration¶
When setting a new Transport Layer Security (TLS) certificate for a cluster,
the false positive failed to get kubeconfig error may occur at the
Waiting for TLS settings to be applied stage. The error requires no actions;
disregard it.
To verify the status of the TLS configuration being applied:
kubectl get cluster <ClusterName> -n <ClusterProjectName> -o jsonpath-as-json="{.status.providerStatus.tls.<Application>}"
Possible values for the <Application>
parameter are as follows:
keycloak
ui
cache
mke
iamProxyAlerta
iamProxyAlertManager
iamProxyGrafana
iamProxyKibana
iamProxyPrometheus
Example of system response:
[
    {
        "expirationTime": "2024-01-06T09:37:04Z",
        "hostname": "domain.com"
    }
]
In this example, expirationTime equals the NotAfter field of the server
certificate, and the value of hostname contains the hostname configured for
the application.
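To inspect all applications at once, you can loop over the parameter values listed above. This is a convenience sketch that simply substitutes each value into the command above; substitute <ClusterName> and <ClusterProjectName> as usual:

for app in keycloak ui cache mke iamProxyAlerta iamProxyAlertManager iamProxyGrafana iamProxyKibana iamProxyPrometheus; do
  echo "${app}:"
  # Applications without TLS configuration may return an empty result or a not-found message
  kubectl get cluster <ClusterName> -n <ClusterProjectName> -o jsonpath-as-json="{.status.providerStatus.tls.${app}}"
done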
Ceph¶
[30857] Irrelevant error during Ceph OSD deployment on removable devices¶
The deployment of Ceph OSDs fails with the following messages in the status
section of the KaaSCephCluster
custom resource:
shortClusterInfo:
messages:
- Not all osds are deployed
- Not all osds are in
- Not all osds are up
To find out whether your cluster is affected, verify whether the devices on
the AMD hosts that you use for the Ceph OSD deployment are removable.
For example, if the sdb device name is specified in
spec.cephClusterSpec.nodes.storageDevices of the KaaSCephCluster
custom resource for the affected host, run:
# cat /sys/block/sdb/removable
1
The system output above indicates that the reason for the messages in status
is the hotplug functionality enabled on the AMD nodes, which marks all drives
as removable. The hotplug functionality is not supported by Ceph in
Container Cloud.
As a workaround, disable the hotplug functionality in the BIOS settings for disks that are configured to be used as Ceph OSD data devices.
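To check all SATA/SAS disks on a host at once, you can run a loop similar to the following on the affected node; device names vary per system:

# A value of 1 means that the kernel reports the disk as removable
for dev in /sys/block/sd*; do
  echo "${dev##*/}: $(cat ${dev}/removable)"
done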
[30635] Ceph ‘pg_autoscaler’ is stuck with the ‘overlapping roots’ error¶
Due to the upstream Ceph issue
occurring since Ceph Pacific, the pg_autoscaler module of Ceph Manager
fails with the pool <poolNumber> has overlapping roots error if a Ceph
cluster contains a mix of pools with the deviceClass parameter explicitly
specified and pools without it.
The deviceClass
parameter is required for a pool definition in the
spec
section of the KaaSCephCluster
object, but not required for Ceph
RADOS Gateway (RGW) and Ceph File System (CephFS).
Therefore, if sections for Ceph RGW or CephFS data or metadata pools are
defined without deviceClass, autoscaling of placement groups is disabled
on the cluster due to overlapping roots. Overlapping roots imply that
Ceph RGW and/or CephFS pools obtained the default crush rule and are not
restricted to a specific device class to store data.
Note
If pools for Ceph RGW and CephFS already have deviceClass
specified, skip the corresponding steps of the below procedure.
Note
Perform the below procedure on the affected managed cluster using
its kubeconfig
.
Workaround:
Obtain failureDomain and required replicas for Ceph RGW and/or CephFS pools:

Note

If the KaaSCephCluster spec section does not contain failureDomain, failureDomain equals host by default to store one replica per node.

Note

The types of pool crush rules include:

- An erasureCoded pool requires the codingChunks + dataChunks number of available units of failureDomain.
- A replicated pool requires the replicated.size number of available units of failureDomain.
To obtain Ceph RGW pools, use the spec.cephClusterSpec.objectStorage.rgw section of the KaaSCephCluster object. For example:

objectStorage:
  rgw:
    dataPool:
      failureDomain: host
      erasureCoded:
        codingChunks: 1
        dataChunks: 2
    metadataPool:
      failureDomain: host
      replicated:
        size: 3
    gateway:
      allNodes: false
      instances: 3
      port: 80
      securePort: 8443
    name: openstack-store
    preservePoolsOnDelete: false

The dataPool pool requires the sum of the codingChunks and dataChunks values representing the number of available units of failureDomain. In the example above, for failureDomain: host, dataPool requires 3 available nodes to store its objects.

The metadataPool pool requires the replicated.size number of available units of failureDomain. For failureDomain: host, metadataPool requires 3 available nodes to store its objects.
available nodes to store its objects.To obtain CephFS pools, use the
spec.cephClusterSpec.sharedFilesystem.cephFS
section of theKaaSCephCluster
object. For example:sharedFilesystem: cephFS: - name: cephfs-store dataPools: - name: default-pool replicated: size: 3 failureDomain: host - name: second-pool erasureCoded: dataChunks: 2 codingChunks: 1 metadataPool: replicated: size: 3 failureDomain: host ...
The
default-pool
andmetadataPool
pools require thereplicated.size
number of available units offailureDomain
. ForfailureDomain: host
,default-pool
requires3
available nodes to store its objects.The
second-pool
pool requires the sum ofcodingChunks
anddataChunks
representing the number of available units offailureDomain
. ForfailureDomain: host
,second-pool
requires3
available nodes to store its objects.
Obtain the device class that meets the desired number of required replicas for the defined failureDomain.

Obtaining of the device class

Get a shell of the ceph-tools Pod:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

Obtain the Ceph crush tree with all available crush rules of the device class:

ceph osd tree

Example output:

ID  CLASS  WEIGHT   TYPE NAME                                                STATUS  REWEIGHT  PRI-AFF
-1         0.18713  root default
-3         0.06238      host kaas-node-a29ecf2d-a2cc-493e-bd83-00e9639a7db8
 0  hdd    0.03119          osd.0                                            up      1.00000   1.00000
 3  ssd    0.03119          osd.3                                            up      1.00000   1.00000
-5         0.06238      host kaas-node-dd6826b0-fe3f-407c-ae29-6b0e4a40019d
 1  hdd    0.03119          osd.1                                            up      1.00000   1.00000
 4  ssd    0.03119          osd.4                                            up      1.00000   1.00000
-7         0.06238      host kaas-node-df65fa30-d657-477e-bad2-16f69596d37a
 2  hdd    0.03119          osd.2                                            up      1.00000   1.00000
 5  ssd    0.03119          osd.5                                            up      1.00000   1.00000

Calculate the number of the failureDomain units with each device class. For failureDomain: host, hdd and ssd device classes from the example output above have 3 units each.

Select the device classes that meet the replicas requirement. In the example output above, both hdd and ssd are applicable to store the pool data.

Exit the ceph-tools Pod.
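To cross-check how the crush hierarchy is split per device class, you can also review the crush shadow trees, for example, from outside the ceph-tools shell:

# Shadow roots such as default~hdd and default~ssd represent the per-class views used by crush rules
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd crush tree --show-shadow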
Calculate potential data size for Ceph RGW and CephFS pools.

Calculation of data size

Obtain Ceph data stored by classes and pools:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph df

Example output:

--- RAW STORAGE ---
CLASS    SIZE     AVAIL    USED     RAW USED  %RAW USED
hdd      96 GiB   90 GiB   6.0 GiB  6.0 GiB   6.26
ssd      96 GiB   96 GiB   211 MiB  211 MiB   0.21
TOTAL    192 GiB  186 GiB  6.2 GiB  6.2 GiB   3.24

--- POOLS ---
POOL                                ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
device_health_metrics               1   1    0 B      0        0 B      0      42 GiB
kubernetes-hdd                      2   32   2.3 GiB  707      4.6 GiB  5.15   42 GiB
kubernetes-2-ssd                    11  32   19 B     1        8 KiB    0      45 GiB
openstack-store.rgw.meta            12  32   2.5 KiB  10       64 KiB   0      45 GiB
openstack-store.rgw.log             13  32   23 KiB   309      1.3 MiB  0      45 GiB
.rgw.root                           14  32   4.8 KiB  16       120 KiB  0      45 GiB
openstack-store.rgw.otp             15  32   0 B      0        0 B      0      45 GiB
openstack-store.rgw.control         16  32   0 B      8        0 B      0      45 GiB
openstack-store.rgw.buckets.index   17  32   2.7 KiB  22       5.3 KiB  0      45 GiB
openstack-store.rgw.buckets.non-ec  18  32   0 B      0        0 B      0      45 GiB
openstack-store.rgw.buckets.data    19  32   103 MiB  26       155 MiB  0.17   61 GiB

Summarize the USED size of all <rgwName>.rgw.* pools and compare it with the AVAIL size of each applicable device class selected in the previous step.

Note

As Ceph RGW pools lack explicit specification of deviceClass, they may store objects on all device classes. The resulting device size can be smaller than the calculated USED size because part of the data can already be stored in the desired class. Therefore, limiting pools to a single device class may result in a smaller occupied data size than the total USED size. Nonetheless, calculating the USED size of all pools remains valid because the pool data may not be stored on the selected device class.

For CephFS data or metadata pools, use the previous step to calculate the USED size of pools and compare it with the AVAIL size.

Decide which of the applicable device classes, based on the required replicas and available size, is preferable to store Ceph RGW and CephFS data. In the example output above, hdd and ssd are both applicable. Therefore, select any of them.

Note

You can select different device classes for Ceph RGW and CephFS. For example, hdd for Ceph RGW and ssd for CephFS. Select a device class based on performance expectations, if any.
Create the rule-helper script to switch Ceph RGW or CephFS pools to the selected device class.

Creation of the rule-helper script

Create the rule-helper script file:

Get a shell of the ceph-tools Pod:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
Create the /tmp/rule-helper.py file with the following content:

cat > /tmp/rule-helper.py << EOF
import argparse
import json
import subprocess
from sys import argv, exit


def get_cmd(cmd_args):
    output_args = ['--format', 'json']
    _cmd = subprocess.Popen(cmd_args + output_args,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    stdout, stderr = _cmd.communicate()
    if stderr:
        error = stderr
        print("[ERROR] Failed to get '{0}': {1}".format(' '.join(cmd_args), stderr))
        return
    return stdout


def format_step(action, cmd_args):
    return "{0}:\n\t{1}".format(action, ' '.join(cmd_args))


def process_rule(rule):
    steps = []
    new_rule_name = rule['rule_name'] + '_v2'
    if rule['type'] == "replicated":
        rule_create_args = ['ceph', 'osd', 'crush', 'create-replicated',
                            new_rule_name, rule['root'], rule['failure_domain'], rule['device_class']]
        steps.append(format_step("create a new replicated rule for pool", rule_create_args))
    else:
        new_profile_name = rule['profile_name'] + '_' + rule['device_class']
        profile_create_args = ['ceph', 'osd', 'erasure-code-profile', 'set', new_profile_name]
        for k, v in rule['profile'].items():
            profile_create_args.append("{0}={1}".format(k, v))
        rule_create_args = ['ceph', 'osd', 'crush', 'create-erasure', new_rule_name, new_profile_name]
        steps.append(format_step("create a new erasure-coded profile", profile_create_args))
        steps.append(format_step("create a new erasure-coded rule for pool", rule_create_args))
    set_rule_args = ['ceph', 'osd', 'pool', 'set', 'crush_rule', rule['pool_name'], new_rule_name]
    revert_rule_args = ['ceph', 'osd', 'pool', 'set', 'crush_rule', new_rule_name, rule['pool_name']]
    rm_old_rule_args = ['ceph', 'osd', 'crush', 'rule', 'rm', rule['rule_name']]
    rename_rule_args = ['ceph', 'osd', 'crush', 'rule', 'rename', new_rule_name, rule['rule_name']]
    steps.append(format_step("set pool crush rule to new one", set_rule_args))
    steps.append("check that replication is finished and status healthy: ceph -s")
    steps.append(format_step("in case of any problems revert step 2 and stop procedure", revert_rule_args))
    steps.append(format_step("remove standard (old) pool crush rule", rm_old_rule_args))
    steps.append(format_step("rename new pool crush rule to standard name", rename_rule_args))
    if rule['type'] != "replicated":
        rm_old_profile_args = ['ceph', 'osd', 'erasure-code-profile', 'rm', rule['profile_name']]
        steps.append(format_step("remove standard (old) erasure-coded profile", rm_old_profile_args))
    for idx, step in enumerate(steps):
        print("  {0}) {1}".format(idx + 1, step))


def check_rules(args):
    extra_pools_lookup = []
    if args.type == "rgw":
        extra_pools_lookup.append(".rgw.root")
    pools_str = get_cmd(['ceph', 'osd', 'pool', 'ls', 'detail'])
    if pools_str == '':
        return
    rules_str = get_cmd(['ceph', 'osd', 'crush', 'rule', 'dump'])
    if rules_str == '':
        return
    try:
        pools_dump = json.loads(pools_str)
        rules_dump = json.loads(rules_str)
        if len(pools_dump) == 0:
            print("[ERROR] No pools found")
            return
        if len(rules_dump) == 0:
            print("[ERROR] No crush rules found")
            return
        crush_rules_recreate = []
        for pool in pools_dump:
            if pool['pool_name'].startswith(args.prefix) or pool['pool_name'] in extra_pools_lookup:
                rule_id = pool['crush_rule']
                for rule in rules_dump:
                    if rule['rule_id'] == rule_id:
                        recreate = False
                        new_rule = {'rule_name': rule['rule_name'], 'pool_name': pool['pool_name']}
                        for step in rule.get('steps', []):
                            root = step.get('item_name', '').split('~')
                            if root[0] != '' and len(root) == 1:
                                new_rule['root'] = root[0]
                                continue
                            failure_domain = step.get('type', '')
                            if failure_domain != '':
                                new_rule['failure_domain'] = failure_domain
                        if new_rule.get('root', '') == '':
                            continue
                        new_rule['device_class'] = args.device_class
                        if pool['erasure_code_profile'] == "":
                            new_rule['type'] = "replicated"
                        else:
                            new_rule['type'] = "erasure"
                            profile_str = get_cmd(['ceph', 'osd', 'erasure-code-profile', 'get', pool['erasure_code_profile']])
                            if profile_str == '':
                                return
                            profile_dump = json.loads(profile_str)
                            profile_dump['crush-device-class'] = args.device_class
                            new_rule['profile_name'] = pool['erasure_code_profile']
                            new_rule['profile'] = profile_dump
                        crush_rules_recreate.append(new_rule)
                        break
        print("Found {0} pools with crush rules require device class set".format(len(crush_rules_recreate)))
        for new_rule in crush_rules_recreate:
            print("- Pool {0} requires crush rule update, device class is not set".format(new_rule['pool_name']))
            process_rule(new_rule)
    except Exception as err:
        print("[ERROR] Failed to get info from Ceph: {0}".format(err))
        return


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Ceph crush rules checker. Specify device class and service name.',
        prog=argv[0], usage='%(prog)s [options]')
    parser.add_argument('--type', type=str,
                        help='Type of pool: rgw, cephfs',
                        default='', required=True)
    parser.add_argument('--prefix', type=str,
                        help='Pool prefix. If objectstore - use objectstore name, if CephFS - CephFS name.',
                        default='', required=True)
    parser.add_argument('--device-class', type=str,
                        help='Device class to switch on.',
                        required=True)
    args = parser.parse_args()
    if len(argv) < 3:
        parser.print_help()
        exit(0)
    check_rules(args)
EOF
Exit the
ceph-tools
Pod.
For Ceph RGW, execute the rule-helper script to output the step-by-step instruction and run each step provided in the output manually.

Note

The following steps include creation of crush rules with the same parameters as before but with the device class specification and switching of pools to new crush rules.

Execution of the rule-helper script steps for Ceph RGW

Get a shell of the ceph-tools Pod:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

Run the /tmp/rule-helper.py script with the following parameters:

python3 /tmp/rule-helper.py --prefix <rgwName> --type rgw --device-class <deviceClass>

Substitute the following parameters:

- <rgwName> with the Ceph RGW name from spec.cephClusterSpec.objectStorage.rgw.name in the KaaSCephCluster object. In the example above, the name is openstack-store.
- <deviceClass> with the device class selected in the previous steps.
Using the output of the command from the previous step, run manual commands step-by-step.
Example output for the hdd device class:

Found 7 pools with crush rules require device class set
- Pool openstack-store.rgw.control requires crush rule update, device class is not set
  1) create a new replicated rule for pool: ceph osd crush create-replicated openstack-store.rgw.control_v2 default host hdd
  2) set pool crush rule to new one: ceph osd pool set crush_rule openstack-store.rgw.control openstack-store.rgw.control_v2
  3) check that replication is finished and status healthy: ceph -s
  4) in case of any problems revert step 2 and stop procedure: ceph osd pool set crush_rule openstack-store.rgw.control_v2 openstack-store.rgw.control
  5) remove standard (old) pool crush rule: ceph osd crush rule rm openstack-store.rgw.control
  6) rename new pool crush rule to standard name: ceph osd crush rule rename openstack-store.rgw.control_v2 openstack-store.rgw.control
- Pool openstack-store.rgw.log requires crush rule update, device class is not set
  1) create a new replicated rule for pool: ceph osd crush create-replicated openstack-store.rgw.log_v2 default host hdd
  2) set pool crush rule to new one: ceph osd pool set crush_rule openstack-store.rgw.log openstack-store.rgw.log_v2
  3) check that replication is finished and status healthy: ceph -s
  4) in case of any problems revert step 2 and stop procedure: ceph osd pool set crush_rule openstack-store.rgw.log_v2 openstack-store.rgw.log
  5) remove standard (old) pool crush rule: ceph osd crush rule rm openstack-store.rgw.log
  6) rename new pool crush rule to standard name: ceph osd crush rule rename openstack-store.rgw.log_v2 openstack-store.rgw.log
- Pool openstack-store.rgw.buckets.non-ec requires crush rule update, device class is not set
  1) create a new replicated rule for pool: ceph osd crush create-replicated openstack-store.rgw.buckets.non-ec_v2 default host hdd
  2) set pool crush rule to new one: ceph osd pool set crush_rule openstack-store.rgw.buckets.non-ec openstack-store.rgw.buckets.non-ec_v2
  3) check that replication is finished and status healthy: ceph -s
  4) in case of any problems revert step 2 and stop procedure: ceph osd pool set crush_rule openstack-store.rgw.buckets.non-ec_v2 openstack-store.rgw.buckets.non-ec
  5) remove standard (old) pool crush rule: ceph osd crush rule rm openstack-store.rgw.buckets.non-ec
  6) rename new pool crush rule to standard name: ceph osd crush rule rename openstack-store.rgw.buckets.non-ec_v2 openstack-store.rgw.buckets.non-ec
- Pool .rgw.root requires crush rule update, device class is not set
  1) create a new replicated rule for pool: ceph osd crush create-replicated .rgw.root_v2 default host hdd
  2) set pool crush rule to new one: ceph osd pool set crush_rule .rgw.root .rgw.root_v2
  3) check that replication is finished and status healthy: ceph -s
  4) in case of any problems revert step 2 and stop procedure: ceph osd pool set crush_rule .rgw.root_v2 .rgw.root
  5) remove standard (old) pool crush rule: ceph osd crush rule rm .rgw.root
  6) rename new pool crush rule to standard name: ceph osd crush rule rename .rgw.root_v2 .rgw.root
- Pool openstack-store.rgw.meta requires crush rule update, device class is not set
  1) create a new replicated rule for pool: ceph osd crush create-replicated openstack-store.rgw.meta_v2 default host hdd
  2) set pool crush rule to new one: ceph osd pool set crush_rule openstack-store.rgw.meta openstack-store.rgw.meta_v2
  3) check that replication is finished and status healthy: ceph -s
  4) in case of any problems revert step 2 and stop procedure: ceph osd pool set crush_rule openstack-store.rgw.meta_v2 openstack-store.rgw.meta
  5) remove standard (old) pool crush rule: ceph osd crush rule rm openstack-store.rgw.meta
  6) rename new pool crush rule to standard name: ceph osd crush rule rename openstack-store.rgw.meta_v2 openstack-store.rgw.meta
- Pool openstack-store.rgw.buckets.index requires crush rule update, device class is not set
  1) create a new replicated rule for pool: ceph osd crush create-replicated openstack-store.rgw.buckets.index_v2 default host hdd
  2) set pool crush rule to new one: ceph osd pool set crush_rule openstack-store.rgw.buckets.index openstack-store.rgw.buckets.index_v2
  3) check that replication is finished and status healthy: ceph -s
  4) in case of any problems revert step 2 and stop procedure: ceph osd pool set crush_rule openstack-store.rgw.buckets.index_v2 openstack-store.rgw.buckets.index
  5) remove standard (old) pool crush rule: ceph osd crush rule rm openstack-store.rgw.buckets.index
  6) rename new pool crush rule to standard name: ceph osd crush rule rename openstack-store.rgw.buckets.index_v2 openstack-store.rgw.buckets.index
- Pool openstack-store.rgw.buckets.data requires crush rule update, device class is not set
  1) create a new erasure-coded profile: ceph osd erasure-code-profile set openstack-store_ecprofile_hdd crush-device-class=hdd crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=false k=2 m=1 plugin=jerasure technique=reed_sol_van w=8
  2) create a new erasure-coded rule for pool: ceph osd crush create-erasure openstack-store.rgw.buckets.data_v2 openstack-store_ecprofile_hdd
  3) set pool crush rule to new one: ceph osd pool set crush_rule openstack-store.rgw.buckets.data openstack-store.rgw.buckets.data_v2
  4) check that replication is finished and status healthy: ceph -s
  5) in case of any problems revert step 2 and stop procedure: ceph osd pool set crush_rule openstack-store.rgw.buckets.data_v2 openstack-store.rgw.buckets.data
  6) remove standard (old) pool crush rule: ceph osd crush rule rm openstack-store.rgw.buckets.data
  7) rename new pool crush rule to standard name: ceph osd crush rule rename openstack-store.rgw.buckets.data_v2 openstack-store.rgw.buckets.data
  8) remove standard (old) erasure-coded profile: ceph osd erasure-code-profile rm openstack-store_ecprofile
Verify that the Ceph cluster has rebalanced and has the HEALTH_OK status:

ceph -s
Exit the
ceph-tools
Pod.
For CephFS, execute the rule-helper script to output the step-by-step instruction and run each step provided in the output manually.

Execution of the rule-helper script steps for CephFS

Get a shell of the ceph-tools Pod:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash

Run the /tmp/rule-helper.py script with the following parameters:

python3 /tmp/rule-helper.py --prefix <cephfsName> --type cephfs --device-class <deviceClass>

Substitute the following parameters:

- <cephfsName> with the CephFS name from spec.cephClusterSpec.sharedFilesystem.cephFS[0].name in the KaaSCephCluster object. In the example above, the name is cephfs-store.
- <deviceClass> with the device class selected in the previous steps.
Using the output of the command from the previous step, run manual commands step-by-step.
Example output for the ssd device class:

Found 3 rules require device class set
- Pool cephfs-store-metadata requires crush rule update, device class is not set
  1) create a new replicated rule for pool: ceph osd crush create-replicated cephfs-store-metadata_v2 default host ssd
  2) set pool crush rule to new one: ceph osd pool set crush_rule cephfs-store-metadata cephfs-store-metadata_v2
  3) check that replication is finished and status healthy: ceph -s
  4) in case of any problems revert step 2 and stop procedure: ceph osd pool set crush_rule cephfs-store-metadata_v2 cephfs-store-metadata
  5) remove standard (old) pool crush rule: ceph osd crush rule rm cephfs-store-metadata
  6) rename new pool crush rule to standard name: ceph osd crush rule rename cephfs-store-metadata_v2 cephfs-store-metadata
- Pool cephfs-store-default-pool requires crush rule update, device class is not set
  1) create a new replicated rule for pool: ceph osd crush create-replicated cephfs-store-default-pool_v2 default host ssd
  2) set pool crush rule to new one: ceph osd pool set crush_rule cephfs-store-default-pool cephfs-store-default-pool_v2
  3) check that replication is finished and status healthy: ceph -s
  4) in case of any problems revert step 2 and stop procedure: ceph osd pool set crush_rule cephfs-store-default-pool_v2 cephfs-store-default-pool
  5) remove standard (old) pool crush rule: ceph osd crush rule rm cephfs-store-default-pool
  6) rename new pool crush rule to standard name: ceph osd crush rule rename cephfs-store-default-pool_v2 cephfs-store-default-pool
- Pool cephfs-store-second-pool requires crush rule update, device class is not set
  1) create a new erasure-coded profile: ceph osd erasure-code-profile set cephfs-store-second-pool_ecprofile_ssd crush-device-class=ssd crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=false k=2 m=1 plugin=jerasure technique=reed_sol_van w=8
  2) create a new erasure-coded rule for pool: ceph osd crush create-erasure cephfs-store-second-pool_v2 cephfs-store-second-pool_ecprofile_ssd
  3) set pool crush rule to new one: ceph osd pool set crush_rule cephfs-store-second-pool cephfs-store-second-pool_v2
  4) check that replication is finished and status healthy: ceph -s
  5) in case of any problems revert step 2 and stop procedure: ceph osd pool set crush_rule cephfs-store-second-pool_v2 cephfs-store-second-pool
  6) remove standard (old) pool crush rule: ceph osd crush rule rm cephfs-store-second-pool
  7) rename new pool crush rule to standard name: ceph osd crush rule rename cephfs-store-second-pool_v2 cephfs-store-second-pool
  8) remove standard (old) erasure-coded profile: ceph osd erasure-code-profile rm cephfs-store-second-pool_ecprofile
Verify that the Ceph cluster has rebalanced and has the HEALTH_OK status:

ceph -s
Exit the
ceph-tools
Pod.
Verify the pg_autoscaler module after switching deviceClass for all required pools:

ceph osd pool autoscale-status
The system response must contain all Ceph RGW and CephFS pools.
On the management cluster, edit the KaaSCephCluster object of the corresponding managed cluster by adding the selected device class to the deviceClass parameter of the updated Ceph RGW and CephFS pools:

kubectl -n <managedClusterProjectName> edit kaascephcluster

Example configuration

spec:
  cephClusterSpec:
    objectStorage:
      rgw:
        dataPool:
          failureDomain: host
          deviceClass: <rgwDeviceClass>
          erasureCoded:
            codingChunks: 1
            dataChunks: 2
        metadataPool:
          failureDomain: host
          deviceClass: <rgwDeviceClass>
          replicated:
            size: 3
        gateway:
          allNodes: false
          instances: 3
          port: 80
          securePort: 8443
        name: openstack-store
        preservePoolsOnDelete: false
    ...
    sharedFilesystem:
      cephFS:
      - name: cephfs-store
        dataPools:
        - name: default-pool
          deviceClass: <cephfsDeviceClass>
          replicated:
            size: 3
          failureDomain: host
        - name: second-pool
          deviceClass: <cephfsDeviceClass>
          erasureCoded:
            dataChunks: 2
            codingChunks: 1
        metadataPool:
          deviceClass: <cephfsDeviceClass>
          replicated:
            size: 3
          failureDomain: host
        ...

Substitute <rgwDeviceClass> with the device class applied to Ceph RGW pools and <cephfsDeviceClass> with the device class applied to CephFS pools.

You can use this configuration step for further management of Ceph RGW and/or CephFS. It does not impact the existing Ceph cluster configuration.
[26441] Cluster update fails with the ‘MountDevice failed for volume’ warning¶
Update of a managed cluster based on bare metal and Ceph enabled fails with
PersistentVolumeClaim
getting stuck in the Pending
state for the
prometheus-server
StatefulSet and the
MountVolume.MountDevice failed for volume
warning in the StackLight event
logs.
Workaround:
Verify that the description of the Pods that failed to run contains the FailedMount events:

kubectl -n <affectedProjectName> describe pod <affectedPodName>

In the command above, replace the following values:

- <affectedProjectName> is the Container Cloud project name where the Pods failed to run
- <affectedPodName> is a Pod name that failed to run in the specified project
In the Pod description, identify the node name where the Pod failed to run.
Verify that the csi-rbdplugin logs of the affected node contain the rbd volume mount failed: <csi-vol-uuid> is being used error. The <csi-vol-uuid> is a unique RBD volume name.

Identify csiPodName of the corresponding csi-rbdplugin:

kubectl -n rook-ceph get pod -l app=csi-rbdplugin \
  -o jsonpath='{.items[?(@.spec.nodeName == "<nodeName>")].metadata.name}'

Output the affected csiPodName logs:

kubectl -n rook-ceph logs <csiPodName> -c csi-rbdplugin
Scale down the affected StatefulSet or Deployment of the Pod that fails to 0 replicas.

On every csi-rbdplugin Pod, search for stuck csi-vol:

for pod in `kubectl -n rook-ceph get pods | grep rbdplugin | grep -v provisioner | awk '{print $1}'`; do
  echo $pod
  kubectl exec -it -n rook-ceph $pod -c csi-rbdplugin -- rbd device list | grep <csi-vol-uuid>
done
Unmap the affected csi-vol:

rbd unmap -o force /dev/rbd<i>

The /dev/rbd<i> value is a mapped RBD volume that uses csi-vol.

Delete volumeattachment of the affected Pod:

kubectl get volumeattachments | grep <csi-vol-uuid>
kubectl delete volumeattachment <id>
Scale up the affected
StatefulSet
orDeployment
back to the original number of replicas and wait until its state becomesRunning
.
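For example, for the prometheus-server StatefulSet mentioned above, assuming it runs in the stacklight namespace and originally had 1 replica:

kubectl -n stacklight scale statefulset prometheus-server --replicas=1
kubectl -n stacklight get statefulset prometheus-server -w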
StackLight¶
[31485] Elasticsearch Curator does not delete indices as per retention period¶
Note
If you obtain patch releases, the issue is addressed in 2.23.2 for management and regional clusters and in 11.7.1 and 12.7.1 for managed clusters.
Elasticsearch Curator does not delete any indices according to the configured retention period on any type of Container Cloud clusters.
To verify whether your cluster is affected:
Identify versions of Cluster releases installed on your clusters:
kubectl get cluster --all-namespaces \
-o custom-columns=CLUSTER:.metadata.name,NAMESPACE:.metadata.namespace,VERSION:.spec.providerSpec.value.release
The following list contains all affected Cluster releases:
mke-11-7-0-3-5-7
mke-13-4-4
mke-13-5-3
mke-13-6-0
mke-13-7-0
mosk-12-7-0-23-1
As a workaround, on the affected clusters, create a temporary CronJob for
elasticsearch-curator
to clean the required indices:
kubectl get cronjob elasticsearch-curator -n stacklight -o json \
| sed 's/5.7.6-[0-9]*/5.7.6-20230404082402/g' \
| jq '.spec.schedule = "30 * * * *"' \
| jq '.metadata.name = "temporary-elasticsearch-curator"' \
| jq 'del(.metadata.resourceVersion,.metadata.uid,.metadata.selfLink,.metadata.creationTimestamp,.metadata.annotations,.metadata.generation,.metadata.ownerReferences,.metadata.labels,.spec.jobTemplate.metadata.labels,.spec.jobTemplate.spec.template.metadata.creationTimestamp,.spec.jobTemplate.spec.template.metadata.labels)' \
| jq '.metadata.labels.app = "temporary-elasticsearch-curator"' \
| jq '.spec.jobTemplate.metadata.labels.app = "temporary-elasticsearch-curator"' \
| jq '.spec.jobTemplate.spec.template.metadata.labels.app = "temporary-elasticsearch-curator"' \
| kubectl create -f -
Note
This CronJob is removed automatically during upgrade to the major Container Cloud release 2.24.0 or to the patch Container Cloud release 2.23.3 if you obtain patch releases.
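To confirm that the temporary CronJob was created and uses the new schedule:

kubectl -n stacklight get cronjob temporary-elasticsearch-curator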