Due to the upstream MetalLB issue,
a race condition occurs when assigning an IP address after the MetalLB
Controller restarts. If a new service of the LoadBalancer type is created
during the MetalLB Controller restart, this service can be assigned an IP
address that was already assigned to another service before the restart.
To verify that the cluster is affected:
Verify whether LoadBalancer (LB) IP addresses are duplicated across services
that are not supposed to share them:
kubectl get svc -A | grep LoadBalancer
Note
Some services use shared IP addresses on purpose. In the example
system response below, these are services using the IP address 10.0.1.141.
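A truncated illustration of such a system response, in which all names, namespaces, and values other than iam-keycloak-http, kaas-kaas-ui, 10.0.1.141, and 10.100.91.101 are placeholders:

NAMESPACE     NAME                TYPE           EXTERNAL-IP     PORT(S)
<namespace>   <service1>          LoadBalancer   10.0.1.141      <ports>
<namespace>   <service2>          LoadBalancer   10.0.1.141      <ports>
<namespace>   iam-keycloak-http   LoadBalancer   10.100.91.101   443:<nodePort>/TCP
<namespace>   kaas-kaas-ui        LoadBalancer   10.100.91.101   443:<nodePort>/TCP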
In the above example, the iam-keycloak-http and kaas-kaas-ui
services erroneously use the same IP address 10.100.91.101. They both use the
same port 443 producing a collision when an application tries to access the
10.100.91.101:443 endpoint.
Workaround:
Unassign the current LB IP address from the selected service, for example, by
temporarily switching the service to the NodePort type, as no LB IP
address can be used for a NodePort service:
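A minimal sketch of such a change (the namespace and service names are placeholders):

kubectl -n <namespaceName> patch svc <serviceName> -p '{"spec":{"type":"NodePort"}}'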
The second affected service will continue using its current LB IP address.
[24005] Deletion of a node with ironic Pod is stuck in the Terminating state¶
During deletion of a manager machine running the ironic Pod from a bare
metal management cluster, the following problems occur:
All Pods are stuck in the Terminating state
A new ironic Pod fails to start
The related bare metal host is stuck in the deprovisioning state
As a workaround, before deletion of the node running the ironic Pod,
cordon and drain the node using the kubectl cordon <nodeName> and
kubectl drain <nodeName> commands.
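For example:

kubectl cordon <nodeName>
kubectl drain <nodeName>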
[20736] Region deletion failure after regional deployment failure¶
If a baremetal-based regional cluster deployment fails before pivoting is
done, the corresponding region deletion fails.
Workaround:
Using the command below, manually delete all possible traces of the failed
regional cluster deployment, including but not limited to the following
objects that contain the kaas.mirantis.com/region label of the affected
region:
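A sketch of how such objects can be listed before deletion, assuming that Cluster, Machine, and BareMetalHost are among the affected resource kinds (the full set of kinds may differ per deployment):

kubectl get cluster,machine,baremetalhost -A -l kaas.mirantis.com/region=<regionName>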
Upgrade of a cluster with more than 120 nodes gets stuck with errors about
IP address exhaustion in the Docker logs.
Note
If you plan to scale your cluster to more than 120 nodes, the cluster
will be affected by the issue. Therefore, you will have to perform the
workaround below.
Workaround:
Caution
If you have not run the cluster upgrade yet, simply recreate the
mke-overlay network as described in step 6 and skip all other steps.
Note
If you successfully upgraded the cluster with less than 120 nodes
but plan to scale it to more than 120 nodes, proceed with steps 2-9.
Verify that MKE nodes are upgraded:
On any master node, run the following command to identify
the ucp-worker-agent service that has a newer version:
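A minimal sketch of such a check, assuming the MKE worker agents run as Docker Swarm services whose image tags reflect the agent version:

docker service ls --filter name=ucp-worker-agent --format '{{.Name}} {{.Image}}'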
Recreate the mke-overlay network with a correct CIDR that is
at least /20 and does not overlap with other subnets in the
cluster network. For example:
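A minimal sketch, assuming the 10.1.0.0/20 subnet is free in your environment and no additional driver options are required:

docker network create -d overlay --subnet 10.1.0.0/20 mke-overlay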
During the unsafe or forced deletion of a manager machine running the
calico-kube-controllers Pod in the kube-system namespace,
the following issues occur:
The calico-kube-controllers Pod fails to clean up resources associated
with the deleted node
The calico-node Pod may fail to start up on a newly created node if the
machine is provisioned with the same IP address as the deleted machine had
As a workaround, before deletion of the node running the
calico-kube-controllers Pod, cordon and drain the node:
kubectl cordon <nodeName>
kubectl drain <nodeName>
[30294] Replacement of a master node is stuck on the calico-node Pod start¶
During replacement of a master node on a cluster of any type, the
calico-node Pod fails to start on a new node that has the same IP address
as the node being replaced.
Workaround:
Log in to any master node.
From a CLI with an MKE client bundle, create a shell alias to start
calicoctl using the mirantis/ucp-dsinfo image:
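A sketch of such an alias, assuming the default Calico and Docker socket locations; the exact volume mounts, TLS-related environment variables, and entrypoint may differ for your MKE version:

alias calicoctl="docker run -i --rm \
  --net host --pid host \
  -e ETCD_ENDPOINTS=<etcdEndpoint> \
  -v /var/run/calico:/var/run/calico \
  -v /var/run/docker.sock:/var/run/docker.sock \
  mirantis/ucp-dsinfo:<mkeVersion> \
  calicoctl"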
In the above command, replace the following values with the corresponding
settings of the affected cluster:
<etcdEndpoint> is the etcd endpoint defined in the Calico
configuration file. For example, ETCD_ENDPOINTS=127.0.0.1:12378
<mkeVersion> is the MKE version installed on your cluster.
For example, mirantis/ucp-dsinfo:3.5.7.
Verify the node list on the cluster:
kubectl get node
Compare this list with the node list in Calico to identify the old node:
calicoctl get node -o wide
Remove the old node from Calico:
calicoctl delete node kaas-node-<nodeID>
[27797] A cluster ‘kubeconfig’ stops working during MKE minor version update¶
During update of a Container Cloud cluster of any type, if the MKE minor
version is updated from 3.4.x to 3.5.x, access to the cluster using the
existing kubeconfig fails with the You must be logged in to the server
(Unauthorized) error due to OIDC settings being reconfigured.
As a workaround, during the cluster update process, use the admin
kubeconfig instead of the existing one. Once the update completes, you can
use the existing cluster kubeconfig again.
When setting a new Transport Layer Security (TLS) certificate for a cluster,
the false positive failed to get kubeconfig error may occur on the
Waiting for TLS settings to be applied stage. No actions are required;
disregard the error.
To verify the status of the TLS configuration being applied:
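A sketch of such a check, assuming the TLS status, including the expirationTime and hostname fields, is exposed in the Cluster object status (the exact status path may differ between releases):

kubectl -n <projectName> get cluster <clusterName> -o yaml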
In this example, expirationTime equals the NotAfter field of the
server certificate, and the hostname value contains the configured
application name.
The deployment of Ceph OSDs fails with the following messages in the status
section of the KaaSCephCluster custom resource:
shortClusterInfo:
  messages:
  - Not all osds are deployed
  - Not all osds are in
  - Not all osds are up
To find out if your cluster is affected, verify if the devices on
the AMD hosts you use for the Ceph OSDs deployment are removable.
For example, if the sdb device name is specified in
spec.cephClusterSpec.nodes.storageDevices of the KaaSCephCluster
custom resource for the affected host, run:
# cat /sys/block/sdb/removable
1
The system output shows that the reason for the above messages in status
is the enabled hotplug functionality on the AMD nodes, which marks all drives
as removable. The hotplug functionality is not supported by Ceph in
Container Cloud.
As a workaround, disable the hotplug functionality in the BIOS settings
for disks that are configured to be used as Ceph OSD data devices.
[30635] Ceph ‘pg_autoscaler’ is stuck with the ‘overlapping roots’ error¶
Due to the upstream Ceph issue
occurring since Ceph Pacific, the pg_autoscaler module of Ceph Manager
fails with the pool <poolNumber> has overlapping roots error if a Ceph
cluster contains a mix of pools with deviceClass either explicitly
specified or not specified.
The deviceClass parameter is required for a pool definition in the
spec section of the KaaSCephCluster object, but not required for Ceph
RADOS Gateway (RGW) and Ceph File System (CephFS).
Therefore, if sections for Ceph RGW or CephFS data or metadata pools are
defined without deviceClass, then autoscaling of placement groups is
disabled on a cluster due to overlapping roots. Overlapping roots imply that
Ceph RGW and/or CephFS pools obtained the default crush rule and have no
demarcation on a specific class to store data.
Note
If pools for Ceph RGW and CephFS already have deviceClass
specified, skip the corresponding steps of the below procedure.
Note
Perform the below procedure on the affected managed cluster using
its kubeconfig.
Workaround:
Obtain failureDomain and required replicas for Ceph RGW and/or CephFS
pools:
Note
If the KaaSCephCluster spec section does not contain
failureDomain, failureDomain equals host by default to store
one replica per node.
Note
The types of pool crush rules include:
An erasureCoded pool requires the codingChunks+dataChunks
number of available units of failureDomain.
A replicated pool requires the replicated.size number of
available units of failureDomain.
To obtain Ceph RGW pools, use the
spec.cephClusterSpec.objectStorage.rgw section of the
KaaSCephCluster object. For example:
The dataPool pool requires the sum of codingChunks and
dataChunks values representing the number of available units of
failureDomain. In the example above, for failureDomain:host,
dataPool requires 3 available nodes to store its objects.
The metadataPool pool requires the replicated.size number
of available units of failureDomain. For failureDomain:host,
metadataPool requires 3 available nodes to store its objects.
To obtain CephFS pools, use the
spec.cephClusterSpec.sharedFilesystem.cephFS section of the
KaaSCephCluster object. For example:
The default-pool and metadataPool pools require the
replicated.size number of available units of failureDomain.
For failureDomain:host, default-pool requires 3 available
nodes to store its objects.
The second-pool pool requires the sum of codingChunks and
dataChunks representing the number of available units of
failureDomain. For failureDomain:host, second-pool requires
3 available nodes to store its objects.
Obtain the device class that meets the desired number of required replicas
for the defined failureDomain.
Summarize the USED size of all <rgwName>.rgw.* pools and
compare it with the AVAIL size of each applicable device class
selected in the previous step.
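For example, the following general Ceph command, run from inside the ceph-tools Pod, shows the USED size of each pool and the AVAIL size of each device class:

ceph df detail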
Note
As Ceph RGW pools lack explicit specification of
deviceClass, they may store objects on all device classes.
The resulting size on the target device class can be smaller than the
calculated USED size because part of the data may already be stored in the
desired class.
Therefore, limiting pools to a single device class may result in a
smaller occupied data size than the total USED size.
Nonetheless, calculating the USED size of all pools remains a valid
conservative estimate because the pool data may not yet be stored on the
selected device class.
For CephFS data or metadata pools, use the previous step to calculate
the USED size of pools and compare it with the AVAIL size.
From the device classes that satisfy the required replicas and
available size, decide which one is preferable for storing Ceph RGW and CephFS data.
In the example output above, hdd and ssd are both applicable.
Therefore, select any of them.
Note
You can select different device classes for Ceph RGW and
CephFS. For example, hdd for Ceph RGW and ssd for CephFS.
Select a device class based on performance expectations, if any.
Create the rule-helper script that helps switch Ceph RGW or CephFS pools to
the selected device class.
Create the /tmp/rule-helper.py file with the following content:
cat > /tmp/rule-helper.py << EOF
import argparse
import json
import subprocess
from sys import argv, exit


def get_cmd(cmd_args):
    # Run a Ceph CLI command with JSON output and return its stdout
    output_args = ['--format', 'json']
    _cmd = subprocess.Popen(cmd_args + output_args,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    stdout, stderr = _cmd.communicate()
    if stderr:
        print("[ERROR] Failed to get '{0}': {1}".format(' '.join(cmd_args), stderr))
        return
    return stdout


def format_step(action, cmd_args):
    return "{0}:\n\t{1}".format(action, ' '.join(cmd_args))


def process_rule(rule):
    # Print the manual steps required to recreate the crush rule with a device class
    steps = []
    new_rule_name = rule['rule_name'] + '_v2'
    if rule['type'] == "replicated":
        rule_create_args = ['ceph', 'osd', 'crush', 'create-replicated',
                            new_rule_name, rule['root'], rule['failure_domain'], rule['device_class']]
        steps.append(format_step("create a new replicated rule for pool", rule_create_args))
    else:
        new_profile_name = rule['profile_name'] + '_' + rule['device_class']
        profile_create_args = ['ceph', 'osd', 'erasure-code-profile', 'set', new_profile_name]
        for k, v in rule['profile'].items():
            profile_create_args.append("{0}={1}".format(k, v))
        rule_create_args = ['ceph', 'osd', 'crush', 'create-erasure', new_rule_name, new_profile_name]
        steps.append(format_step("create a new erasure-coded profile", profile_create_args))
        steps.append(format_step("create a new erasure-coded rule for pool", rule_create_args))
    set_rule_args = ['ceph', 'osd', 'pool', 'set', 'crush_rule', rule['pool_name'], new_rule_name]
    revert_rule_args = ['ceph', 'osd', 'pool', 'set', 'crush_rule', new_rule_name, rule['pool_name']]
    rm_old_rule_args = ['ceph', 'osd', 'crush', 'rule', 'rm', rule['rule_name']]
    rename_rule_args = ['ceph', 'osd', 'crush', 'rule', 'rename', new_rule_name, rule['rule_name']]
    steps.append(format_step("set pool crush rule to new one", set_rule_args))
    steps.append("check that replication is finished and status healthy: ceph -s")
    steps.append(format_step("in case of any problems revert step 2 and stop procedure", revert_rule_args))
    steps.append(format_step("remove standard (old) pool crush rule", rm_old_rule_args))
    steps.append(format_step("rename new pool crush rule to standard name", rename_rule_args))
    if rule['type'] != "replicated":
        rm_old_profile_args = ['ceph', 'osd', 'erasure-code-profile', 'rm', rule['profile_name']]
        steps.append(format_step("remove standard (old) erasure-coded profile", rm_old_profile_args))
    for idx, step in enumerate(steps):
        print(" {0}) {1}".format(idx+1, step))


def check_rules(args):
    # Find pools matching the prefix whose crush rules lack a device class
    extra_pools_lookup = []
    if args.type == "rgw":
        extra_pools_lookup.append(".rgw.root")
    pools_str = get_cmd(['ceph', 'osd', 'pool', 'ls', 'detail'])
    if not pools_str:
        return
    rules_str = get_cmd(['ceph', 'osd', 'crush', 'rule', 'dump'])
    if not rules_str:
        return
    try:
        pools_dump = json.loads(pools_str)
        rules_dump = json.loads(rules_str)
        if len(pools_dump) == 0:
            print("[ERROR] No pools found")
            return
        if len(rules_dump) == 0:
            print("[ERROR] No crush rules found")
            return
        crush_rules_recreate = []
        for pool in pools_dump:
            if pool['pool_name'].startswith(args.prefix) or pool['pool_name'] in extra_pools_lookup:
                rule_id = pool['crush_rule']
                for rule in rules_dump:
                    if rule['rule_id'] == rule_id:
                        new_rule = {'rule_name': rule['rule_name'], 'pool_name': pool['pool_name']}
                        for step in rule.get('steps', []):
                            root = step.get('item_name', '').split('~')
                            if root[0] != '' and len(root) == 1:
                                new_rule['root'] = root[0]
                                continue
                            failure_domain = step.get('type', '')
                            if failure_domain != '':
                                new_rule['failure_domain'] = failure_domain
                        if new_rule.get('root', '') == '':
                            continue
                        new_rule['device_class'] = args.device_class
                        if pool['erasure_code_profile'] == "":
                            new_rule['type'] = "replicated"
                        else:
                            new_rule['type'] = "erasure"
                            profile_str = get_cmd(['ceph', 'osd', 'erasure-code-profile', 'get',
                                                   pool['erasure_code_profile']])
                            if not profile_str:
                                return
                            profile_dump = json.loads(profile_str)
                            profile_dump['crush-device-class'] = args.device_class
                            new_rule['profile_name'] = pool['erasure_code_profile']
                            new_rule['profile'] = profile_dump
                        crush_rules_recreate.append(new_rule)
                        break
        print("Found {0} pools with crush rules require device class set".format(len(crush_rules_recreate)))
        for new_rule in crush_rules_recreate:
            print("- Pool {0} requires crush rule update, device class is not set".format(new_rule['pool_name']))
            process_rule(new_rule)
    except Exception as err:
        print("[ERROR] Failed to get info from Ceph: {0}".format(err))
        return


if __name__ == '__main__':
    parser = argparse.ArgumentParser(
        description='Ceph crush rules checker. Specify device class and service name.',
        prog=argv[0], usage='%(prog)s [options]')
    parser.add_argument('--type', type=str, help='Type of pool: rgw, cephfs', default='', required=True)
    parser.add_argument('--prefix', type=str,
                        help='Pool prefix. If objectstore - use objectstore name, if CephFS - CephFS name.',
                        default='', required=True)
    parser.add_argument('--device-class', type=str, help='Device class to switch on.', required=True)
    args = parser.parse_args()
    if len(argv) < 3:
        parser.print_help()
        exit(0)
    check_rules(args)
EOF
Exit the ceph-tools Pod.
For Ceph RGW, execute the rule-helper script to output step-by-step
instructions, and run each step provided in the output manually.
Note
The following steps include creation of crush rules with the same
parameters as before but with the device class specification and switching
of pools to new crush rules.
Execution of the rule-helper script steps for Ceph RGW
Substitute <rgwName> with the Ceph RGW name from
spec.cephClusterSpec.objectStorage.rgw.name in the
KaaSCephCluster object. In the example above, the name is
openstack-store.
<deviceClass> with the device class selected in the previous
steps.
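A sketch of the script invocation, run from inside the ceph-tools Pod using the placeholders described above and assuming a Python 3 interpreter is available in the Pod:

python3 /tmp/rule-helper.py --type rgw --prefix <rgwName> --device-class <deviceClass>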
Using the output of the command from the previous step, run manual
commands step-by-step.
Substitute <cephfsName> with the CephFS name from
spec.cephClusterSpec.sharedFilesystem.cephFS[0].name in the
KaaSCephCluster object. In the example above, the name is
cephfs-store.
<deviceClass> with the device class selected in the previous
steps.
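Similarly for CephFS, a sketch of the invocation from inside the ceph-tools Pod, using the placeholders described above:

python3 /tmp/rule-helper.py --type cephfs --prefix <cephfsName> --device-class <deviceClass>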
Using the output of the command from the previous step, run manual
commands step-by-step.
Verify that the Ceph cluster has rebalanced and has the HEALTH_OK
status:
ceph -s
Exit the ceph-tools Pod.
Verify the pg_autoscaler module after switching deviceClass for
all required pools:
ceph osd pool autoscale-status
The system response must contain all Ceph RGW and CephFS pools.
On the management cluster, edit the KaaSCephCluster object of the
corresponding managed cluster by adding the selected device class to the
deviceClass parameter of the updated Ceph RGW and CephFS pools:
Substitute <rgwDeviceClass> with the device class applied to Ceph RGW
pools and <cephfsDeviceClass> with the device class applied to CephFS
pools.
You can use this configuration step for further management of Ceph RGW
and/or CephFS. It does not impact the existing Ceph cluster configuration.
[26441] Cluster update fails with the MountDevice failed for volume warning¶
Update of a managed cluster based on bare metal with Ceph enabled fails with
a PersistentVolumeClaim getting stuck in the Pending state for the
prometheus-server StatefulSet and the
MountVolume.MountDevice failed for volume warning in the StackLight event
logs.
Workaround:
Verify that the description of the Pods that failed to run contains
FailedMount events:
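A sketch of the command, using the placeholders described below:

kubectl -n <affectedProjectName> describe pod <affectedPodName>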
In the command above, replace the following values:
<affectedProjectName> is the Container Cloud project name where
the Pods failed to run
<affectedPodName> is a Pod name that failed to run in the specified project
In the Pod description, identify the node name where the Pod failed to run.
Verify that the csi-rbdplugin logs of the affected node contain the
rbd volume mount failed: <csi-vol-uuid> is being used error.
The <csi-vol-uuid> is a unique RBD volume name.
Identify csiPodName of the corresponding csi-rbdplugin:
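A sketch of such a lookup, assuming the CSI RBD plugin Pods run in the rook-ceph namespace and carry the app=csi-rbdplugin label (names may differ in your deployment):

kubectl -n rook-ceph get pod -l app=csi-rbdplugin -o wide | grep <nodeName>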
This CronJob is removed automatically during upgrade to the
major Container Cloud release 2.24.0 or to the patch Container Cloud release
2.23.3 if you obtain patch releases.