Failure of shard relocation in the OpenSearch cluster
On large managed clusters, shard relocation may fail in the OpenSearch cluster, resulting in the yellow or red status of the OpenSearch cluster.
The characteristic symptom of the issue is that in the stacklight namespace, the statefulset.apps/opensearch-master containers are experiencing throttling, with the KubeContainersCPUThrottlingHigh alert firing for the following set of labels:
{created_by_kind="StatefulSet",created_by_name="opensearch-master",namespace="stacklight"}
Caution
The throttling that OpenSearch is experiencing may be a temporary situation related, for example, to a load spike or to the ongoing shard initialization as part of disaster recovery or after a node restart. In this case, Mirantis recommends waiting until the initialization of all shards is finished. After that, verify the cluster state and whether throttling still exists. Apply the workaround below only if the throttling does not disappear.
To verify that the initialization of shards is ongoing:
kubectl exec -it pod/opensearch-master-0 -n stacklight -c opensearch -- bash
curl "http://localhost:9200/_cat/shards" | grep INITIALIZING
Example of system response:
.ds-system-000072 2 r INITIALIZING 10.232.182.135 opensearch-master-1
.ds-system-000073 1 r INITIALIZING 10.232.7.145 opensearch-master-2
.ds-system-000073 2 r INITIALIZING 10.232.182.135 opensearch-master-1
.ds-audit-000001 2 r INITIALIZING 10.232.7.145 opensearch-master-2
The system response above indicates that shards from the .ds-system-000072, .ds-system-000073, and .ds-audit-000001 indices are in the INITIALIZING state. In this case, Mirantis recommends waiting until this process is finished, and only then consider changing the limit.
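For an overall view of shard recovery, you can also query the cluster health API from the same opensearch container, for example as follows. When recovery completes, the initializing_shards and relocating_shards counters drop to zero and the status typically returns to green.

# Overall cluster status and shard recovery counters
curl -s "http://localhost:9200/_cluster/health?pretty"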
You can additionally analyze the exact level of throttling and the current CPU usage on the Kubernetes Containers dashboard in Grafana.
To apply the issue resolution:
Verify the currently configured CPU requests and limits for the opensearch containers:

kubectl -n stacklight get statefulset.apps/opensearch-master -o jsonpath="{.spec.template.spec.containers[?(@.name=='opensearch')].resources}"
Example of system response:
{"limits":{"cpu":"600m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
In the example above, the CPU request is 500m and the CPU limit is 600m.

Increase the CPU limit to a reasonably high number. For example, the default CPU limit for the clusters with the clusterSize: large parameter set was increased from 8000m to 12000m for StackLight in Container Cloud 2.27.0 (Cluster releases 17.2.0 and 16.2.0).

Note
For details on the clusterSize parameter, see Operations Guide: StackLight configuration parameters - Cluster size.

If the defaults are already overridden on the affected cluster using the resourcesPerClusterSize or resources parameters as described in Operations Guide: StackLight configuration parameters - Resource limits, the exact recommended number depends on the currently set limit. Mirantis recommends increasing the limit by 50%. If it does not resolve the issue, another increase iteration will be required.
When you select the required CPU limit, increase it as described in Operations Guide: StackLight configuration parameters - Resource limits.
If the CPU limit for the opensearch component is already set, increase it in the Cluster object for the opensearch parameter. Otherwise, the default StackLight limit is used. In this case, increase the CPU limit for the opensearch component using the resources parameter.
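As an illustration only, increasing the opensearch CPU limit through the resources parameter might look like the following sketch. The placeholders <projectName> and <clusterName>, the editing command, and the exact location of the keys inside the Cluster object are assumptions; follow the Resource limits documentation referenced above for the authoritative structure.

# Edit the Cluster object of the affected cluster;
# <projectName> and <clusterName> are placeholders for your environment.
kubectl -n <projectName> edit cluster <clusterName>

# Inside the StackLight values, a per-component override is assumed to look
# roughly as follows (a 50% increase of the 600m limit from the example above):
#
#   resources:
#     opensearch:
#       limits:
#         cpu: "900m"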
Wait until all opensearch-master pods are recreated with the new CPU limits and become running and ready.

To verify the current CPU limit for every opensearch container in every opensearch-master pod separately:

kubectl -n stacklight get pod/opensearch-master-<podSuffixNumber> -o jsonpath="{.spec.containers[?(@.name=='opensearch')].resources}"
In the command above, replace <podSuffixNumber> with the pod suffix number. For example, pod/opensearch-master-0 or pod/opensearch-master-2.

Example of system response:
{"limits":{"cpu":"900m","memory":"8Gi"},"requests":{"cpu":"500m","memory":"6Gi"}}
The waiting time may take up to 20 minutes depending on the cluster size.
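To track the recreation progress, you can, for example, watch the StatefulSet rollout, assuming the default RollingUpdate strategy:

# Wait for the rolling update of the opensearch-master StatefulSet to complete
kubectl -n stacklight rollout status statefulset/opensearch-master --timeout=30m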
If the issue is fixed, the KubeContainersCPUThrottlingHigh alert stops firing immediately, while OpenSearchClusterStatusWarning or OpenSearchClusterStatusCritical can still be firing for some time during shard relocation.
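To monitor the remaining shard movement from the opensearch container (see the kubectl exec command above), you can, for example, re-run the shard listing and filter for shards that have not started yet; the OpenSearch status alerts should clear some time after this output becomes empty:

# List shards that are still relocating or initializing
curl -s "http://localhost:9200/_cat/shards" | grep -E "RELOCATING|INITIALIZING"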
If the KubeContainersCPUThrottlingHigh alert is still firing, proceed with another iteration of the CPU limit increase.