Troubleshoot high OpenSearch indexing pressure
High indexing pressure may be caused by a large amount of log data being
ingested into OpenSearch, for example, after enabling the debug logging
level or during disaster recovery.
By default, OpenSearch reserves only 10% of its heap size for indexing
workloads. This allocation may be insufficient in some scenarios, potentially
leading to Yellow or Red cluster states during recoveries or triggering
the OpenSearchIndexingPressureHigh alert.
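To see what the 10% default translates to in practice, the following arithmetic sketch uses an assumed example heap size of 16 GiB (not a value taken from any specific deployment):

```python
# Illustrative arithmetic, assuming an example 16 GiB JVM heap:
# by default, OpenSearch reserves 10% of the heap for indexing workloads.
heap_bytes = 16 * 1024**3
indexing_limit = int(heap_bytes * 0.10)
print(f"{indexing_limit / 1024**3:.1f} GiB available for indexing")
```

With a 16 GiB heap, this yields roughly 1.6 GiB for indexing, which matches the limit value shown in the stats example below.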
To verify whether the cluster is affected by high indexing pressure:
Verify whether the cluster has shard allocation issues:
kubectl exec -it opensearch-master-0 -n stacklight -- curl localhost:9200/_cluster/allocation/explain | jq
Example of a system response extract with shard allocation issues:
{
  "explanation": "Shard has exceeded the maximum number of retries [5] on failed allocation attempts. Manually call [/_cluster/reroute?retry_failed=true] to retry. Nested exception: OpenSearchRejectedExecutionException: rejected execution of primary operation [coordinating_and_primary_bytes=5347688872, replica_bytes=56476234, all_bytes=5404165106, primary_operation_bytes=0, max_coordinating_and_primary_bytes=5368709120]. Allocation status: no_attempt."
}
Verify live stats of indexing pressure with historical counters:
kubectl exec -it opensearch-master-0 -n stacklight -- curl -s "localhost:9200/_nodes/stats?human=true&filter_path=nodes.*.indexing_pressure" | jq
In the system response, compare the limit stat with the current one
(all memory used) and verify whether the *_rejections counters increase
over time.
Example of a system response extract with high indexing pressure:
{
  "nodes": {
    "9vzke53QQK-7aoJilvaYTQ": {
      "indexing_pressure": {
        "memory": {
          "current": {
            "combined_coordinating_and_primary": "176.2mb",
            "combined_coordinating_and_primary_in_bytes": 184858973,
            "coordinating": "176.2mb",
            "coordinating_in_bytes": 184858973,
            "primary": "0b",
            "primary_in_bytes": 0,
            "replica": "182.6mb",
            "replica_in_bytes": 191486852,
            "all": "358.9mb",
            "all_in_bytes": 376345825
          },
          "total": {
            "combined_coordinating_and_primary": "650.2gb",
            "combined_coordinating_and_primary_in_bytes": 698151026756,
            "coordinating": "430.8gb",
            "coordinating_in_bytes": 462673038029,
            "primary": "329.9gb",
            "primary_in_bytes": 354234352171,
            "replica": "209.2gb",
            "replica_in_bytes": 224712024140,
            "all": "859.4gb",
            "all_in_bytes": 922863050896,
            "coordinating_rejections": 0,
            "primary_rejections": 2,
            "replica_rejections": 1
          },
          "limit": "1.6gb",
          "limit_in_bytes": 1825361100
        }
      }
    }
  }
}
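The comparison described above can also be scripted. The following sketch parses a trimmed copy of the stats extract (field names follow the API response; only the fields needed for the check are kept) and reports per-node utilization and rejection totals:

```python
import json

# Trimmed copy of the _nodes/stats indexing_pressure extract shown above;
# field names and values follow the API response.
stats = json.loads("""
{"nodes": {"9vzke53QQK-7aoJilvaYTQ": {"indexing_pressure": {"memory": {
  "current": {"all_in_bytes": 376345825},
  "total": {"coordinating_rejections": 0,
            "primary_rejections": 2,
            "replica_rejections": 1},
  "limit_in_bytes": 1825361100}}}}}
""")

for node, data in stats["nodes"].items():
    mem = data["indexing_pressure"]["memory"]
    used_pct = 100 * mem["current"]["all_in_bytes"] / mem["limit_in_bytes"]
    rejections = sum(v for k, v in mem["total"].items()
                     if k.endswith("_rejections"))
    print(f"{node}: {used_pct:.0f}% of limit used, {rejections} rejections total")
```

For the example data, this reports about 21% of the limit in use and 3 total rejections; nonzero and growing rejection counters are the signal of high indexing pressure.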
Sometimes, even if the indexing pressure is already resolved and the
OpenSearchIndexingPressureHigh alert is no longer firing, some shards may
still be in the RELOCATING state due to reaching the shard recovery limit.
In this case, you may need to manually force shard relocation:
kubectl exec -it opensearch-master-0 -n stacklight -- curl -X POST localhost:9200/_cluster/reroute?retry_failed=true
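To decide whether a forced reroute is still needed, you can look for shards stuck in the RELOCATING state, for example in _cat/shards output. A minimal sketch over a saved response follows; the index names, shard counts, and addresses are invented for illustration:

```python
# Hypothetical sketch: count shards still relocating from a saved
# "_cat/shards" response (index names and addresses are made up).
cat_shards = """\
system-2024.01.01 0 p STARTED    120000 250mb 10.0.0.1 opensearch-master-0
system-2024.01.01 0 r RELOCATING 120000 250mb 10.0.0.2 opensearch-master-1
audit-2024.01.01  0 p STARTED     80000 120mb 10.0.0.3 opensearch-master-2
"""

# The fourth whitespace-separated column of _cat/shards is the shard state.
relocating = [line for line in cat_shards.splitlines()
              if line.split()[3] == "RELOCATING"]
print(f"{len(relocating)} shard(s) still relocating")
```

Once this count stays at zero, no further reroute calls are required.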
To apply the issue resolution:
You can either decrease the intensity of logging or, if resources are sufficient, increase the memory limit or the default allocation limit for indexing workloads. Select one of the following options:
If the debug logging level is enabled and no longer required, disable it.
For configuration reference, use the following resources for various MOSK
components:
MKE: MKE documentation: Configuration options - log_configuration
OpenStack: Reference Architecture: OpenStackDeployment custom resource - osdpl-cr-logging
OpenSDN: Deployment Guide: Enable debug logs for the OpenSDN services
Bare metal provider: Operations Guide: Enable log debugging
StackLight: Operations Guide: StackLight configuration parameters - Log verbosity
If the debug logging level is enabled and still required, or if indexing
pressure is high due to disaster recovery, increase the memory limit for
the opensearch-master StatefulSet. For details, see Resource limits.
Carefully examine the resource consumption and contact Mirantis Support
to consider increasing the default 10% allocation limit for indexing
workloads.
To examine the resource consumption, use the historical counters of the JVM Memory Usage panel of the Dashboards > OpenSearch > Resources dashboard in Grafana.
Only if the JVM resources are sufficient and almost never used, consider
increasing the default allocation limit for indexing workloads to a maximum
of 20% by adjusting indexing_pressure.memory.limit in the logging.extraConfig
section of the StackLight configuration. For example:
logging:
  extraConfig:
    indexing_pressure.memory.limit: 15%
Warning
This change initiates a full rollout restart of the OpenSearch cluster.
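Before applying such a change, it may help to sanity-check the configured value against the bounds mentioned above (the 10% default and the 20% recommended maximum). This validation helper is purely illustrative, not part of StackLight:

```python
# Illustrative check, assuming the 10% default and the 20% recommended
# maximum for indexing_pressure.memory.limit described above.
def validate_indexing_limit(value: str) -> bool:
    """Return True if a value like '15%' stays within the 10-20% range."""
    pct = float(value.rstrip("%"))
    return 10.0 <= pct <= 20.0

print(validate_indexing_limit("15%"))  # the value used in the example above
```
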
For the logging.extraConfig description, see OpenSearch extra settings.