Troubleshoot high OpenSearch indexing pressure

High indexing pressure may be caused by a large amount of log data being ingested into OpenSearch, for example, after enabling the debug logging level or during disaster recovery.

By default, OpenSearch reserves only 10% of its heap size for indexing workloads. This allocation may be insufficient in some scenarios, potentially leading to Yellow or Red cluster states during recoveries or triggering the OpenSearchIndexingPressureHigh alert.
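As a quick sanity check, the default limit can be derived from the node heap size. A minimal sketch, assuming a 16 GiB heap; substitute your node's actual heap size, for example from the heap.max column of the _cat/nodes API:

```shell
# The default indexing pressure limit is 10% of the JVM heap.
# The 16 GiB heap below is an assumption; substitute your node's value.
heap_bytes=$((16 * 1024 * 1024 * 1024))
limit_bytes=$((heap_bytes / 10))
echo "${limit_bytes} bytes"   # → 1717986918 bytes
```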

To verify whether the cluster is affected by high indexing pressure:

  1. Verify whether the cluster has shard allocation issues:

    kubectl exec -it opensearch-master-0 -n stacklight -- curl -s localhost:9200/_cluster/allocation/explain | jq
    

    Example of a system response extract with shard allocation issues:

    {
      "explanation": "Shard has exceeded the maximum number of retries [5]
      on failed allocation attempts. Manually call [/_cluster/reroute?retry_failed=true]
      to retry. Nested exception: OpenSearchRejectedExecutionException: rejected
      execution of primary operation
      [coordinating_and_primary_bytes=5347688872, replica_bytes=56476234, all_bytes=5404165106, primary_operation_bytes=0, max_coordinating_and_primary_bytes=5368709120].
      Allocation status: no_attempt."
    }
    
  2. Verify live stats of indexing pressure with historical counters:

    kubectl exec -it opensearch-master-0 -n stacklight -- curl -s "localhost:9200/_nodes/stats?human=true&filter_path=nodes.*.indexing_pressure" | jq
    

    In the system response, compare the limit value with the current.all value (all memory currently used for indexing) and verify whether the *_rejections counters increase over time.

    Example of a system response extract with high indexing pressure:

    {
      "nodes": {
        "9vzke53QQK-7aoJilvaYTQ": {
          "indexing_pressure": {
            "memory": {
              "current": {
                "combined_coordinating_and_primary": "176.2mb",
                "combined_coordinating_and_primary_in_bytes": 184858973,
                "coordinating": "176.2mb",
                "coordinating_in_bytes": 184858973,
                "primary": "0b",
                "primary_in_bytes": 0,
                "replica": "182.6mb",
                "replica_in_bytes": 191486852,
                "all": "358.9mb",
                "all_in_bytes": 376345825
              },
              "total": {
                "combined_coordinating_and_primary": "650.2gb",
                "combined_coordinating_and_primary_in_bytes": 698151026756,
                "coordinating": "430.8gb",
                "coordinating_in_bytes": 462673038029,
                "primary": "329.9gb",
                "primary_in_bytes": 354234352171,
                "replica": "209.2gb",
                "replica_in_bytes": 224712024140,
                "all": "859.4gb",
                "all_in_bytes": 922863050896,
                "coordinating_rejections": 0,
                "primary_rejections": 2,
                "replica_rejections": 1
              },
              "limit": "1.6gb",
              "limit_in_bytes": 1825361100
            }
          }
        }
      }
    }
    

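To see at a glance whether rejections are accumulating, the rejection counters can be extracted from the stats response with jq. A sketch, using a saved sample that stands in for the output of the _nodes/stats command above; the node ID and counter values are illustrative:

```shell
# sample.json stands in for the live output of:
#   curl -s "localhost:9200/_nodes/stats?filter_path=nodes.*.indexing_pressure"
cat > sample.json <<'EOF'
{"nodes":{"9vzke53QQK-7aoJilvaYTQ":{"indexing_pressure":{"memory":{"total":{"coordinating_rejections":0,"primary_rejections":2,"replica_rejections":1}}}}}}
EOF

# Pull out only the rejection counters; rerun periodically and compare:
# steadily growing values indicate ongoing indexing pressure.
jq '.nodes[].indexing_pressure.memory.total
    | {coordinating_rejections, primary_rejections, replica_rejections}' sample.json
```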
Sometimes, even after the indexing pressure is resolved and the OpenSearchIndexingPressureHigh alert is no longer firing, some shards may still be in the RELOCATING state due to reaching the shard recovery limit. In this case, you may need to manually force shard relocation:

kubectl exec -it opensearch-master-0 -n stacklight -- curl -s -X POST "localhost:9200/_cluster/reroute?retry_failed=true"
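To confirm which shards are still moving, you can filter the _cat/shards output for shards that are not yet STARTED. A sketch; the here-doc below stands in for the live response of kubectl exec opensearch-master-0 -n stacklight -- curl -s "localhost:9200/_cat/shards?h=index,shard,prirep,state,node", and the index name is illustrative:

```shell
# Keep only shards whose state (field $4) is not STARTED, for example
# RELOCATING or UNASSIGNED shards that may need the reroute call above.
awk '$4 != "STARTED"' <<'EOF'
system-000001 0 p STARTED opensearch-master-0
system-000001 0 r RELOCATING opensearch-master-1
EOF
```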

To apply the issue resolution:

You can either decrease the logging intensity or, if resources are sufficient, increase the memory limit or the default allocation limit for indexing workloads. Select one of the following options: