Troubleshoot high OpenSearch indexing pressure

High indexing pressure may be caused by a large amount of log data being ingested into OpenSearch, for example, after enabling the debug logging level or during disaster recovery.

By default, OpenSearch reserves only 10% of its heap size for indexing workloads. This allocation may be insufficient in some scenarios, potentially leading to Yellow or Red cluster states during recoveries or triggering the OpenSearchIndexingPressureHigh alert.
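As a quick sanity check, the default limit can be derived from the node heap size. A minimal sketch, assuming a 16 GiB heap; substitute your node's actual heap size, for example from the heap.max column of the _cat/nodes API:

```shell
# The default indexing pressure limit is 10% of the JVM heap.
# The 16 GiB heap below is an assumption; substitute your node's value.
heap_bytes=$((16 * 1024 * 1024 * 1024))
limit_bytes=$((heap_bytes / 10))
echo "${limit_bytes} bytes"   # → 1717986918 bytes
```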

To verify whether the cluster is affected by high indexing pressure:

  1. Verify whether the cluster has shard allocation issues:

    kubectl exec -it opensearch-master-0 -n stacklight -- curl -s localhost:9200/_cluster/allocation/explain | jq
    

    Example of a system response extract with shard allocation issues:

    {
      "explanation": "Shard has exceeded the maximum number of retries [5]
      on failed allocation attempts. Manually call [/_cluster/reroute?retry_failed=true]
      to retry. Nested exception: OpenSearchRejectedExecutionException: rejected
      execution of primary operation
      [coordinating_and_primary_bytes=5347688872, replica_bytes=56476234, all_bytes=5404165106, primary_operation_bytes=0, max_coordinating_and_primary_bytes=5368709120].
      Allocation status: no_attempt."
    }
    
  2. Verify live stats of indexing pressure with historical counters:

    kubectl exec -it opensearch-master-0 -n stacklight -- curl -s "localhost:9200/_nodes/stats?human=true&filter_path=nodes.*.indexing_pressure" | jq
    

    In the system response, compare the limit value with the current.all value (all memory currently used for indexing) and verify whether the *_rejections counters increase over time.

    Example of a system response extract with high indexing pressure:

    {
      "nodes": {
        "9vzke53QQK-7aoJilvaYTQ": {
          "indexing_pressure": {
            "memory": {
              "current": {
                "combined_coordinating_and_primary": "176.2mb",
                "combined_coordinating_and_primary_in_bytes": 184858973,
                "coordinating": "176.2mb",
                "coordinating_in_bytes": 184858973,
                "primary": "0b",
                "primary_in_bytes": 0,
                "replica": "182.6mb",
                "replica_in_bytes": 191486852,
                "all": "358.9mb",
                "all_in_bytes": 376345825
              },
              "total": {
                "combined_coordinating_and_primary": "650.2gb",
                "combined_coordinating_and_primary_in_bytes": 698151026756,
                "coordinating": "430.8gb",
                "coordinating_in_bytes": 462673038029,
                "primary": "329.9gb",
                "primary_in_bytes": 354234352171,
                "replica": "209.2gb",
                "replica_in_bytes": 224712024140,
                "all": "859.4gb",
                "all_in_bytes": 922863050896,
                "coordinating_rejections": 0,
                "primary_rejections": 2,
                "replica_rejections": 1
              },
              "limit": "1.6gb",
              "limit_in_bytes": 1825361100
            }
          }
        }
      }
    }
    

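To see at a glance whether rejections are accumulating, the rejection counters can be extracted from the stats response with jq. A sketch, using a saved sample that stands in for the output of the _nodes/stats command above; the node ID and counter values are illustrative:

```shell
# sample.json stands in for the live output of:
#   curl -s "localhost:9200/_nodes/stats?filter_path=nodes.*.indexing_pressure"
cat > sample.json <<'EOF'
{"nodes":{"9vzke53QQK-7aoJilvaYTQ":{"indexing_pressure":{"memory":{"total":{"coordinating_rejections":0,"primary_rejections":2,"replica_rejections":1}}}}}}
EOF

# Pull out only the rejection counters; rerun periodically and compare:
# steadily growing values indicate ongoing indexing pressure.
jq '.nodes[].indexing_pressure.memory.total
    | {coordinating_rejections, primary_rejections, replica_rejections}' sample.json
```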
Sometimes, even after the indexing pressure is resolved and the OpenSearchIndexingPressureHigh alert is no longer firing, some shards may still be in the RELOCATING state due to reaching the shard recovery limit. In this case, you may need to manually force shard relocation:

kubectl exec -it opensearch-master-0 -n stacklight -- curl -s -X POST "localhost:9200/_cluster/reroute?retry_failed=true"
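To confirm which shards are still moving, you can filter the _cat/shards output for shards that are not yet STARTED. A sketch; the here-doc below stands in for the live response of kubectl exec opensearch-master-0 -n stacklight -- curl -s "localhost:9200/_cat/shards?h=index,shard,prirep,state,node", and the index name is illustrative:

```shell
# Keep only shards whose state (field $4) is not STARTED, for example
# RELOCATING or UNASSIGNED shards that may need the reroute call above.
awk '$4 != "STARTED"' <<'EOF'
system-000001 0 p STARTED opensearch-master-0
system-000001 0 r RELOCATING opensearch-master-1
EOF
```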

To apply the issue resolution:

You can either decrease the logging intensity or, if resources are sufficient, increase the memory limit or the default allocation limit for indexing workloads. Select one of the following options: