Troubleshoot OpenSearch alerts

Available since 2.26.0 (17.1.0 and 16.1.0)

This section describes the investigation and troubleshooting steps for the OpenSearch alerts.


OpenSearchStorageUsageCritical

Root cause

The OpenSearch volume has reached the default flood_stage disk allocation watermark of 95% disk usage. At this stage, OpenSearch places a read-only block on every index that has a shard on the affected node.

Investigation and mitigation

  1. Important: Allow deleting read-only shards. For details, see step 3 of the “Temporary hacks/fixes” section in the Opster documentation article “Flood stage disk watermark exceeded on all indices on this node will be marked read-only”.

  2. Consider applying the temporary fixes from the same article to allow logs to flow until you fix the main issue.

  3. Refer to the Investigation and mitigation section in OpenSearchStorageUsageMajor.
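
As an illustration of step 1, the following minimal sketch builds the index settings request that resets the index.blocks.read_only_allow_delete block. It assumes that the OpenSearch HTTP API is reachable at a URL that you supply, for example, through a port-forward, and that no authentication is required. The function names and the example URL are hypothetical and not part of StackLight.

```python
import json
import urllib.request


def build_unblock_request(base_url: str, index_pattern: str = "_all") -> urllib.request.Request:
    """Build a PUT <index>/_settings request that resets the read-only block.

    Setting index.blocks.read_only_allow_delete to null restores the default
    value, which removes the block applied at the flood stage.
    """
    body = json.dumps({"index.blocks.read_only_allow_delete": None}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index_pattern}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> str:
    """Send the request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, send(build_unblock_request("http://localhost:9200")) applies the change to all indices, where the URL is an assumption about your environment. If disk usage stays above the flood stage watermark, OpenSearch may apply the block again, so free up disk space as well.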

OpenSearchStorageUsageMajor

Root cause

The OpenSearch volume has reached the default high disk allocation watermark of 90% disk usage. At this point, OpenSearch attempts to relocate shards to other nodes, provided that those nodes are still below 90% disk usage.
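
To relate a measured disk usage to the watermarks that trigger these alerts, consider the following minimal sketch. It uses the default OpenSearch watermark values of 85% (low), 90% (high), and 95% (flood stage) and assumes that your cluster does not override them. The function name is hypothetical, and treating a usage that equals a threshold as having reached it is a simplification.

```python
# Default disk-based shard allocation watermarks, in percent of used disk space.
# Assumption: the cluster settings do not override these defaults.
LOW_WATERMARK = 85.0
HIGH_WATERMARK = 90.0
FLOOD_STAGE_WATERMARK = 95.0


def watermark_stage(used_bytes: int, total_bytes: int) -> str:
    """Return the highest watermark that the given disk usage has reached."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    used_percent = 100.0 * used_bytes / total_bytes
    if used_percent >= FLOOD_STAGE_WATERMARK:
        return "flood_stage"  # indices with shards on the node become read-only
    if used_percent >= HIGH_WATERMARK:
        return "high"  # OpenSearch tries to relocate shards away from the node
    if used_percent >= LOW_WATERMARK:
        return "low"  # OpenSearch stops allocating new shards to the node
    return "ok"
```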

Investigation and mitigation

  1. Verify that users have not created indices that StackLight does not manage, because such indices can cause unexpected storage usage. StackLight deletes old data only from the indices that it manages.

  2. If an OpenSearch volume uses shared storage, such as LVP, disk usage may still exceed expected limits even if rotation works as expected. In this case, consider the following solutions:

    • Increase disk space

    • Delete old indices

    • Lower retention thresholds for components that use shared storage. To reduce OpenSearch space usage, consider adjusting the elasticsearch.persistentVolumeUsableStorageSizeGB parameter.

  3. By default, elasticsearch-curator deletes old logs when disk usage exceeds 80%. If it fails to delete old logs, inspect the known issues described in the product Release Notes.
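
To help with step 1, the following minimal sketch lists indices that do not match a set of managed name prefixes, largest first. It assumes that you pass in the JSON output of GET /_cat/indices?format=json&bytes=b. The function name is hypothetical, and the prefixes in the usage example are placeholders: replace them with the index patterns that StackLight actually manages in your deployment.

```python
import json


def unmanaged_indices(cat_indices_json: str, managed_prefixes: tuple) -> list:
    """Return (index name, size in bytes) pairs for unmanaged indices.

    Indices whose names start with "." are treated as system indices and
    skipped. The result is sorted by size in descending order.
    """
    rows = json.loads(cat_indices_json)
    result = []
    for row in rows:
        name = row["index"]
        if name.startswith(".") or name.startswith(managed_prefixes):
            continue
        # store.size can be null, for example, for closed indices.
        size = int(row.get("store.size") or 0)
        result.append((name, size))
    return sorted(result, key=lambda item: item[1], reverse=True)


# Usage example with made-up data and placeholder prefixes.
sample = json.dumps([
    {"index": "managed-logs-2024.01.01", "store.size": "5000"},
    {"index": "custom-app", "store.size": "700"},
    {"index": "another-custom", "store.size": "9000"},
    {"index": ".system-index", "store.size": "100"},
])
for name, size in unmanaged_indices(sample, ("managed-logs-",)):
    print(name, size)
```

Any index that this check reports is not rotated by StackLight, so it is a candidate for manual cleanup or for its own retention policy.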