Cleanse etcd of Kubernetes events

Kubernetes events are generated in response to changes within Kubernetes resources, such as nodes, Pods, or containers. These events are created with a time to live (TTL), after which they are automatically cleaned up. If, however, a large number of events are generated or other cluster issues arise, you may need to clean up the events manually to prevent etcd from exceeding its quota. MKE offers an API that you can use to directly clean up event objects within your cluster. With it, you can specify whether to delete all events or only those whose TTL is below a given threshold.

Note

The etcd cleanup API is a preventative measure only. If etcd already exceeds the established quota, MKE may no longer be operational, and as a result the API will not work.

To trigger etcd cleanup:

  1. Issue a POST to the https://MKE_HOST/api/ucp/etcd/cleanup endpoint.

    You can specify two parameters:

    dryRun

    Sets whether to issue a dry cleanup run instead of the production run. A dry run returns a list of etcd keys (Kubernetes events) that will be deleted without actually deleting them. Defaults to false.

    MinTTLToKeepSeconds

    Sets the minimum TTL to retain, meaning that only events with a lower TTL are deleted. By default, all events are deleted regardless of TTL.

    Mirantis recommends that you adjust these parameters based on the size of the etcd database and the amount of time that has elapsed since the last cleanup.

    Example command (dry run):

    AUTHTOKEN=$(curl --silent --insecure --data '{"username":"<username>","password":"<password>"}' https://MKE_HOST/auth/login | jq --raw-output .auth_token)
    
    curl --insecure -H "Authorization: Bearer $AUTHTOKEN" https://MKE_HOST/api/ucp/etcd/cleanup --data '{"dryRun": true}'
    

    Example response (dry run):

    [
        {
            "key": "/registry/events/default/eventkey1",
            "ttl": 3638
        },
        {
            "key": "/registry/events/default/eventkey2",
            "ttl": 3639
        }
        ...
    ]
    

    Example command (live):

    AUTHTOKEN=$(curl --silent --insecure --data '{"username":"<username>","password":"<password>"}' https://MKE_HOST/auth/login | jq --raw-output .auth_token)
    
    curl --insecure -H "Authorization: Bearer $AUTHTOKEN" https://MKE_HOST/api/ucp/etcd/cleanup --data '{"dryRun": false}'
    

    Example response (live):

    "Etcd Cleanup Initiated"
    
  2. Review the etcd cleanup state:

    Example command:

    AUTHTOKEN=$(curl --silent --insecure --data '{"username":"<username>","password":"<password>"}' https://MKE_HOST/auth/login | jq --raw-output .auth_token)
    
    curl --insecure -H "Authorization: Bearer $AUTHTOKEN" https://MKE_HOST/api/ucp/etcd/info
    

    Example response:

    {
        "CleanupInProgress": false,
        "CleanupResult": "Cluster Cleanup finished & Revisions Compacted. Issue a cluster defrag to permanently clear up space.",
        "DefragInProgress": false,
        "DefragResult": "",
        "MemberInfo": [
            {
                "MemberID": 16494148364752423721,
                "Endpoint": "https://172.31.47.35:12379",
                "EtcdVersion": "3.5.6",
                "DbSize": "1 MB",
                "IsLeader": true,
                "Alarms": null
            }
        ]
    }
    

The CleanupResult field in the response reports any issues that arise and indicates when the cleanup has finished.
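The procedure above can be scripted. The following is a minimal sketch under stated assumptions, not an MKE-provided tool: the function names `build_cleanup_payload`, `cleanup_in_progress`, and `wait_for_cleanup`, the `MKE_HOST` and `AUTHTOKEN` shell variables, and the 5-second polling interval are all hypothetical. It also assumes that `MinTTLToKeepSeconds` is passed as an integer in the same JSON body as `dryRun`, which the examples above do not show. The `CleanupInProgress` field is matched with `grep` to keep the sketch dependency-free; `jq` is likely the more robust choice where it is available.

```shell
#!/usr/bin/env bash
# Sketch only: helper functions for the etcd cleanup procedure.
# Assumes MKE_HOST and AUTHTOKEN are set as in the example commands.

# Build the JSON body for the cleanup request.
#   $1: "true" or "false" for dryRun (this sketch defaults to a dry run,
#       unlike the API itself, which defaults to false)
#   $2: optional MinTTLToKeepSeconds value (assumed to be an integer)
build_cleanup_payload() {
    local dry_run="${1:-true}"
    local min_ttl="${2:-}"
    if [ -n "$min_ttl" ]; then
        printf '{"dryRun": %s, "MinTTLToKeepSeconds": %s}\n' "$dry_run" "$min_ttl"
    else
        printf '{"dryRun": %s}\n' "$dry_run"
    fi
}

# Return 0 if the info response read from stdin reports a running cleanup.
cleanup_in_progress() {
    grep -Eq '"CleanupInProgress":[[:space:]]*true'
}

# Poll /api/ucp/etcd/info until CleanupInProgress is no longer true, then
# print the final response so that CleanupResult can be reviewed.
#   $1: seconds between polls (hypothetical default: 5)
wait_for_cleanup() {
    local interval="${1:-5}"
    local info
    while true; do
        info=$(curl --silent --insecure \
            -H "Authorization: Bearer $AUTHTOKEN" \
            "https://$MKE_HOST/api/ucp/etcd/info") || return 1
        if ! printf '%s\n' "$info" | cleanup_in_progress; then
            printf '%s\n' "$info"
            return 0
        fi
        sleep "$interval"
    done
}

# Usage (not executed here): live cleanup that retains events with at
# least one hour of TTL remaining, followed by a status poll.
#   curl --insecure -H "Authorization: Bearer $AUTHTOKEN" \
#       "https://$MKE_HOST/api/ucp/etcd/cleanup" \
#       --data "$(build_cleanup_payload false 3600)"
#   wait_for_cleanup 5
```

Running the same request with `build_cleanup_payload true 3600` first returns the list of keys that would be deleted, which is a cautious way to check the TTL threshold before the live run.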

Note

Although the etcd cleanup process deletes the keys, you must run an etcd defragmentation to release the storage space used by those keys. Defragmentation is a blocking operation, so it is not run automatically; you must trigger it yourself for the cleanup to release space back to the filesystem.
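One way to confirm that a defragmentation released space is to compare the `DbSize` value that the `/api/ucp/etcd/info` endpoint reports for each member before and after the defragmentation. The sketch below only extracts those values from a saved response. The function name `member_db_sizes` and the file names in the usage note are hypothetical, and the text matching assumes that `DbSize` is always a quoted string, as in the example response above.

```shell
#!/usr/bin/env bash
# Sketch only: print the DbSize of every etcd member, one value per line,
# from an /api/ucp/etcd/info response read on stdin.
member_db_sizes() {
    # grep -o emits each "DbSize": "..." match on its own line, so this
    # works for both pretty-printed and single-line JSON.
    grep -Eo '"DbSize":[[:space:]]*"[^"]*"' \
        | sed -e 's/^"DbSize":[[:space:]]*"//' -e 's/"$//'
}
```

For example, saving the info response to `info-before.json` and `info-after.json` around the defragmentation and running `member_db_sizes < info-before.json` and `member_db_sizes < info-after.json` gives two lists that can be compared member by member.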