Configure Graceful Node Shutdown with kubelet node profiles

Available since MKE 3.7.12

To configure Graceful Node Shutdown grace periods in an MKE cluster, set the following flags in the [cluster_config.custom_kubelet_flags_profiles] section of the MKE configuration file:

--shutdown-grace-period=0s
--shutdown-grace-period-critical-pods=0s

The GracefulNodeShutdown feature gate is enabled by default, with shutdown grace period parameters both set to 0s.

When you add your custom kubelet profiles, insert and set the GracefulNodeShutdown flags in the MKE configuration file. For example:

[cluster_config.custom_kubelet_flags_profiles]
  manager = "--shutdown-grace-period=30s --shutdown-grace-period-critical-pods=20s"
  worker = "--shutdown-grace-period=60s --shutdown-grace-period-critical-pods=50s"
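As a rough illustration of how the two flags in each profile relate: in upstream Kubernetes, --shutdown-grace-period is the total time the node delays shutdown, and --shutdown-grace-period-critical-pods is the part of that time reserved for critical pods, so regular pods get the difference. The sketch below only does that arithmetic on the example values above. The helper name regular_pod_window is hypothetical and is not part of MKE or the kubelet.

```shell
#!/bin/sh
# Hypothetical helper: given the total grace period and the critical-pod
# grace period (both in whole seconds), print the time left for regular pods.
regular_pod_window() {
    total=$1
    critical=$2
    # The critical-pod period cannot exceed the total period.
    if [ "$critical" -gt "$total" ]; then
        echo "critical-pod period ($critical s) exceeds total ($total s)" >&2
        return 1
    fi
    echo $((total - critical))
}

regular_pod_window 30 20   # manager profile above: prints 10
regular_pod_window 60 50   # worker profile above: prints 10
```

With the example profiles, regular pods on both manager and worker nodes would get 10 seconds to terminate before the critical-pod window begins.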

Apply your kubelet node profiles.

From a labeled node with GracefulNodeShutdown enabled, verify that the inhibitor lock is taken by the kubelet:

systemd-inhibit --list
    Who: kubelet (UID 0/root, PID 337097/kubelet)
    What: shutdown
    Why: Kubelet needs time to handle node shutdown
    Mode: delay

1 inhibitors listed.
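If you want to script this check, a minimal sketch follows. It assumes only that the systemd-inhibit --list output names kubelet somewhere in the listing, as in the sample above. The function name has_kubelet_inhibitor is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `systemd-inhibit --list` output on stdin and
# exits 0 only if a line mentioning the kubelet is present.
has_kubelet_inhibitor() {
    grep -q 'kubelet'
}

# Example use on a node:
#   systemd-inhibit --list | has_kubelet_inhibitor \
#       && echo "kubelet inhibitor lock found" \
#       || echo "kubelet inhibitor lock missing"
```

The check is deliberately loose: it does not confirm that the lock is of type shutdown or in delay mode, so treat a positive result as a hint and inspect the full listing when in doubt.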

Troubleshooting

The following issues can occur when you use the Graceful Node Shutdown feature.

Missing kubelet inhibitors and ucp-kubelet errors

A Graceful Node Shutdown configuration of --shutdown-grace-period=60s --shutdown-grace-period-critical-pods=50s can result in the following error message:

"Failed to start node shutdown manager" err="node shutdown manager was unable
to update logind InhibitDelayMaxSec to 60s (ShutdownGracePeriod), current
value of InhibitDelayMaxSec (30s) is less than requested ShutdownGracePeriod"

The error message indicates that kubelet inhibitors are missing and that ucp-kubelet reports errors, because the operating system's current default InhibitDelayMaxSec setting of 30s is lower than the requested shutdown grace period of 60s.

You can resolve the issue either by changing the InhibitDelayMaxSec parameter setting to a larger value or by removing it.
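One way to raise the value is a logind drop-in file, sketched below. The helper name write_inhibit_delay_dropin and the file name zz-graceful-node-shutdown.conf are hypothetical. The zz- prefix is an assumption based on systemd's usual rule that drop-ins are applied in lexicographic order of file name, with later files taking precedence, so the file sorts after other drop-ins that may also set the parameter.

```shell
#!/bin/sh
# Hypothetical helper: write a logind drop-in that sets InhibitDelayMaxSec.
#   $1 = drop-in directory, $2 = value in seconds
write_inhibit_delay_dropin() {
    dir=$1
    seconds=$2
    mkdir -p "$dir" &&
        printf '[Login]\nInhibitDelayMaxSec=%s\n' "$seconds" \
            > "$dir/zz-graceful-node-shutdown.conf"
}

# Example use on a node, as root (value matches --shutdown-grace-period=60s):
#   write_inhibit_delay_dropin /etc/systemd/logind.conf.d 60
#   systemctl restart systemd-logind
```

After logind picks up the new value, the kubelet probably also needs a restart so that it retries taking the inhibitor lock. Verify the procedure on a test node first.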

The configuration file that contains the InhibitDelayMaxSec parameter setting can be in any of the following locations:

/etc/systemd/logind.conf
/etc/systemd/logind.conf.d/*.conf
/run/systemd/logind.conf.d/*.conf
/usr/lib/systemd/logind.conf.d/*.conf
/usr/lib/systemd/logind.conf.d/unattended-upgrades-logind-maxdelay.conf
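To see which of these files actually sets the parameter on a given node, a search along the following lines may help. It is a sketch only, and the function name find_inhibit_delay is hypothetical. Commented-out settings are skipped, and missing files are ignored.

```shell
#!/bin/sh
# Hypothetical helper: print "file:line" for every active (uncommented)
# InhibitDelayMaxSec setting found in the files given as arguments.
find_inhibit_delay() {
    # -s silences errors for files that do not exist; the extra /dev/null
    # argument forces grep to prefix each match with its file name.
    grep -s '^[[:space:]]*InhibitDelayMaxSec' "$@" /dev/null
}

# Example use on a node:
#   find_inhibit_delay /etc/systemd/logind.conf \
#       /etc/systemd/logind.conf.d/*.conf \
#       /run/systemd/logind.conf.d/*.conf \
#       /usr/lib/systemd/logind.conf.d/*.conf
```

If the search prints nothing, no file sets the parameter explicitly and logind is using its built-in default.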

Graceful node drain does not occur and the pods are not terminated

In some operating system distributions, the systemd PrepareForShutdown signal is not sent to dbus. As a result, graceful node drain does not occur and the pods are not terminated.

Currently, the following commands trigger the PrepareForShutdown signal, and the Graceful Node Shutdown feature works as intended:

systemctl reboot
systemctl poweroff
shutdown -h
shutdown -h +0
shutdown -h +5