MariaDB Pods fail to start after a non-graceful shutdown
After a non-graceful shutdown such as an unexpected power loss or a forced
node reset, a mariadb-server Pod may get stuck in continuous restarts
with the following error in the Pod logs:
[ERROR] Recovery failed! You must enable all engines that were enabled at the moment of the crash
[ERROR] Crash recovery failed. Either correct the problem (if it's, for example, out of memory error) and restart, or delete tc log and start server with --tc-heuristic-recover={commit|rollback}
The error occurs because MariaDB attempts to replay the transaction
coordinator log (tc.log) during crash recovery, but the wsrep provider that
was registered in the log at the time of the crash is temporarily disabled
during the recovery phase, which prevents the replay from completing.
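To confirm that a restarting Pod is failing for this reason rather than another one, its logs can be checked for the two messages quoted above. The following is a minimal sketch, not an official tool: the function names are hypothetical, and it assumes that `kubectl` is on the PATH and that the caller supplies the Pod name and namespace.

```python
import subprocess

# Substrings copied from the two error messages quoted in this section.
SIGNATURES = (
    "Recovery failed! You must enable all engines",
    "Crash recovery failed.",
)


def has_tc_log_recovery_failure(log_text):
    """Return True if each signature appears on at least one [ERROR] line."""
    error_lines = [line for line in log_text.splitlines() if "[ERROR]" in line]
    return all(any(sig in line for line in error_lines) for sig in SIGNATURES)


def previous_logs(pod, namespace):
    """Fetch logs of the last terminated container instance of the Pod.

    Hypothetical helper: pod and namespace are supplied by the caller.
    """
    result = subprocess.run(
        ["kubectl", "logs", "-n", namespace, pod, "--previous"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, `has_tc_log_recovery_failure(previous_logs(pod, namespace))` returns True only when both messages are present, which helps avoid removing tc.log for an unrelated startup failure.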
To resolve the issue:
1. Create a backup of the /var/lib/mysql directory on the affected mariadb-server Pod.
2. Verify that other replicas are up and ready.
3. Remove the /var/lib/mysql/tc.log file for the affected mariadb-server Pod.
4. Remove the affected mariadb-server Pod or wait until it is automatically restarted.
After Kubernetes restarts the Pod, the Pod rejoins the cluster.
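
The procedure above can be sketched as a short script that drives `kubectl`. This is an illustration under stated assumptions, not the documented procedure itself: the namespace, Pod name, label selector, and backup file name are hypothetical placeholders, and `kubectl exec` only works while the container is running, which may require retrying between restarts.

```python
import json
import subprocess

NAMESPACE = "openstack"           # assumption: replace with your namespace
POD = "mariadb-server-0"          # assumption: name of the affected Pod
SELECTOR = "application=mariadb"  # assumption: label shared by all replicas
DATA_DIR = "/var/lib/mysql"


def kubectl(*args):
    """Build a kubectl argument list scoped to NAMESPACE."""
    return ["kubectl", "-n", NAMESPACE, *args]


def backup_cmd(pod):
    """Command that streams a gzipped tar of DATA_DIR to stdout."""
    return kubectl("exec", pod, "--", "tar", "-czf", "-", "-C", DATA_DIR, ".")


def not_ready_replicas(pod_list, affected):
    """Return names of Pods other than `affected` that are not Ready."""
    not_ready = []
    for item in pod_list["items"]:
        name = item["metadata"]["name"]
        if name == affected:
            continue
        conditions = item.get("status", {}).get("conditions", [])
        ready = any(
            c["type"] == "Ready" and c["status"] == "True" for c in conditions
        )
        if not ready:
            not_ready.append(name)
    return not_ready


def recover(pod=POD):
    # Step 1: back up the data directory to a local archive.
    with open(f"{pod}-mysql-backup.tar.gz", "wb") as out:
        subprocess.run(backup_cmd(pod), stdout=out, check=True)

    # Step 2: stop if any other replica is not up and ready.
    raw = subprocess.run(
        kubectl("get", "pods", "-l", SELECTOR, "-o", "json"),
        capture_output=True,
        check=True,
    ).stdout
    blocked = not_ready_replicas(json.loads(raw), pod)
    if blocked:
        raise RuntimeError(f"replicas not ready: {blocked}")

    # Step 3: remove the transaction coordinator log.
    subprocess.run(
        kubectl("exec", pod, "--", "rm", "-f", f"{DATA_DIR}/tc.log"),
        check=True,
    )

    # Step 4: delete the Pod so that Kubernetes recreates it.
    subprocess.run(kubectl("delete", "pod", pod), check=True)
```

Calling `recover()` runs the four steps in order and aborts before touching tc.log if the backup fails or another replica is not ready. Keeping the readiness check as a separate function makes it possible to verify that logic without a cluster.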