This section describes how to troubleshoot DriveTrain issues.
[32334] The Glusterd service does not restart automatically after its child processes fail or are unexpectedly killed.
To troubleshoot the issue:
Log in to a KVM node.
In the /lib/systemd/system/glusterd.service file, set the Restart option in the [Service] section:
[Service]
...
Restart=on-abort
...
The recommended values include:
on-abort
The service restarts only if the service process exits due to an uncaught signal not specified as a clean exit status.
on-failure
The service restarts when the process exits with a non-zero exit code, is terminated by a signal (including on core dump, but excluding the SIGHUP, SIGINT, SIGTERM, and SIGPIPE signals), when an operation such as a service reload times out, or when the configured watchdog timeout is triggered.
Apply the changes:
systemctl daemon-reload
Note
Re-apply the provided workaround if any of the GlusterFS packages has been re-installed or upgraded.
Perform the steps above on the remaining KVM nodes.
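To confirm that the edit landed in the right section on each KVM node, the unit file can be inspected with a small helper. The following is a minimal sketch, not part of the original procedure: the function name get_restart_policy is hypothetical, and it only reads the file, so it does not tell you whether systemd has already reloaded the unit.

```shell
# Hypothetical helper: print the Restart= value found in the [Service]
# section of a systemd unit file, or nothing if the option is not set.
get_restart_policy() {
    # $1: path to the unit file
    awk '
        # Track which section the current line belongs to.
        /^\[/ { in_service = ($0 ~ /^\[Service\][ \t]*$/); next }
        # Match "Restart=" only (not RestartSec= and similar options).
        in_service && /^[ \t]*Restart[ \t]*=/ {
            sub(/^[^=]*=[ \t]*/, "")
            sub(/[ \t]*$/, "")
            value = $0
        }
        END { if (value != "") print value }
    ' "$1"
}

# Example call on a KVM node:
# get_restart_policy /lib/systemd/system/glusterd.service
```

After systemctl daemon-reload, the value that systemd actually uses can be cross-checked with systemctl show glusterd.service --property=Restart.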
[34848] Jenkins slaves may be unable to connect to the Jenkins master during an update from an MCP version earlier than 2019.2.4 to the MCP maintenance update 2019.2.8 or newer. The issue may be related to the Cross Site Request Forgery (CSRF) protection configuration.
To verify whether your deployment is affected:
From the Salt Master node, obtain the Jenkins credentials:
salt -C 'I@jenkins:client and not I@salt:master' config.get jenkins
Run the following command:
curl -u <jenkins_admin_user>:<jenkins_admin_password> '<jenkins_url>/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,":",//crumb)'
If the system response is 404 Not found, proceed with the issue resolution below.
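The check above can also be scripted so that only the HTTP status code is evaluated. This is a minimal sketch under stated assumptions: the function names crumb_issuer_status and needs_csrf_fix are hypothetical, and the URL, user, and password arguments must be the values obtained from the Salt Master node.

```shell
# Hypothetical helper: print only the HTTP status code returned by the
# Jenkins crumb issuer endpoint.
crumb_issuer_status() {
    # $1: Jenkins URL, $2: admin user, $3: admin password
    # ${1%/} drops one trailing slash so the path is not doubled.
    curl -s -o /dev/null -w '%{http_code}' -u "$2:$3" \
        "${1%/}/crumbIssuer/api/xml?xpath=concat(//crumbRequestField,\":\",//crumb)"
}

# Hypothetical helper: succeed (exit 0) only for the 404 response
# that indicates an affected deployment.
needs_csrf_fix() {
    [ "$1" = "404" ]
}

# Example usage with placeholder arguments:
# status=$(crumb_issuer_status '<jenkins_url>' '<jenkins_admin_user>' '<jenkins_admin_password>')
# if needs_csrf_fix "$status"; then echo "Deployment is affected"; fi
```

Any status other than 404 is treated as "not affected" here; an authentication error such as 401 should be investigated separately.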
To apply the issue resolution:
Once done, Jenkins slaves automatically reconnect in a few seconds.
[34798] The Deploy - upgrade MCP DriveTrain Jenkins pipeline job may fail with the Error with request: HTTP Error 504: Gateway Time-out error message. The issue may occur in large environments when applying Salt states on all nodes because the NGINX timeout configured on the Salt Master node is too short.
To apply the issue resolution:
Select one of the following options:
In the Deploy - upgrade MCP DriveTrain Jenkins pipeline job, set the SALT_MASTER_URL parameter to the Salt API endpoint http://<cfg_node_ip>:6969.
Increase the NGINX timeout:
Open your Git project repository with the Reclass model on the cluster level.
In classes/cluster/<cluster_name>/infra/config/init.yml, increase the NGINX timeout for the Salt API site. By default, the timeout is set to 600 seconds.
parameters:
  nginx:
    server:
      site:
        nginx_proxy_salt_api:
          proxy:
            timeout: <timeout>
Refresh Salt pillars and apply the nginx state on the Salt Master node:
salt -C 'I@salt:master' saltutil.refresh_pillar
salt -C 'I@salt:master' state.apply nginx
Commit the changes to your local repository.
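If you chose to increase the NGINX timeout, you may want to confirm that the new value reached the pillar after the refresh. The sketch below assumes the pillar key path from the example above. The function name and the SALT_BIN override are hypothetical and exist only so that the command can be dry-run.

```shell
# Hypothetical override for dry runs; defaults to the real salt CLI.
SALT_BIN="${SALT_BIN:-salt}"

# Hypothetical helper: query the NGINX proxy timeout for the Salt API site
# from the pillar of the Salt Master node.
show_salt_api_timeout() {
    "$SALT_BIN" -C 'I@salt:master' pillar.get \
        nginx:server:site:nginx_proxy_salt_api:proxy:timeout
}

# Example usage on the Salt Master node:
# show_salt_api_timeout
```

An empty result usually means that the key is missing from the model or that the pillar was not refreshed.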
[34114] The Deploy - upgrade MCP DriveTrain Jenkins pipeline job may fail with the SaltReqTimeoutError in master zmq thread error message when executing the salt.minion state on several minions at the same time.
Adjust the Salt Master configuration to improve its performance.
To adjust the Salt Master configuration:
Open your Git project repository with the Reclass model on the cluster level.
In cluster/<cluster_name>/infra/config/init.yml, increase the values for the following parameters as required:
gather_job_timeout
The number of seconds to wait when the client requests information about the running jobs.
worker_threads
The number of threads to start to receive commands and replies from minions.
sock_pool_size
The pool size of Unix sockets. Salt applications support a socket pool to avoid blocking while data is written to a socket. For example, a job or state with a large list of target hosts can cause a long blocking wait.
zmq_backlog
The number of messages in the ZeroMQ backlog queue.
For example:
parameters:
  salt:
    master:
      worker_threads: 40
      opts:
        gather_job_timeout: 100
        sock_pool_size: 15
        zmq_backlog: 3000
Log in to the Salt Master node.
Refresh Salt pillars and apply the salt.master state:
salt-call saltutil.refresh_pillar
salt-call state.apply salt.master
Commit the changes to your local repository.
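To review the tuned values in one pass, the pillar can be queried for each parameter. This is a minimal sketch that assumes worker_threads sits directly under salt:master and the remaining three parameters sit under salt:master:opts, as in the example above. The function name and the SALT_CALL override are hypothetical.

```shell
# Hypothetical override for dry runs; defaults to the real salt-call CLI.
SALT_CALL="${SALT_CALL:-salt-call}"

# Hypothetical helper: print the pillar values of the four tuning parameters.
show_master_tuning() {
    "$SALT_CALL" pillar.get salt:master:worker_threads
    for key in gather_job_timeout sock_pool_size zmq_backlog; do
        "$SALT_CALL" pillar.get "salt:master:opts:$key"
    done
}

# Example usage on the Salt Master node:
# show_master_tuning
```

This reports what the Reclass model provides through the pillar. It does not confirm that the running Salt Master service has already picked up the new values.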