Manage jobs¶
Job queue¶
Mirantis Secure Registry (MSR) uses a job queue to schedule batch jobs. Jobs are added to a cluster-wide job queue, and then consumed and executed by a job runner within MSR.
All MSR replicas have access to the job queue, and have a job runner component that can get and execute work.
How it works¶
When a job is created, it is added to a cluster-wide job queue and enters the waiting state. When one of the MSR replicas is ready to claim the job, it waits a random time of up to 3 seconds to give every replica the opportunity to claim the task.
A replica claims a job by adding its replica ID to the job. That way, other replicas will know the job has been claimed. Once a replica claims a job, it adds that job to an internal queue, which in turn sorts the jobs by their scheduledAt time. Once that happens, the replica updates the job status to running, and starts executing it.
The job runner component of each MSR replica keeps a heartbeatExpiration entry in the database that is shared by all replicas. If a replica becomes unhealthy, other replicas notice the change and update the status of the failing worker to dead. In addition, all the jobs that were claimed by the unhealthy replica enter the worker_dead state, so that other replicas can claim them.
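The heartbeat check described above can be sketched as follows. This is only an illustration and not MSR's implementation: the function names `parse_ts` and `reap_dead_workers` are hypothetical, and the in-memory dictionaries stand in for the shared database, reusing the field names shown in the API responses later on this page.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse a UTC timestamp such as 2017-02-18T00:51:02Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def reap_dead_workers(workers, jobs, now):
    """Mark workers with an expired heartbeat as dead and release their jobs.

    Mutates the dictionaries in place and returns the IDs of the released jobs.
    """
    dead_ids = set()
    for worker in workers:
        expired = parse_ts(worker["heartbeatExpiration"]) < now
        if worker["status"] != "dead" and expired:
            worker["status"] = "dead"
            dead_ids.add(worker["id"])

    released = []
    for job in jobs:
        # Assumption: only claimed jobs that have not finished are released.
        if job["workerID"] in dead_ids and job["status"] in ("waiting", "running"):
            job["status"] = "worker_dead"
            released.append(job["id"])
    return released
```

A job that ends up in the worker_dead state is then available for a healthy replica to claim.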
Job types¶
MSR runs periodic and long-running jobs. The following is a complete list of jobs you can filter for via the user interface or the API.
Job | Description |
---|---|
gc | A garbage collection job that deletes layers associated with deleted images. |
onlinegc | A garbage collection job that deletes layers associated with deleted images without putting the registry in read-only mode. |
onlinegc_metadata | A garbage collection job that deletes metadata associated with deleted images. |
onlinegc_joblogs | A garbage collection job that deletes job logs based on a configured job history setting. |
metadatastoremigration | A necessary migration that enables the |
sleep | Used for testing the correctness of the jobrunner. It sleeps for 60 seconds. |
false | Used for testing the correctness of the jobrunner. It runs the |
tagmigration | Used for synchronizing tag and manifest information between the MSR database and the storage backend. |
bloblinkmigration | A DTR 2.1 to 2.2 upgrade process that adds references for blobs to repositories in the database. |
license_update | Checks for license expiration extensions if online license updates are enabled. |
scan_check | An image security scanning job. This job does not perform the actual scanning, rather it spawns |
scan_check_single | A security scanning job for a particular layer given by the |
scan_check_all | A security scanning job that updates all of the currently scanned images to display the latest vulnerabilities. |
update_vuln_db | A job that is created to update MSR’s vulnerability database. It uses an Internet connection to check for database updates through |
scannedlayermigration | A DTR 2.4 to 2.5 upgrade process that restructures scanned image data. |
push_mirror_tag | A job that pushes a tag to another registry after a push mirror policy has been evaluated. |
poll_mirror | A global cron that evaluates poll mirroring policies. |
webhook | A job that is used to dispatch a webhook payload to a single endpoint. |
nautilus_update_db | The old name for the |
ro_registry | A user-initiated job for manually switching MSR into read-only mode. |
tag_pruning | A job for cleaning up unnecessary or unwanted repository tags which can be configured by repository admins. |
Job status¶
Jobs can have one of the following status values:
Status | Description |
---|---|
waiting | Unclaimed job waiting to be picked up by a worker. |
running | The job is currently being run by the specified |
done | The job has successfully completed. |
errors | The job has completed with errors. |
cancel_request | The status of a job is monitored by the worker in the database. If the job status changes to |
cancel | The job has been canceled and was not fully executed. |
deleted | The job and its logs have been removed. |
worker_dead | The worker for this job has been declared |
worker_shutdown | The worker that was running this job has been gracefully stopped. |
worker_resurrection | The worker for this job has reconnected to the database and will cancel this job. |
Audit jobs with the web interface¶
Since DTR 2.2, admins have been able to view and audit jobs using the API. MSR 2.6 enhances those capabilities by adding a Job Logs tab under System settings on the user interface. The tab displays a sortable and paginated list of jobs along with links to associated job logs.
Prerequisite¶
Job Queue
View jobs list¶
To view the list of jobs within MSR, do the following:
1. Navigate to https://<msr-url> and log in with your MKE credentials.
2. Select System from the left-side navigation panel, and then click Job Logs. You should see a paginated list of past, running, and queued jobs. By default, Job Logs shows the latest 10 jobs on the first page.
3. Specify a filtering option. Job Logs lets you filter by:
   - Action
   - Worker ID (the ID of the worker in an MSR replica that is responsible for running the job)
4. Optional: Click Edit Settings on the right of the filtering options to update your Job Logs settings.
Job details¶
The following is an explanation of the job-related fields displayed in Job Logs and uses the filtered online_gc action from above.
Job Detail | Description | Example |
---|---|---|
Action | The type of action or job being performed. | |
ID | The ID of the job. | |
Worker | The ID of the worker node responsible for running the job. | |
Status | Current status of the action or job. | |
Start Time | Time when the job started. | |
Last updated | Time when the job was last updated. | |
View Logs | Links to the full logs for the job. | |
View job-specific logs¶
To view the log details for a specific job, do the following:
1. Click View Logs next to the job’s Last Updated value. You will be redirected to the log detail page of your selected job.

   Notice how the job ID is reflected in the URL while the Action and the abbreviated form of the job ID are reflected in the heading. Also, the JSON lines displayed are job-specific MSR container logs.
2. Enter or select a different line count to truncate the number of lines displayed. Lines are cut off from the end of the logs.
Audit jobs with the API¶
Overview¶
This section covers troubleshooting batch jobs via the API, a capability introduced in DTR 2.2. Starting in MSR 2.6, admins can also audit jobs using the web interface.
Prerequisite¶
Job Queue
Job capacity¶
Each job runner has a limited capacity and will not claim jobs that require a higher capacity. You can see the capacity of a job runner via the GET /api/v0/workers endpoint:
    {
      "workers": [
        {
          "id": "000000000000",
          "status": "running",
          "capacityMap": {
            "scan": 1,
            "scanCheck": 1
          },
          "heartbeatExpiration": "2017-02-18T00:51:02Z"
        }
      ]
    }
This means that the worker with replica ID 000000000000 has a capacity of 1 scan and 1 scanCheck. Next, review the list of available jobs:
    {
      "jobs": [
        {
          "id": "0",
          "workerID": "",
          "status": "waiting",
          "capacityMap": {
            "scan": 1
          }
        },
        {
          "id": "1",
          "workerID": "",
          "status": "waiting",
          "capacityMap": {
            "scan": 1
          }
        },
        {
          "id": "2",
          "workerID": "",
          "status": "waiting",
          "capacityMap": {
            "scanCheck": 1
          }
        }
      ]
    }
If worker 000000000000 notices the jobs in waiting state above, then it will be able to pick up jobs 0 and 2 since it has the capacity for both. Job 1 will have to wait until the previous scan job, 0, is completed. The job queue will then look like:
    {
      "jobs": [
        {
          "id": "0",
          "workerID": "000000000000",
          "status": "running",
          "capacityMap": {
            "scan": 1
          }
        },
        {
          "id": "1",
          "workerID": "",
          "status": "waiting",
          "capacityMap": {
            "scan": 1
          }
        },
        {
          "id": "2",
          "workerID": "000000000000",
          "status": "running",
          "capacityMap": {
            "scanCheck": 1
          }
        }
      ]
    }
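The capacity matching in this example can be reproduced with a short sketch. It assumes, for illustration only, that a worker considers waiting jobs in queue order and uses up one unit of capacity per claimed job. The function name `claimable_jobs` is hypothetical and is not part of MSR.

```python
def claimable_jobs(worker_capacity, jobs):
    """Return IDs of waiting jobs the worker can claim, in queue order.

    Capacity is consumed as jobs are claimed, so a second job that needs
    the same slot has to wait.
    """
    remaining = dict(worker_capacity)
    claimed = []
    for job in jobs:
        # Skip jobs that are not waiting or that already carry a worker ID.
        if job["status"] != "waiting" or job["workerID"]:
            continue
        needs = job.get("capacityMap") or {}
        if all(remaining.get(kind, 0) >= amount for kind, amount in needs.items()):
            for kind, amount in needs.items():
                remaining[kind] = remaining.get(kind, 0) - amount
            claimed.append(job["id"])
    return claimed


worker = {"scan": 1, "scanCheck": 1}
queue = [
    {"id": "0", "workerID": "", "status": "waiting", "capacityMap": {"scan": 1}},
    {"id": "1", "workerID": "", "status": "waiting", "capacityMap": {"scan": 1}},
    {"id": "2", "workerID": "", "status": "waiting", "capacityMap": {"scanCheck": 1}},
]
print(claimable_jobs(worker, queue))  # ['0', '2']
```

Job 1 is left out because job 0 has already used the worker's single scan slot, which matches the queue state shown above.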
You can get a list of jobs via the GET /api/v0/jobs/ endpoint. Each job looks like:
    {
      "id": "1fcf4c0f-ff3b-471a-8839-5dcb631b2f7b",
      "retryFromID": "1fcf4c0f-ff3b-471a-8839-5dcb631b2f7b",
      "workerID": "000000000000",
      "status": "done",
      "scheduledAt": "2017-02-17T01:09:47.771Z",
      "lastUpdated": "2017-02-17T01:10:14.117Z",
      "action": "scan_check_single",
      "retriesLeft": 0,
      "retriesTotal": 0,
      "capacityMap": {
        "scan": 1
      },
      "parameters": {
        "SHA256SUM": "1bacd3c8ccb1f15609a10bd4a403831d0ec0b354438ddbf644c95c5d54f8eb13"
      },
      "deadline": "",
      "stopTimeout": ""
    }
The JSON fields of interest here are:
- id: The ID of the job
- workerID: The ID of the worker in an MSR replica that is running this job
- status: The current state of the job
- action: The type of job the worker will actually perform
- capacityMap: The available capacity a worker needs for this job to run
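As a rough sketch of how these fields might be used when auditing, the following fetches the job list and counts jobs per status. Several details here are assumptions rather than documented behavior: the use of HTTP basic authentication, the top-level "jobs" key in the response, and the helper names `fetch_jobs` and `count_by_status`.

```python
import base64
import json
import urllib.request
from collections import Counter


def fetch_jobs(base_url, username, password):
    """GET /api/v0/jobs/ and return the list of job objects.

    Assumes HTTP basic authentication and a top-level "jobs" key.
    """
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    request = urllib.request.Request(
        f"{base_url}/api/v0/jobs/",
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response).get("jobs", [])


def count_by_status(jobs, action=None):
    """Count jobs per status, optionally restricted to one action."""
    return Counter(
        job["status"] for job in jobs if action is None or job["action"] == action
    )
```

For example, `count_by_status(fetch_jobs("https://<msr-url>", user, password), action="scan_check_single")` would summarize the state of all single-layer scan jobs, with `user` and `password` supplied by the caller.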
Cron jobs¶
Several of the jobs performed by MSR run on a recurring schedule. You can see those jobs using the GET /api/v0/crons endpoint:
    {
      "crons": [
        {
          "id": "48875b1b-5006-48f5-9f3c-af9fbdd82255",
          "action": "license_update",
          "schedule": "57 54 3 * * *",
          "retries": 2,
          "capacityMap": null,
          "parameters": null,
          "deadline": "",
          "stopTimeout": "",
          "nextRun": "2017-02-22T03:54:57Z"
        },
        {
          "id": "b1c1e61e-1e74-4677-8e4a-2a7dacefffdc",
          "action": "update_db",
          "schedule": "0 0 3 * * *",
          "retries": 0,
          "capacityMap": null,
          "parameters": null,
          "deadline": "",
          "stopTimeout": "",
          "nextRun": "2017-02-22T03:00:00Z"
        }
      ]
    }
The schedule field uses a cron expression following the (seconds) (minutes) (hours) (day of month) (month) (day of week) format. For example, the schedule 57 54 3 * * * of the cron with ID 48875b1b-5006-48f5-9f3c-af9fbdd82255 runs at 03:54:57 every day, regardless of the day of the week or month. Its next run is therefore 2017-02-22T03:54:57Z, as shown in the nextRun field of the example JSON response above.
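To make the six-field format concrete, here is a minimal sketch that labels each field and computes the next run for the simple daily case. It is not a full cron parser: it only accepts numeric seconds, minutes, and hours with * in the other three fields. The names `parse_schedule` and `next_daily_run` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

FIELD_NAMES = ("seconds", "minutes", "hours", "day_of_month", "month", "day_of_week")


def parse_schedule(expression):
    """Split a six-field cron expression into a name -> value mapping."""
    fields = expression.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    return dict(zip(FIELD_NAMES, fields))


def next_daily_run(expression, after):
    """Return the next run time for a schedule that fires once every day.

    Only handles numeric seconds, minutes, and hours with "*" in the
    remaining fields; anything else raises ValueError.
    """
    schedule = parse_schedule(expression)
    if any(schedule[name] != "*" for name in FIELD_NAMES[3:]):
        raise ValueError("only daily schedules are supported in this sketch")
    candidate = after.replace(
        hour=int(schedule["hours"]),
        minute=int(schedule["minutes"]),
        second=int(schedule["seconds"]),
        microsecond=0,
    )
    # If today's slot has already passed, the next run is tomorrow.
    if candidate <= after:
        candidate += timedelta(days=1)
    return candidate


after = datetime(2017, 2, 21, 12, 0, tzinfo=timezone.utc)
print(next_daily_run("57 54 3 * * *", after).isoformat())  # 2017-02-22T03:54:57+00:00
```

With a reference time of noon on 2017-02-21 (chosen here only for illustration), the result matches the nextRun value of the license_update cron in the response above.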
Enable auto-deletion of job logs¶
Mirantis Secure Registry has a global setting for auto-deletion of job logs, which allows them to be removed as part of garbage collection. Starting in MSR 2.6, admins can enable auto-deletion of job logs based on the conditions described below.
1. In your browser, navigate to https://<msr-url> and log in with your MKE credentials.
2. Select System on the left-side navigation panel, which will display the Settings page by default.
3. Scroll down to Job Logs and turn on Auto-Deletion.
4. Specify the conditions with which a job log auto-deletion will be triggered.

   MSR allows you to set your auto-deletion conditions based on the following optional job log attributes:

   Name | Description | Example |
   ---|---|---|
   Age | Lets you remove job logs which are older than your specified number of hours, days, weeks or months. | 2 months |
   Max number of events | Lets you specify the maximum number of job logs allowed within MSR. | 100 |

   If you check and specify both, job logs will be removed from MSR during garbage collection if either condition is met. You should see a confirmation message right away.
5. Click Start Deletion if you’re ready. Read more about configuring garbage collection if you’re unsure about this operation.
6. Navigate to System > Job Logs to confirm that onlinegc_joblogs has started.
Note
When you enable auto-deletion of job logs, the logs will be permanently deleted during garbage collection.
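The "either condition" rule for the Age and Max number of events settings can be illustrated with a small sketch. How MSR evaluates these conditions internally is not described here, so the interpretation below is an assumption: a log is selected if it is older than the age limit, or if it falls outside the newest logs allowed by the maximum count. The function name `logs_to_delete` is hypothetical, and an age such as 2 months is approximated as a number of days.

```python
from datetime import datetime, timedelta, timezone


def logs_to_delete(logs, now, max_age=None, max_count=None):
    """Return IDs of job logs that meet either auto-deletion condition.

    logs: list of (log_id, last_updated) tuples.
    max_age: timedelta; logs older than this are selected.
    max_count: int; logs beyond the newest max_count are selected.
    """
    newest_first = sorted(logs, key=lambda item: item[1], reverse=True)
    selected = []
    for position, (log_id, last_updated) in enumerate(newest_first):
        too_old = max_age is not None and now - last_updated > max_age
        over_limit = max_count is not None and position >= max_count
        if too_old or over_limit:
            selected.append(log_id)
    return selected
```

Leaving both `max_age` and `max_count` unset selects nothing, which corresponds to neither optional attribute being checked.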