Manage jobs

Job queue

Mirantis Secure Registry (MSR) uses a job queue to schedule batch jobs. Jobs are added to a cluster-wide job queue, and then consumed and executed by a job runner within MSR.

All MSR replicas have access to the job queue, and have a job runner component that can get and execute work.

How it works

When a job is created, it is added to a cluster-wide job queue and enters the waiting state. When one of the MSR replicas is ready to claim the job, it waits a random time of up to 3 seconds to give every replica the opportunity to claim the task.

A replica claims a job by adding its replica ID to the job, so that other replicas know the job has been claimed. After claiming a job, the replica adds it to an internal queue, which sorts jobs by their scheduledAt time. The replica then updates the job status to running and starts executing it.
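The claiming steps above can be modeled in a few lines. The following is a minimal, single-process Python sketch, not MSR's implementation. The Job and Replica classes and their methods are hypothetical, and the random delay stands in for the wait of up to 3 seconds. In MSR, a claim is recorded in the shared database rather than in memory, and this sketch does not model that atomic update.

```python
import heapq
import random
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Job:
    # Hypothetical job record; fields mirror the API's id, scheduledAt, status, and workerID.
    id: str
    scheduled_at: datetime
    status: str = "waiting"
    worker_id: str = ""


class Replica:
    """Hypothetical model of one replica's job runner (illustration only)."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        # Internal min-heap of claimed jobs, ordered by scheduled_at (ties broken by job ID).
        self._queue: List[Tuple[datetime, str, Job]] = []

    def claim_delay(self) -> float:
        # Random wait of up to 3 seconds, so every replica gets a chance to claim.
        return random.uniform(0.0, 3.0)

    def try_claim(self, job: Job) -> bool:
        # A job that already carries a worker ID has been claimed by another replica.
        if job.worker_id:
            return False
        job.worker_id = self.replica_id
        heapq.heappush(self._queue, (job.scheduled_at, job.id, job))
        return True

    def run_next(self) -> Optional[Job]:
        # Pop the earliest-scheduled claimed job and mark it as running.
        if not self._queue:
            return None
        _, _, job = heapq.heappop(self._queue)
        job.status = "running"
        return job


def race(job: Job, replicas: List[Replica]) -> Replica:
    # Simulate the wait: the replica with the shortest random delay tries first,
    # and the others then see the worker ID and back off.
    for replica in sorted(replicas, key=lambda r: r.claim_delay()):
        if replica.try_claim(job):
            return replica
    raise RuntimeError(f"job {job.id} was already claimed")
```

Consider a replica that first claims a job scheduled for 02:00 and then claims one scheduled for 01:00. It still runs the 01:00 job first, because the heap orders claimed jobs by scheduled time rather than by claim order.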

The job runner component of each MSR replica keeps a heartbeatExpiration entry in the database that is shared by all replicas. If a replica becomes unhealthy, the other replicas notice the change and update the status of the failing worker to dead. In addition, all jobs that were claimed by the unhealthy replica enter the worker_dead state, so that other replicas can claim them.
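The following minimal Python sketch shows the failure handling described above. It assumes a periodic sweep that compares each worker's heartbeatExpiration with the current time. The exact detection mechanism MSR uses is not described here. The Worker and Job classes and the sweep_dead_workers and parse_api_time helpers are hypothetical. Treating a claimed job in the waiting or running state as unfinished is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Set


@dataclass
class Worker:
    id: str
    status: str  # for example "running" or "dead"
    heartbeat_expiration: datetime


@dataclass
class Job:
    id: str
    worker_id: str
    status: str


# Assumption: a claimed job that is still waiting or running has not finished.
UNFINISHED = {"waiting", "running"}


def parse_api_time(value: str) -> datetime:
    # The API examples use UTC timestamps with a trailing "Z", e.g. "2017-02-18T00:51:02Z".
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def sweep_dead_workers(workers: Iterable[Worker], jobs: Iterable[Job], now: datetime) -> Set[str]:
    """Mark workers whose heartbeat has expired as dead and release their jobs."""
    dead = set()
    for worker in workers:
        if worker.status != "dead" and worker.heartbeat_expiration < now:
            worker.status = "dead"
            dead.add(worker.id)
    for job in jobs:
        # Unfinished jobs claimed by a newly dead worker move to worker_dead,
        # so that other replicas can claim the work.
        if job.worker_id in dead and job.status in UNFINISHED:
            job.status = "worker_dead"
    return dead
```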

Job types

MSR runs periodic and long-running jobs. The following is a complete list of jobs you can filter for via the user interface or the API.

Job types¶
Job	Description
gc	A garbage collection job that deletes layers associated with deleted images.
onlinegc	A garbage collection job that deletes layers associated with deleted images without putting the registry in read-only mode.
onlinegc_metadata	A garbage collection job that deletes metadata associated with deleted images.
onlinegc_joblogs	A garbage collection job that deletes job logs based on a configured job history setting.
metadatastoremigration	A necessary migration that enables the `onlinegc` feature.
sleep	Used for testing the correctness of the job runner. It sleeps for 60 seconds.
false	Used for testing the correctness of the job runner. It runs the `false` command and immediately fails.
tagmigration	Used for synchronizing tag and manifest information between the MSR database and the storage backend.
bloblinkmigration	A DTR 2.1 to 2.2 upgrade process that adds references for blobs to repositories in the database.
license_update	Checks for license expiration extensions if online license updates are enabled.
scan_check	An image security scanning job. This job does not perform the actual scanning; rather, it spawns `scan_check_single` jobs (one for each layer in the image). Once all of the `scan_check_single` jobs are complete, this job terminates.
scan_check_single	A security scanning job for a particular layer, given by the `SHA256SUM` parameter. This job breaks up the layer into components and checks each component for vulnerabilities.
scan_check_all	A security scanning job that updates all of the currently scanned images to display the latest vulnerabilities.
update_vuln_db	A job that is created to update MSR’s vulnerability database. It uses an Internet connection to check for database updates through `https://dss-cve-updates.docker.com/` and updates the `dtr-scanningstore` container if there is a new update available.
scannedlayermigration	A DTR 2.4 to 2.5 upgrade process that restructures scanned image data.
push_mirror_tag	A job that pushes a tag to another registry after a push mirror policy has been evaluated.
poll_mirror	A global cron that evaluates poll mirroring policies.
webhook	A job that is used to dispatch a webhook payload to a single endpoint.
nautilus_update_db	The old name for the `update_vuln_db` job. This name may appear in older log files.
ro_registry	A user-initiated job for manually switching MSR into read-only mode.
tag_pruning	A job, configurable by repository admins, that cleans up unnecessary or unwanted repository tags.

Job status

Jobs can have one of the following status values:

Job status values
Status	Description
waiting	Unclaimed job waiting to be picked up by a worker.
running	The job is currently being run by the specified `workerID`.
done	The job has successfully completed.
errors	The job has completed with errors.
cancel_request	The worker monitors the job's status in the database. If the job status changes to `cancel_request`, the worker cancels the job.
cancel	The job has been canceled and was not fully executed.
deleted	The job and its logs have been removed.
worker_dead	The worker for this job has been declared `dead` and the job will not continue.
worker_shutdown	The worker that was running this job has been gracefully stopped.
worker_resurrection	The worker for this job has reconnected to the database and will cancel this job.
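Scripts that poll the jobs API often need to know whether a job can still change. The following Python sketch encodes the status values above as an enum. The split into finished and unfinished statuses is an interpretation of the descriptions in the table, not a classification that MSR defines. The is_finished and succeeded helpers are hypothetical.

```python
from enum import Enum


class JobStatus(str, Enum):
    WAITING = "waiting"
    RUNNING = "running"
    DONE = "done"
    ERRORS = "errors"
    CANCEL_REQUEST = "cancel_request"
    CANCEL = "cancel"
    DELETED = "deleted"
    WORKER_DEAD = "worker_dead"
    WORKER_SHUTDOWN = "worker_shutdown"
    WORKER_RESURRECTION = "worker_resurrection"


# Interpretation (not defined by MSR): statuses after which the job will not run further.
FINISHED = frozenset({
    JobStatus.DONE,
    JobStatus.ERRORS,
    JobStatus.CANCEL,
    JobStatus.DELETED,
    JobStatus.WORKER_DEAD,
    JobStatus.WORKER_SHUTDOWN,
})


def is_finished(raw_status: str) -> bool:
    # JobStatus(raw_status) raises ValueError for a status not listed in the table.
    return JobStatus(raw_status) in FINISHED


def succeeded(raw_status: str) -> bool:
    # Only "done" means the job completed without errors.
    return JobStatus(raw_status) is JobStatus.DONE
```

Under this grouping, transitional states such as cancel_request and worker_resurrection count as unfinished. A polling loop would keep waiting through them until the job reaches cancel or another final state.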

Audit jobs with the web interface

Since DTR 2.2, admins have been able to view and audit jobs using the API. MSR 2.6 extends those capabilities by adding a Job Logs tab under System settings in the user interface. The tab displays a sortable, paginated list of jobs along with links to the associated job logs.

Prerequisite

Job Queue

View jobs list

To view the list of jobs within MSR, do the following:

1. Navigate to https://<msr-url> and log in with your MKE credentials.
2. Select System from the left-side navigation panel, and then click Job Logs. You should see a paginated list of past, running, and queued jobs. By default, Job Logs shows the latest 10 jobs on the first page.
3. Specify a filtering option. Job Logs lets you filter by:
   - Action
   - Worker ID (the ID of the worker in an MSR replica that is responsible for running the job)
4. Optional: Click Edit Settings on the right of the filtering options to update your Job Logs settings.

Job details

The following table explains the job-related fields displayed in Job Logs, using an onlinegc job as the example.

Job detail fields
Job Detail	Description	Example
Action	The type of action or job being performed.	`onlinegc`
ID	The ID of the job.	`ccc05646-569a-4ac4-b8e1-113111f63fb9`
Worker	The ID of the worker in the MSR replica that is responsible for running the job.	`8f553c8b697c`
Status	Current status of the action or job.	`done`
Start Time	Time when the job started.	`9/23/2018 7:04 PM`
Last updated	Time when the job was last updated.	`9/23/2018 7:04 PM`
View Logs	Links to the full logs for the job.	`[View Logs]`

View job-specific logs

To view the log details for a specific job, do the following:

1. Click View Logs next to the job’s Last Updated value. You will be redirected to the log detail page of your selected job.

   Notice how the job ID is reflected in the URL, while the Action and the abbreviated form of the job ID are reflected in the heading. Also, the JSON lines displayed are job-specific MSR container logs.
2. Enter or select a different line count to truncate the number of lines displayed. Lines are cut off from the end of the logs.

Audit jobs with the API

Overview

This section covers troubleshooting batch jobs via the API, a capability introduced in DTR 2.2. Starting in MSR 2.6, admins can also audit jobs using the web interface.

Prerequisite

Job Queue

Job capacity

Each job runner has a limited capacity and does not claim jobs that require more capacity than it has available. You can see the capacity of a job runner via the GET /api/v0/workers endpoint:

{
  "workers": [
    {
      "id": "000000000000",
      "status": "running",
      "capacityMap": {
        "scan": 1,
        "scanCheck": 1
      },
      "heartbeatExpiration": "2017-02-18T00:51:02Z"
    }
  ]
}

This means that the worker with replica ID 000000000000 has a capacity of 1 scan and 1 scanCheck. Next, review the list of available jobs:

{
  "jobs": [
    {
      "id": "0",
      "workerID": "",
      "status": "waiting",
      "capacityMap": {
        "scan": 1
      }
    },
    {
      "id": "1",
      "workerID": "",
      "status": "waiting",
      "capacityMap": {
        "scan": 1
      }
    },
    {
      "id": "2",
      "workerID": "",
      "status": "waiting",
      "capacityMap": {
        "scanCheck": 1
      }
    }
  ]
}

If worker 000000000000 notices the jobs in the waiting state above, it can pick up jobs 0 and 2, since it has the capacity for both. Job 1 has to wait until the previous scan job, 0, is completed. The job queue then looks like:

{
  "jobs": [
    {
      "id": "0",
      "workerID": "000000000000",
      "status": "running",
      "capacityMap": {
        "scan": 1
      }
    },
    {
      "id": "1",
      "workerID": "",
      "status": "waiting",
      "capacityMap": {
        "scan": 1
      }
    },
    {
      "id": "2",
      "workerID": "000000000000",
      "status": "running",
      "capacityMap": {
        "scanCheck": 1
      }
    }
  ]
}
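The capacity check in this walkthrough can be sketched in Python as follows. The function names are hypothetical. Two behaviors are assumptions made for illustration: jobs are claimed greedily in list order, and a job without a capacityMap always fits. The input shapes follow the JSON responses above.

```python
from typing import Dict, List, Optional


def fits(required: Optional[Dict[str, int]], available: Dict[str, int]) -> bool:
    # A job fits if, for every capacity type it needs, enough of that type is left.
    return all(available.get(kind, 0) >= amount for kind, amount in (required or {}).items())


def claim_by_capacity(worker_id: str, capacity_map: Dict[str, int], jobs: List[dict]) -> List[str]:
    """Claim waiting, unclaimed jobs in order while the worker has spare capacity."""
    remaining = dict(capacity_map)
    claimed = []
    for job in jobs:
        if job["status"] != "waiting" or job["workerID"]:
            continue
        need = job.get("capacityMap") or {}
        if not fits(need, remaining):
            continue  # for example, a second scan job while the only scan slot is taken
        for kind, amount in need.items():
            remaining[kind] -= amount
        job["workerID"] = worker_id
        job["status"] = "running"
        claimed.append(job["id"])
    return claimed
```

With the worker and jobs from the example, this sketch claims jobs 0 and 2 and leaves job 1 waiting. When job 0 finishes, adding its capacityMap back to the remaining capacity would let the worker claim job 1.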

You can get a list of jobs via the GET /api/v0/jobs/ endpoint. Each job looks like:

{
  "id": "1fcf4c0f-ff3b-471a-8839-5dcb631b2f7b",
  "retryFromID": "1fcf4c0f-ff3b-471a-8839-5dcb631b2f7b",
  "workerID": "000000000000",
  "status": "done",
  "scheduledAt": "2017-02-17T01:09:47.771Z",
  "lastUpdated": "2017-02-17T01:10:14.117Z",
  "action": "scan_check_single",
  "retriesLeft": 0,
  "retriesTotal": 0,
  "capacityMap": {
    "scan": 1
  },
  "parameters": {
    "SHA256SUM": "1bacd3c8ccb1f15609a10bd4a403831d0ec0b354438ddbf644c95c5d54f8eb13"
  },
  "deadline": "",
  "stopTimeout": ""
}

The JSON fields of interest here are:

- id: The ID of the job.
- workerID: The ID of the worker in an MSR replica that is running this job.
- status: The current state of the job.
- action: The type of job the worker will actually perform.
- capacityMap: The available capacity a worker needs for this job to run.
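To pull these fields yourself, a small Python client along the following lines may help. It is a sketch, not an official client, and it rests on two assumptions. First, the MSR API accepts HTTP basic authentication with your MKE username and password or access token. Second, the jobs and workers responses wrap their results in jobs and workers arrays, as in the examples above. The function names msr_get, summarize_jobs, and capacity_by_worker are hypothetical.

```python
import base64
import json
import urllib.request
from typing import Any, Dict, List

# The JSON fields of interest listed above.
JOB_FIELDS = ("id", "workerID", "status", "action", "capacityMap")


def msr_get(base_url: str, path: str, username: str, secret: str) -> Any:
    """GET an MSR API path (for example /api/v0/jobs/ or /api/v0/workers) and decode the JSON body.

    Assumes the API accepts HTTP basic auth with MKE credentials.
    """
    request = urllib.request.Request(base_url.rstrip("/") + path)
    token = base64.b64encode(f"{username}:{secret}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    request.add_header("Accept", "application/json")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def summarize_jobs(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    # Keep only the fields of interest for each job in a jobs response.
    return [{field: job.get(field) for field in JOB_FIELDS} for job in payload.get("jobs", [])]


def capacity_by_worker(payload: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    # Map each worker (replica) ID to its capacityMap from a workers response.
    return {w["id"]: w.get("capacityMap") or {} for w in payload.get("workers", [])}
```

For example, summarize_jobs(msr_get("https://msr.example.com", "/api/v0/jobs/", "admin", token)) would return one dictionary per job. Here msr.example.com, admin, and token are placeholders. If MSR uses a certificate your system does not trust, urlopen fails verification unless you pass an ssl context that trusts the MSR CA through its context parameter.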

Cron jobs

Several of the jobs performed by MSR run on a recurring schedule. You can see those jobs using the GET /api/v0/crons endpoint:

{
  "crons": [
    {
      "id": "48875b1b-5006-48f5-9f3c-af9fbdd82255",
      "action": "license_update",
      "schedule": "57 54 3 * * *",
      "retries": 2,
      "capacityMap": null,
      "parameters": null,
      "deadline": "",
      "stopTimeout": "",
      "nextRun": "2017-02-22T03:54:57Z"
    },
    {
      "id": "b1c1e61e-1e74-4677-8e4a-2a7dacefffdc",
      "action": "update_db",
      "schedule": "0 0 3 * * *",
      "retries": 0,
      "capacityMap": null,
      "parameters": null,
      "deadline": "",
      "stopTimeout": "",
      "nextRun": "2017-02-22T03:00:00Z"
    }
  ]
}

The schedule field uses a cron expression in the (seconds) (minutes) (hours) (day of month) (month) (day of week) format. For example, the schedule 57 54 3 * * * for cron ID 48875b1b-5006-48f5-9f3c-af9fbdd82255 runs at 03:54:57 every day, regardless of the day of the week or month. In the example JSON response above, its next run is 2017-02-22T03:54:57Z.
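The arithmetic behind nextRun for daily schedules like these can be checked with a short Python sketch. It handles only expressions whose day-of-month, month, and day-of-week fields are all *, which covers both schedules in the example. next_daily_run is a hypothetical helper, not part of MSR. Full cron support would need a dedicated parser.

```python
from datetime import datetime, timedelta


def next_daily_run(schedule: str, after: datetime) -> datetime:
    """Return the first run strictly after `after` for an 'S M H * * *' schedule."""
    fields = schedule.split()
    if len(fields) != 6 or fields[3:] != ["*", "*", "*"]:
        raise ValueError(f"only 'S M H * * *' schedules are supported: {schedule!r}")
    # Field order follows the (seconds) (minutes) (hours) ... format described above.
    second, minute, hour = (int(value) for value in fields[:3])
    # Try the same day at the scheduled time; if that moment has passed, use the next day.
    candidate = after.replace(hour=hour, minute=minute, second=second, microsecond=0)
    if candidate <= after:
        candidate += timedelta(days=1)
    return candidate
```

For any starting time at or after 03:54:57 on 2017-02-21 and before 03:54:57 on 2017-02-22, the helper returns 2017-02-22T03:54:57Z. This matches the nextRun value for the license_update cron above.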

Enable auto-deletion of job logs

Mirantis Secure Registry has a global setting for auto-deletion of job logs, which allows them to be removed as part of garbage collection. Starting in MSR 2.6, admins can enable auto-deletion of job logs based on the conditions described below.

1. In your browser, navigate to https://<msr-url> and log in with your MKE credentials.
2. Select System on the left-side navigation panel, which displays the Settings page by default.
3. Scroll down to Job Logs and turn on Auto-Deletion.
4. Specify the conditions that trigger job log auto-deletion.

   MSR allows you to set your auto-deletion conditions based on the following optional job log attributes:

   Name	Description	Example
   Age	Lets you remove job logs that are older than your specified number of hours, days, weeks, or months.	`2 months`
   Max number of events	Lets you specify the maximum number of job logs allowed within MSR.	`100`

   If you check and specify both, job logs are removed from MSR during garbage collection when either condition is met. You should see a confirmation message right away.
5. Click Start Deletion if you’re ready. Read more about Garbage collection if you’re unsure about this operation.
6. Navigate to System > Job Logs to confirm that onlinegc_joblogs has started.
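The either-condition rule can be illustrated with a short Python sketch. It models job logs as (ID, creation time) pairs. It assumes that when the count limit is exceeded, the oldest logs are the ones removed. logs_to_delete is a hypothetical helper, and MSR's actual selection during garbage collection may differ.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Set, Tuple


def logs_to_delete(
    logs: Iterable[Tuple[str, datetime]],
    now: datetime,
    max_age: Optional[timedelta] = None,
    max_count: Optional[int] = None,
) -> Set[str]:
    """Return the IDs of job logs that meet either auto-deletion condition."""
    newest_first = sorted(logs, key=lambda item: item[1], reverse=True)
    doomed = set()
    if max_age is not None:
        # Age condition: every log older than the limit is removed.
        doomed.update(log_id for log_id, created in newest_first if now - created > max_age)
    if max_count is not None:
        # Count condition (assumed policy): keep only the newest max_count logs.
        doomed.update(log_id for log_id, _ in newest_first[max_count:])
    return doomed
```

Because either condition is sufficient, enabling both removes every log that either setting alone would remove.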

Note

When you enable auto-deletion of job logs, the logs will be permanently deleted during garbage collection.