Manage jobs

Job queue

Mirantis Secure Registry (MSR) uses a job queue to schedule batch jobs. Jobs are added to a cluster-wide job queue, and then consumed and executed by a job runner within MSR.

All MSR replicas have access to the job queue, and have a job runner component that can get and execute work.

How it works

When a job is created, it is added to a cluster-wide job queue and enters the waiting state. When one of the MSR replicas is ready to claim the job, it waits a random time of up to 3 seconds to give every replica the opportunity to claim the task.

A replica claims a job by adding its replica ID to the job. That way, other replicas will know the job has been claimed. Once a replica claims a job, it adds that job to an internal queue, which in turn sorts the jobs by their scheduledAt time. Once that happens, the replica updates the job status to running, and starts executing it.

The job runner component of each MSR replica keeps a heartbeatExpiration entry on the database that is shared by all replicas. If a replica becomes unhealthy, other replicas notice the change and update the status of the failing worker to dead. In addition, all jobs that were claimed by the unhealthy replica enter the worker_dead state, so that other replicas can claim them.
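The claim flow described above can be pictured with a small in-memory model. This is only an illustrative sketch: the Job and Replica classes, their method names, and mark_dead_worker are hypothetical and do not correspond to MSR internals, and the shared database is replaced by plain Python objects.

```python
import heapq
import random
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class Job:
    id: str
    scheduled_at: datetime
    status: str = "waiting"
    worker_id: str = ""


@dataclass
class Replica:
    replica_id: str
    # Internal queue of claimed jobs, kept ordered by scheduledAt.
    _queue: list = field(default_factory=list)

    def claim_delay(self) -> float:
        """Random wait of up to 3 seconds before attempting a claim."""
        return random.uniform(0.0, 3.0)

    def try_claim(self, job: Job) -> bool:
        """Claim a waiting job by writing this replica's ID onto it."""
        if job.status != "waiting" or job.worker_id:
            return False  # already claimed by another replica
        job.worker_id = self.replica_id
        # Job IDs are assumed unique, so ties on scheduled_at are broken by ID.
        heapq.heappush(self._queue, (job.scheduled_at, job.id, job))
        return True

    def run_next(self):
        """Take the claimed job with the earliest scheduledAt and mark it running."""
        if not self._queue:
            return None
        _, _, job = heapq.heappop(self._queue)
        job.status = "running"
        return job


def mark_dead_worker(jobs, dead_replica_id):
    """Move every unfinished job claimed by an unhealthy replica to worker_dead."""
    for job in jobs:
        if job.worker_id == dead_replica_id and job.status in ("waiting", "running"):
            job.status = "worker_dead"
```

In this model, a second replica that calls try_claim on an already claimed job gets False back, which mirrors how the replica ID on the job signals to other replicas that the job is taken. How a worker_dead job is picked up again is not modeled here.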

Job types

MSR runs periodic and long-running jobs. The following is a complete list of jobs you can filter for via the user interface or the API.

  • gc: A garbage collection job that deletes layers associated with deleted images.

  • onlinegc: A garbage collection job that deletes layers associated with deleted images without putting the registry in read-only mode.

  • onlinegc_metadata: A garbage collection job that deletes metadata associated with deleted images.

  • onlinegc_joblogs: A garbage collection job that deletes job logs based on a configured job history setting.

  • metadatastoremigration: A necessary migration that enables the onlinegc feature.

  • sleep: Used for testing the correctness of the jobrunner. It sleeps for 60 seconds.

  • false: Used for testing the correctness of the jobrunner. It runs the false command and immediately fails.

  • tagmigration: Used for synchronizing tag and manifest information between the MSR database and the storage backend.

  • bloblinkmigration: A DTR 2.1 to 2.2 upgrade process that adds references for blobs to repositories in the database.

  • license_update: Checks for license expiration extensions if online license updates are enabled.

  • scan_check: An image security scanning job. This job does not perform the actual scanning. Instead, it spawns scan_check_single jobs (one for each layer in the image). Once all of the scan_check_single jobs are complete, this job terminates.

  • scan_check_single: A security scanning job for a particular layer given by the parameter: SHA256SUM. This job breaks up the layer into components and checks each component for vulnerabilities.

  • scan_check_all: A security scanning job that updates all of the currently scanned images to display the latest vulnerabilities.

  • update_vuln_db: A job that is created to update MSR’s vulnerability database. It uses an Internet connection to check for database updates through https://dss-cve-updates.docker.com/ and updates the dtr-scanningstore container if there is a new update available.

  • scannedlayermigration: A DTR 2.4 to 2.5 upgrade process that restructures scanned image data.

  • push_mirror_tag: A job that pushes a tag to another registry after a push mirror policy has been evaluated.

  • poll_mirror: A global cron that evaluates poll mirroring policies.

  • webhook: A job that is used to dispatch a webhook payload to a single endpoint.

  • nautilus_update_db: The old name for the update_vuln_db job. This may be visible in old log files.

  • ro_registry: A user-initiated job for manually switching MSR into read-only mode.

  • tag_pruning: A job for cleaning up unnecessary or unwanted repository tags, which can be configured by repository admins.

Job status

Jobs can have one of the following status values:

  • waiting: Unclaimed job waiting to be picked up by a worker.

  • running: The job is currently being run by the specified workerID.

  • done: The job has successfully completed.

  • errors: The job has completed with errors.

  • cancel_request: The status of a job is monitored by the worker in the database. If the job status changes to cancel_request, the job is canceled by the worker.

  • cancel: The job has been canceled and was not fully executed.

  • deleted: The job and its logs have been removed.

  • worker_dead: The worker for this job has been declared dead and the job will not continue.

  • worker_shutdown: The worker that was running this job has been gracefully stopped.

  • worker_resurrection: The worker for this job has reconnected to the database and will cancel this job.

Audit jobs with the web interface

Since DTR 2.2, admins have been able to view and audit jobs using the API. MSR 2.6 enhances those capabilities by adding a Job Logs tab under System settings on the user interface. The tab displays a sortable and paginated list of jobs along with links to the associated job logs.

Prerequisite

  • Job Queue

View jobs list

To view the list of jobs within MSR, do the following:

  1. Navigate to https://<msr-url> and log in with your MKE credentials.

  2. Select System from the left-side navigation panel, and then click Job Logs. You should see a paginated list of past, running, and queued jobs. By default, Job Logs shows the latest 10 jobs on the first page.

  3. Specify a filtering option. Job Logs lets you filter by:

    • Action

    • Worker ID (the ID of the worker in an MSR replica that is responsible for running the job)

  4. Optional: Click Edit Settings on the right of the filtering options to update your Job Logs settings.

Job details

The following explains the job-related fields displayed in Job Logs, using an onlinegc job as the example.

  • Action: The type of action or job being performed. Example: onlinegc

  • ID: The ID of the job. Example: ccc05646-569a-4ac4-b8e1-113111f63fb9

  • Worker: The ID of the worker node responsible for running the job. Example: 8f553c8b697c

  • Status: Current status of the action or job. Example: done

  • Start Time: Time when the job started. Example: 9/23/2018 7:04 PM

  • Last updated: Time when the job was last updated. Example: 9/23/2018 7:04 PM

  • View Logs: Links to the full logs for the job. Example: [View Logs]

View job-specific logs

To view the log details for a specific job, do the following:

  1. Click View Logs next to the job’s Last Updated value. You will be redirected to the log detail page of your selected job.

    Notice how the job ID is reflected in the URL while the Action and the abbreviated form of the job ID are reflected in the heading. Also, the JSON lines displayed are job-specific MSR container logs.

  2. Enter or select a different line count to truncate the number of lines displayed. Lines are cut off from the end of the logs.

Audit jobs with the API

Overview

This section covers troubleshooting batch jobs via the API, a capability introduced in DTR 2.2. Starting in MSR 2.6, admins can also audit jobs using the web interface.

Prerequisite

  • Job Queue

Job capacity

Each job runner has a limited capacity and will not claim jobs that require a higher capacity. You can see the capacity of a job runner via the GET /api/v0/workers endpoint:

{
  "workers": [
    {
      "id": "000000000000",
      "status": "running",
      "capacityMap": {
        "scan": 1,
        "scanCheck": 1
      },
      "heartbeatExpiration": "2017-02-18T00:51:02Z"
    }
  ]
}

This means that the worker with replica ID 000000000000 has a capacity of 1 scan and 1 scanCheck. Next, review the list of available jobs:

{
  "jobs": [
    {
      "id": "0",
      "workerID": "",
      "status": "waiting",
      "capacityMap": {
        "scan": 1
      }
    },
    {
      "id": "1",
      "workerID": "",
      "status": "waiting",
      "capacityMap": {
        "scan": 1
      }
    },
    {
      "id": "2",
      "workerID": "",
      "status": "waiting",
      "capacityMap": {
        "scanCheck": 1
      }
    }
  ]
}

If worker 000000000000 notices the jobs in waiting state above, then it will be able to pick up jobs 0 and 2 since it has the capacity for both. Job 1 will have to wait until the previous scan job, 0, is completed. The job queue will then look like:

{
  "jobs": [
    {
      "id": "0",
      "workerID": "000000000000",
      "status": "running",
      "capacityMap": {
        "scan": 1
      }
    },
    {
      "id": "1",
      "workerID": "",
      "status": "waiting",
      "capacityMap": {
        "scan": 1
      }
    },
    {
      "id": "2",
      "workerID": "000000000000",
      "status": "running",
      "capacityMap": {
        "scanCheck": 1
      }
    }
  ]
}
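The capacity rule in this example can be expressed as a short function. This is a sketch of the idea only, not MSR code: the function name claimable_jobs is hypothetical, and it assumes that jobs are considered in list order and that each claimed job reserves its capacity until it finishes.

```python
def claimable_jobs(worker_capacity, waiting_jobs):
    """Return the IDs of the waiting jobs a worker can claim within its capacity."""
    remaining = dict(worker_capacity)  # copy so the caller's map is untouched
    claimed = []
    for job in waiting_jobs:
        needed = job.get("capacityMap") or {}  # a null capacityMap needs nothing
        if all(remaining.get(kind, 0) >= amount for kind, amount in needed.items()):
            for kind, amount in needed.items():
                remaining[kind] -= amount  # reserve capacity for this job
            claimed.append(job["id"])
    return claimed


worker = {"scan": 1, "scanCheck": 1}
jobs = [
    {"id": "0", "status": "waiting", "capacityMap": {"scan": 1}},
    {"id": "1", "status": "waiting", "capacityMap": {"scan": 1}},
    {"id": "2", "status": "waiting", "capacityMap": {"scanCheck": 1}},
]
print(claimable_jobs(worker, jobs))  # ['0', '2']
```

Job 1 is skipped because job 0 has already reserved the worker's single scan slot, which matches the queue state shown above.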

You can get a list of jobs via the GET /api/v0/jobs/ endpoint. Each job looks like:

{
    "id": "1fcf4c0f-ff3b-471a-8839-5dcb631b2f7b",
    "retryFromID": "1fcf4c0f-ff3b-471a-8839-5dcb631b2f7b",
    "workerID": "000000000000",
    "status": "done",
    "scheduledAt": "2017-02-17T01:09:47.771Z",
    "lastUpdated": "2017-02-17T01:10:14.117Z",
    "action": "scan_check_single",
    "retriesLeft": 0,
    "retriesTotal": 0,
    "capacityMap": {
        "scan": 1
    },
    "parameters": {
        "SHA256SUM": "1bacd3c8ccb1f15609a10bd4a403831d0ec0b354438ddbf644c95c5d54f8eb13"
    },
    "deadline": "",
    "stopTimeout": ""
}

The JSON fields of interest here are:

  • id: The ID of the job

  • workerID: The ID of the worker in an MSR replica that is running this job

  • status: The current state of the job

  • action: The type of job the worker will actually perform

  • capacityMap: The available capacity a worker needs for this job to run
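When auditing many jobs, it can help to group them by these fields. The following sketch assumes you have already retrieved the response body of GET /api/v0/jobs/ as text and that the jobs are wrapped in a top-level "jobs" array, as in the queue examples above. The helper names summarize_jobs and jobs_for_worker are hypothetical.

```python
import json
from collections import Counter


def summarize_jobs(response_text):
    """Count jobs per (action, status) pair in a jobs response body."""
    jobs = json.loads(response_text).get("jobs", [])
    return Counter((job["action"], job["status"]) for job in jobs)


def jobs_for_worker(response_text, worker_id):
    """Return the IDs of the jobs claimed by the given worker."""
    jobs = json.loads(response_text).get("jobs", [])
    return [job["id"] for job in jobs if job.get("workerID") == worker_id]


sample = json.dumps({
    "jobs": [
        {"id": "0", "workerID": "000000000000", "status": "done",
         "action": "scan_check_single"},
        {"id": "1", "workerID": "", "status": "waiting",
         "action": "scan_check_single"},
        {"id": "2", "workerID": "000000000000", "status": "running",
         "action": "onlinegc"},
    ]
})

for (action, status), count in sorted(summarize_jobs(sample).items()):
    print(action, status, count)
```

Filtering on workerID is a quick way to see which jobs a particular replica is holding, for example when a worker has been reported as dead.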

Cron jobs

Several of the jobs performed by MSR run on a recurring schedule. You can see those jobs using the GET /api/v0/crons endpoint:

{
  "crons": [
    {
      "id": "48875b1b-5006-48f5-9f3c-af9fbdd82255",
      "action": "license_update",
      "schedule": "57 54 3 * * *",
      "retries": 2,
      "capacityMap": null,
      "parameters": null,
      "deadline": "",
      "stopTimeout": "",
      "nextRun": "2017-02-22T03:54:57Z"
    },
    {
      "id": "b1c1e61e-1e74-4677-8e4a-2a7dacefffdc",
      "action": "update_db",
      "schedule": "0 0 3 * * *",
      "retries": 0,
      "capacityMap": null,
      "parameters": null,
      "deadline": "",
      "stopTimeout": "",
      "nextRun": "2017-02-22T03:00:00Z"
    }
  ]
}

The schedule field uses a cron expression in the (seconds) (minutes) (hours) (day of month) (month) (day of week) format. For example, the schedule 57 54 3 * * * of cron ID 48875b1b-5006-48f5-9f3c-af9fbdd82255 runs at 03:54:57 every day, regardless of the day of the week or the month. Its next run in the example JSON response above is therefore 2017-02-22T03:54:57Z.
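To check your reading of a schedule, you can compute the next run time yourself. The sketch below is deliberately limited: it handles only daily schedules like the ones in the example response, where the seconds, minutes, and hours fields are plain numbers and the remaining three fields are *. It is not a general cron parser, and the function name next_daily_run is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(schedule, after):
    """Return the first run time strictly after `after` for a daily schedule.

    Supports only '<sec> <min> <hour> * * *' expressions.
    """
    fields = schedule.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields: sec min hour dom month dow")
    second, minute, hour, day_of_month, month, day_of_week = fields
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("this sketch only supports daily schedules")
    # int() raises ValueError if a time field is '*' or otherwise non-numeric.
    candidate = after.replace(
        hour=int(hour), minute=int(minute), second=int(second), microsecond=0
    )
    if candidate <= after:
        candidate += timedelta(days=1)  # today's slot has passed, use tomorrow
    return candidate


now = datetime(2017, 2, 21, 12, 0, 0, tzinfo=timezone.utc)
print(next_daily_run("57 54 3 * * *", now).isoformat())
# 2017-02-22T03:54:57+00:00
```

The starting time of 2017-02-21T12:00:00Z is an assumed value, chosen so that the result lines up with the nextRun field in the example response.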

Enable auto-deletion of job logs

Mirantis Secure Registry has a global setting for auto-deletion of job logs, which allows them to be removed as part of garbage collection. In MSR 2.6, admins can enable auto-deletion of job logs based on the conditions described below.

  1. In your browser, navigate to https://<msr-url> and log in with your MKE credentials.

  2. Select System on the left-side navigation panel, which will display the Settings page by default.

  3. Scroll down to Job Logs and turn on Auto-Deletion.

  4. Specify the conditions under which job log auto-deletion will be triggered.

    MSR allows you to set your auto-deletion conditions based on the following optional job log attributes:

    • Age: Lets you remove job logs which are older than your specified number of hours, days, weeks or months. Example: 2 months

    • Max number of events: Lets you specify the maximum number of job logs allowed within MSR. Example: 100

    If you check and specify both, job logs will be removed from MSR during garbage collection if either condition is met. You should see a confirmation message right away.

  5. Click Start Deletion when you are ready. If you are unsure about this operation, first read about configuring garbage collection.

  6. Navigate to System > Job Logs to confirm that onlinegc_joblogs has started.

Note

When you enable auto-deletion of job logs, the logs will be permanently deleted during garbage collection.
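To reason about which logs a given combination of conditions would remove, the "either condition" rule can be modeled as follows. This is a rough sketch under stated assumptions, not MSR's implementation: it assumes that the age condition compares against each job's last-updated time and that the maximum count keeps the newest logs. The function name logs_to_delete and its parameters are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def logs_to_delete(job_logs, now, max_age=None, max_count=None):
    """Return the job IDs whose logs match either auto-deletion condition.

    job_logs is a list of (job_id, last_updated) pairs.
    """
    newest_first = sorted(job_logs, key=lambda entry: entry[1], reverse=True)
    doomed = []
    for position, (job_id, last_updated) in enumerate(newest_first):
        too_old = max_age is not None and now - last_updated > max_age
        over_limit = max_count is not None and position >= max_count
        if too_old or over_limit:  # either condition is enough
            doomed.append(job_id)
    return doomed


now = datetime(2018, 9, 23, 19, 4, tzinfo=timezone.utc)
logs = [
    ("a", now - timedelta(days=1)),
    ("b", now - timedelta(days=10)),
    ("c", now - timedelta(days=90)),
]
# Roughly "2 months" and a maximum of 2 logs.
print(logs_to_delete(logs, now, max_age=timedelta(days=60), max_count=2))  # ['c']
```

Because deletion is permanent, a dry run of this kind against your own retention values can help confirm that the settings keep the logs you still need.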