Garbage collection

You can configure the Mirantis Secure Registry (MSR) to automatically delete unused image layers, thus saving you disk space. This process is also known as garbage collection.

How MSR deletes unused layers

First, you configure MSR to run a garbage collection job on a fixed schedule. At the scheduled time, MSR:

  1. Identifies and marks unused image layers.
  2. Deletes the marked image layers.
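These two phases follow the general mark-and-sweep pattern. The following Python sketch illustrates that pattern on a toy in-memory model. The names tags, manifests, and blobs and the functions mark and sweep are hypothetical; they do not reflect how MSR stores data or implements the job.

```python
def mark(tags, manifests):
    """Return every digest reachable from a tag: the manifests and their layers."""
    marked = set()
    for manifest_digest in tags.values():
        marked.add(manifest_digest)
        marked.update(manifests[manifest_digest])
    return marked


def sweep(blobs, marked):
    """Delete every stored blob that was not marked and return what was deleted."""
    unused = blobs - marked
    blobs.difference_update(unused)
    return unused


# Hypothetical data: two tags share manifest "m1"; manifest "m2" is untagged.
tags = {"app:1.0": "m1", "app:latest": "m1"}
manifests = {"m1": ["l1", "l2"], "m2": ["l2", "l3"]}
blobs = {"m1", "m2", "l1", "l2", "l3"}

deleted = sweep(blobs, mark(tags, manifests))
print(sorted(deleted))  # ['l3', 'm2']
```

Layer l2 survives even though the untagged manifest m2 also uses it, because the tagged manifest m1 still references it.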

MSR uses online garbage collection, which runs without putting MSR into read-only (offline) mode. In previous versions, garbage collection switched MSR to read-only mode, so MSR rejected pushes while the job ran. Online garbage collection was an experimental feature in v2.5.

Schedule garbage collection

In your browser, navigate to https://<msr-url> and log in with your credentials. Select System on the left navigation pane, and then click the Garbage collection tab to schedule garbage collection.


Select how long the garbage collection job should run:

  • Until done: Run the job until all unused image layers are deleted.
  • For x minutes: Only run the garbage collection job for a maximum of x minutes at a time.
  • Never: Never delete unused image layers.
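The difference between Until done and For x minutes can be pictured as a work loop with an optional time budget, and Never as not starting the loop at all. This is a hypothetical sketch: run_gc, process, and the choice to check the budget between work items are assumptions for illustration, and MSR may enforce the limit differently.

```python
import time


def run_gc(work_items, process, max_minutes=None, clock=time.monotonic):
    """Process work items in order.

    max_minutes=None models "Until done"; a number models "For x minutes".
    The budget is checked between items, so one item may finish past the limit.
    Returns the items that were not processed.
    """
    deadline = None if max_minutes is None else clock() + max_minutes * 60
    remaining = list(work_items)
    while remaining:
        if deadline is not None and clock() >= deadline:
            break  # budget spent; stop and return the leftover work
        process(remaining.pop(0))
    return remaining


# Usage with a fake clock that advances 30 seconds per reading.
readings = iter(range(0, 10_000, 30))
done = []
left = run_gc(range(10), done.append, max_minutes=1, clock=lambda: next(readings))
print(done, left)  # [0] [1, 2, 3, 4, 5, 6, 7, 8, 9]
```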

If you select Until done or For x minutes, you can specify a recurring schedule in UTC (Coordinated Universal Time) with the following options:

  • Custom cron schedule (Hour, Day of Month, Month, Weekday)
  • Daily at midnight UTC
  • Every Saturday at 1am UTC
  • Every Sunday at 1am UTC
  • Do not repeat
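Assuming the presets behave like standard cron entries evaluated in UTC, they correspond to the five-field expressions below (minute, hour, day of month, month, weekday). The PRESETS table and the matches function are hypothetical helpers, not part of MSR; matches accepts only * or a single number per field and is not a full cron parser.

```python
from datetime import datetime, timezone

# Hypothetical mapping of the UI presets to standard cron expressions
# (minute hour day-of-month month weekday), interpreted in UTC.
PRESETS = {
    "Daily at midnight UTC": "0 0 * * *",
    "Every Saturday at 1am UTC": "0 1 * * 6",
    "Every Sunday at 1am UTC": "0 1 * * 0",
}


def matches(expr, when):
    """Return True if the timezone-aware datetime `when` matches `expr` in UTC.

    Deliberately tiny subset: each field must be '*' or a single integer.
    """
    when = when.astimezone(timezone.utc)
    fields = expr.split()
    # cron counts weekdays 0 = Sunday .. 6 = Saturday; datetime.weekday() uses 0 = Monday.
    cron_weekday = (when.weekday() + 1) % 7
    actual = [when.minute, when.hour, when.day, when.month, cron_weekday]
    return all(f == "*" or int(f) == v for f, v in zip(fields, actual))


# 6 January 2024 was a Saturday.
saturday_1am = datetime(2024, 1, 6, 1, 0, tzinfo=timezone.utc)
print([name for name, expr in PRESETS.items() if matches(expr, saturday_1am)])
# ['Every Saturday at 1am UTC']
```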

Once everything is configured, click Save & Start to run the garbage collection job immediately, or click Save to run it at the next scheduled interval.

Review the garbage collection job log

In v2.5, a banner under the main navigation warned that no one could push images while a garbage collection job was running. Since v2.6, this no longer applies, because garbage collection runs while MSR is online and writable.

If you clicked Save & Start, verify that the garbage collection job started by navigating to Job Logs.


Under the hood

Each image stored in MSR is made up of multiple files:

  • A set of image layers that are unioned to form the image filesystem
  • A configuration file that contains the architecture of the image and other metadata
  • A manifest file that lists all the layers and the configuration file of the image
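To make this concrete, the following sketch assumes the Docker Image Manifest V2, Schema 2 format, in which a manifest references the configuration file and each layer by digest. The digests and sizes are placeholders, and referenced_digests is a hypothetical helper that collects the files a single manifest keeps in use.

```python
import json

# Illustrative manifest in the Docker Image Manifest V2, Schema 2 format.
# Digests and sizes are placeholders, not real content hashes.
MANIFEST = json.loads("""
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 1470,
    "digest": "sha256:c0ffee"
  },
  "layers": [
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 2811478,
      "digest": "sha256:1a2b"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 1024,
      "digest": "sha256:3c4d"
    }
  ]
}
""")


def referenced_digests(manifest):
    """Return the digests of the config file and every layer this manifest uses."""
    return [manifest["config"]["digest"]] + [layer["digest"] for layer in manifest["layers"]]


print(referenced_digests(MANIFEST))
# ['sha256:c0ffee', 'sha256:1a2b', 'sha256:3c4d']
```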

All these files are tracked in MSR’s metadata store in RethinkDB. They are tracked in a content-addressable way: each file is identified by a cryptographic hash of its content. As a result, if two image tags hold exactly the same content, MSR stores that content only once, even if the tag names differ. Because the hash is cryptographic, it is practically impossible for different content to produce the same hash.

For example, if wordpress:4.8 and wordpress:latest have the same content, that content is stored only once. If you delete one of these tags, the other tag is not deleted.

This means that when you delete an image tag, MSR cannot delete the underlying files of that image tag since other tags may also use the same files.
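Content addressing and shared tags can be illustrated with a toy store keyed by SHA-256 digests. ContentStore and its methods are hypothetical; they only model the behavior described above: identical content is stored once, and deleting a tag leaves the shared content in place for garbage collection to evaluate later.

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: each blob is keyed by its SHA-256 digest."""

    def __init__(self):
        self.blobs = {}  # digest -> bytes
        self.tags = {}   # tag name -> digest

    def push(self, tag, content):
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self.blobs.setdefault(digest, content)  # identical content is stored once
        self.tags[tag] = digest
        return digest

    def delete_tag(self, tag):
        # Only the tag goes away. The blob may still back other tags, so
        # deciding whether it is unused is left to garbage collection.
        del self.tags[tag]


store = ContentStore()
store.push("wordpress:4.8", b"same image bytes")
store.push("wordpress:latest", b"same image bytes")
store.delete_tag("wordpress:4.8")
print(len(store.blobs), list(store.tags))  # 1 ['wordpress:latest']
```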

To support online garbage collection, MSR changes how it uses the storage backend in two ways:

  1. Layer links (the references within repository directories to their associated blobs) are no longer stored in the storage backend. Instead, MSR stores these references in RethinkDB so that it can enumerate them during the marking phase of garbage collection.
  2. Layers created after an upgrade to 2.6 are no longer content-addressed in the storage backend. Many cloud provider backends do not provide the sequential consistency guarantees needed to handle a layer being deleted and re-pushed at the same time in a predictable way. To account for this, MSR assigns each newly pushed layer a unique ID and performs the translation from content hash to ID in RethinkDB.
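The benefit of unique IDs can be sketched as follows. LayerIndex is a hypothetical stand-in for the hash-to-ID translation kept in RethinkDB, with a plain dictionary playing the storage backend. Because every push gets a fresh ID, a sweep that deletes a stale object cannot remove the bytes of a concurrent re-push of the same content, which could happen if both were stored at the same content-addressed path.

```python
import uuid


class LayerIndex:
    """Hypothetical stand-in for the digest-to-ID translation kept in RethinkDB.

    A plain dict plays the role of the storage backend.
    """

    def __init__(self):
        self.by_digest = {}  # content digest -> storage ID of the current object
        self.storage = {}    # storage ID -> layer bytes

    def push(self, digest, content):
        storage_id = uuid.uuid4().hex  # fresh ID for every push, even for known content
        self.storage[storage_id] = content
        self.by_digest[digest] = storage_id
        return storage_id

    def delete_object(self, storage_id):
        # Deletes exactly one stored object; a re-push lives under another ID.
        self.storage.pop(storage_id, None)

    def lookup(self, digest):
        return self.storage[self.by_digest[digest]]


index = LayerIndex()
stale = index.push("sha256:aa11", b"layer bytes")
index.push("sha256:aa11", b"layer bytes")  # re-pushed while a sweep is running
index.delete_object(stale)                  # the sweep removes only the stale object
print(index.lookup("sha256:aa11"))          # b'layer bytes'
```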

To delete unused files, MSR does the following:

  1. Establish a cutoff time.
  2. Mark each referenced manifest file with a timestamp. When manifest files are pushed to MSR, they are also marked with a timestamp.
  3. Sweep each manifest file that does not have a timestamp after the cutoff time.
  4. If a file is never referenced – which means no image tag uses it – delete the file.
  5. Repeat the process for blob links and blob descriptors.
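The cutoff-based steps above can be sketched for manifests as follows. collect is a hypothetical function operating on an in-memory dictionary of manifest timestamps; it shows why a manifest pushed while the job runs survives (its push timestamp is after the cutoff). It omits blob links and blob descriptors, which step 5 says are processed the same way.

```python
from datetime import datetime, timedelta, timezone


def collect(manifest_marks, referenced, cutoff, now):
    """Cutoff-based mark and sweep over manifests.

    manifest_marks: dict of manifest digest -> latest timestamp (push or mark time)
    referenced:     set of manifest digests that some tag still points to
    cutoff:         established when the run starts
    now:            timestamp the mark phase writes (later than cutoff)
    Returns the swept digests.
    """
    # Mark: stamp every manifest that a tag still references.
    for digest in referenced:
        manifest_marks[digest] = now
    # Sweep: remove manifests whose timestamp is not after the cutoff.
    swept = [d for d, ts in manifest_marks.items() if ts <= cutoff]
    for digest in swept:
        del manifest_marks[digest]
    return swept


cutoff = datetime(2024, 1, 1, tzinfo=timezone.utc)
old = cutoff - timedelta(days=30)
marks = {
    "tagged": old,                                      # still referenced by a tag
    "orphan": old,                                      # no tag uses it any more
    "pushed_during_gc": cutoff + timedelta(minutes=5),  # stamped at push time
}
print(collect(marks, {"tagged"}, cutoff, now=cutoff + timedelta(minutes=1)))
# ['orphan']
```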

Where to go next