How garbage collection works

During garbage collection, MSR performs the following actions in sequence:

  1. Establishes a cutoff time.

  2. Marks each referenced manifest file with a timestamp. When manifest files are pushed to MSR, they are also marked with a timestamp.

  3. Sweeps each manifest file that does not have a timestamp after the cutoff time.

  4. Deletes each swept manifest file that is not referenced, meaning that no image tag uses it.

  5. Repeats the process for blob links and blob descriptors.
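The mark-and-sweep steps above can be illustrated with a minimal sketch. It is not MSR's implementation: the Manifest class, the collect_manifests function, and the in-memory dictionaries are hypothetical names chosen for illustration, and the sketch covers manifests only (step 5 would repeat the same pass for blob links and blob descriptors).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Manifest:
    digest: str
    stamped_at: datetime  # set on push, refreshed when GC marks the manifest


def collect_manifests(manifests, tags, cutoff, mark_time):
    """Hypothetical mark-and-sweep pass over manifests.

    manifests: dict mapping digest -> Manifest
    tags:      dict mapping tag name -> digest
    cutoff:    the cutoff time established in step 1
    mark_time: timestamp written while marking; assumed later than cutoff
    Returns the sorted list of deleted digests.
    """
    # Step 2: mark every manifest that a tag references.
    for digest in set(tags.values()):
        if digest in manifests:
            manifests[digest].stamped_at = mark_time

    # Steps 3 and 4: sweep manifests with no timestamp after the cutoff.
    # Anything marked in step 2 or pushed after the cutoff survives.
    deleted = [d for d, m in manifests.items() if m.stamped_at <= cutoff]
    for d in deleted:
        del manifests[d]
    return sorted(deleted)


cutoff = datetime(2024, 1, 1, 12, 0)
manifests = {
    # Old manifest that a tag still references.
    "sha256:aaa": Manifest("sha256:aaa", cutoff - timedelta(days=30)),
    # Old manifest that no tag references.
    "sha256:bbb": Manifest("sha256:bbb", cutoff - timedelta(days=30)),
    # Manifest pushed while garbage collection is running.
    "sha256:ccc": Manifest("sha256:ccc", cutoff + timedelta(seconds=5)),
}
tags = {"wordpress:latest": "sha256:aaa"}

deleted = collect_manifests(manifests, tags, cutoff, cutoff + timedelta(seconds=10))
print(deleted)  # ['sha256:bbb']
```

Stamping manifests at push time is what protects an image pushed during a collection run: its timestamp falls after the cutoff, so the sweep skips it even if marking has already finished.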

Each image stored in MSR comprises the following files:

  • The image filesystem, which consists of a list of unioned image layers.

  • A configuration file, which contains the architecture of the image along with other metadata.

  • A manifest file, which contains a list of all the image layers and the configuration file for the image.

MSR tracks these files in its metadata store, which uses RethinkDB, in a content-addressable manner: each file is identified by a cryptographic hash of its content. Identical content always produces the same hash, and collisions between different content are nearly impossible, so if two image tags hold exactly the same content, MSR stores that content only once, even when the tag names differ. For example, if wordpress:4.8 and wordpress:latest have the same content, MSR stores that content once. If you delete one of these tags, the other remains intact.

As a result, when you delete an image tag, MSR cannot delete the underlying files, because other tags may use the same files.
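The content-addressable storage described above can be sketched as follows. This is an illustrative model only: the ContentStore class and its methods are hypothetical, a plain dictionary stands in for the RethinkDB metadata store, and SHA-256 is assumed as the hash function.

```python
import hashlib


class ContentStore:
    """Hypothetical in-memory model of content-addressable image storage."""

    def __init__(self):
        self.files = {}  # digest -> content, stored once per digest
        self.tags = {}   # tag name -> digest

    def push(self, tag, content):
        # The digest depends only on the content, never on the tag name.
        digest = "sha256:" + hashlib.sha256(content).hexdigest()
        self.files.setdefault(digest, content)  # no second copy if present
        self.tags[tag] = digest
        return digest

    def delete_tag(self, tag):
        # Only the tag is removed; the content may still be shared.
        del self.tags[tag]

    def is_referenced(self, digest):
        return digest in self.tags.values()


store = ContentStore()
d1 = store.push("wordpress:4.8", b"same image content")
d2 = store.push("wordpress:latest", b"same image content")

print(d1 == d2)          # True
print(len(store.files))  # 1

store.delete_tag("wordpress:4.8")
print(store.is_referenced(d1))  # True
```

In this model, delete_tag never touches the stored files. Content that no tag references any longer is left in place until a separate garbage collection pass removes it.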