KaaSCephOperationRequest OSD removal status

This section describes the status.osdRemoveStatus.removeInfo fields of the KaaSCephOperationRequest CR that you can use to review the phases of a Ceph OSD or node removal. The following diagram represents the phases flow:

Figure: Ceph OSD or node removal phases flow (ceph-osd-remove-phases-flow.png)
KaaSCephOperationRequest high-level parameters status

Parameter

Description

osdRemoveStatus

Describes the status of the current CephOsdRemoveRequest. For details, see KaaSCephOperationRequest ‘osdRemoveStatus’ parameters status.

childNodesMapping

The key-value mapping of the management cluster machine names to their corresponding Kubernetes node names.
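
For example, a childNodesMapping fragment may look as follows; the machine and node names below are hypothetical:

childNodesMapping:
  # <management cluster machine name>: <Kubernetes node name>
  kaas-node-worker-1: node-a
  kaas-node-worker-2: node-b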

KaaSCephOperationRequest ‘osdRemoveStatus’ parameters status

Parameter

Description

phase

Describes the current request phase that can be one of:

  • Pending - the request is created and placed in the request queue.

  • Validation - the request is taken from the queue and the provided information is being validated.

  • ApproveWaiting - the request passed the validation phase, is ready to execute, and is waiting for user confirmation through the approve flag (see the example after this table).

  • Processing - the request is executing following the next phases:

    • Pending - marking the current Ceph OSD for removal.

    • Rebalancing - the Ceph OSD is moved out of the cluster, and the request waits until data rebalancing completes. If the current Ceph OSD is already down or out, the next phase starts immediately.

    • Removing - purging the Ceph OSD and its authorization key.

    • Removed - the Ceph OSD has been successfully removed.

    • Failed - removal of the Ceph OSD failed.

  • Completed - the request executed with no issues.

  • CompletedWithWarnings - the request executed with non-critical issues. Review the output; action may be required.

  • InputWaiting - during the Validation or Processing phases, critical issues occurred that require attention. If issues occurred during validation, update osdRemove information, if present, and re-run validation. If issues occurred during processing, review the reported issues and manually resolve them.

  • Failed - the request failed during the Validation or Processing phases.

removeInfo

The overall information about the Ceph OSDs to remove: the final removal map, issues, and warnings. Once the Processing phase succeeds, removeInfo is extended with the removal status for each node and Ceph OSD. If an entire node is removed, the status also contains the node removal status and an error message, if any.

The removeInfo.osdMapping field contains information about:

  • Ceph OSDs removal status.

  • Batch job reference for the device cleanup: its name, status, and error, if any. The batch job status for the device cleanup is either Failed, Completed, or Skipped. The Skipped status is used when the host is down, the disk has crashed, or an error occurred while obtaining the ceph-volume information.

  • Ceph OSD deployment removal status and the related Ceph OSD name. The status will be either Failed or Removed.

messages

Informational messages describing the reason for the request transition to the next phase.

conditions

History of spec updates for the request.
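
When the request reaches the ApproveWaiting phase, confirm it by setting the approve flag in the request spec. The following is a minimal sketch that assumes the flag resides at spec.osdRemove.approve; verify the exact field path in KaaSCephOperationRequest OSD removal specification:

spec:
  osdRemove:
    # Confirms the removal map produced during the Validation phase
    approve: true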

Example of status.osdRemoveStatus.removeInfo after successful Validation
removeInfo:
  cleanUpMap:
    "node-a":
      completeCleanUp: true
      osdMapping:
        "2":
          deviceMapping:
            "sdb":
              path: "/dev/disk/by-path/pci-0000:00:0a.0"
              partition: "/dev/ceph-a-vg_sdb/osd-block-b-lv_sdb"
              type: "block"
              class: "hdd"
              zapDisk: true
        "6":
          deviceMapping:
            "sdc":
              path: "/dev/disk/by-path/pci-0000:00:0c.0"
              partition: "/dev/ceph-a-vg_sdc/osd-block-b-lv_sdc-1"
              type: "block"
              class: "hdd"
              zapDisk: true
        "11":
          deviceMapping:
            "sdc":
              path: "/dev/disk/by-path/pci-0000:00:0c.0"
              partition: "/dev/ceph-a-vg_sdc/osd-block-b-lv_sdc-2"
              type: "block"
              class: "hdd"
              zapDisk: true
    "node-b":
      osdMapping:
        "1":
          deviceMapping:
            "sdb":
              path: "/dev/disk/by-path/pci-0000:00:0a.0"
              partition: "/dev/ceph-b-vg_sdb/osd-block-b-lv_sdb"
              type: "block"
              class: "ssd"
              zapDisk: true
        "15":
          deviceMapping:
            "sdc":
              path: "/dev/disk/by-path/pci-0000:00:0b.1"
              partition: "/dev/ceph-b-vg_sdc/osd-block-b-lv_sdc"
              type: "block"
              class: "ssd"
              zapDisk: true
        "25":
          deviceMapping:
            "sdd":
              path: "/dev/disk/by-path/pci-0000:00:0c.2"
              partition: "/dev/ceph-b-vg_sdd/osd-block-b-lv_sdd"
              type: "block"
              class: "ssd"
              zapDisk: true
    "node-c":
      osdMapping:
        "0":
          deviceMapping:
            "sdb":
              path: "/dev/disk/by-path/pci-0000:00:1t.9"
              partition: "/dev/ceph-c-vg_sdb/osd-block-c-lv_sdb"
              type: "block"
              class: "hdd"
              zapDisk: true
        "8":
          deviceMapping:
            "sde":
              path: "/dev/disk/by-path/pci-0000:00:1c.5"
              partition: "/dev/ceph-c-vg_sde/osd-block-c-lv_sde"
              type: "block"
              class: "hdd"
              zapDisk: true
            "sdf":
              path: "/dev/disk/by-path/pci-0000:00:5a.5",
              partition: "/dev/ceph-c-vg_sdf/osd-db-c-lv_sdf-1",
              type: "db",
              class: "ssd"

The example above is based on the example spec provided in KaaSCephOperationRequest OSD removal specification. During the Validation phase, the provided information was validated, and removeInfo now reflects the final map of the Ceph OSDs to remove:

  • For node-a, Ceph OSDs with IDs 2, 6, and 11 will be removed with the related disk and its information: all block devices, names, paths, and disk class.

  • For node-b, the Ceph OSDs with IDs 1, 15, and 25 will be removed with the related disk information.

  • For node-c, the Ceph OSD with ID 0, placed on the specified sdb device, and the Ceph OSD with ID 8, placed on the sde device, will be removed. The related partition on the sdf disk, which is used as the BlueStore metadata device for Ceph OSD 8, will be cleaned up, keeping the disk itself untouched. Other partitions on that device will not be touched.

Example of removeInfo with removeStatus succeeded
removeInfo:
  cleanUpMap:
    "node-a":
      completeCleanUp: true
      hostRemoveStatus:
        status: Removed
      osdMapping:
        "2":
          removeStatus:
            osdRemoveStatus:
              status: Removed
            deploymentRemoveStatus:
              status: Removed
              name: "rook-ceph-osd-2"
            deviceCleanUpJob:
              status: Finished
              name: "job-name-for-osd-2"
          deviceMapping:
            "sdb":
              path: "/dev/disk/by-path/pci-0000:00:0a.0"
              partition: "/dev/ceph-a-vg_sdb/osd-block-b-lv_sdb"
              type: "block"
              class: "hdd"
              zapDisk: true
Example of removeInfo with removeStatus failed
removeInfo:
  cleanUpMap:
    "node-a":
      completeCleanUp: true
      osdMapping:
        "2":
          removeStatus:
            osdRemoveStatus:
              errorReason: "retries for cmd ‘ceph osd ok-to-stop 2’ exceeded"
              status: Failed
          deviceMapping:
            "sdb":
              path: "/dev/disk/by-path/pci-0000:00:0a.0"
              partition: "/dev/ceph-a-vg_sdb/osd-block-b-lv_sdb"
              type: "block"
              class: "hdd"
              zapDisk: true
Example of removeInfo with removeStatus failed by timeout
removeInfo:
  cleanUpMap:
    "node-a":
      completeCleanUp: true
      osdMapping:
        "2":
          removeStatus:
            osdRemoveStatus:
              errorReason: Timeout (30m0s) reached for waiting pg rebalance for osd 2
              status: Failed
          deviceMapping:
            "sdb":
              path: "/dev/disk/by-path/pci-0000:00:0a.0"
              partition: "/dev/ceph-a-vg_sdb/osd-block-b-lv_sdb"
              type: "block"
              class: "hdd"
              zapDisk: true

Note

In case of failures similar to the examples above, review the ceph-request-controller logs and the Ceph cluster status. Such failures may simply indicate timeout and retry issues. If no other issues are found, re-create the request with a new name and skip adding the successfully removed Ceph OSDs or Ceph nodes.
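
For example, a re-created request might look as follows. The apiVersion, names, namespace, and the exact spec.osdRemove layout are assumptions here; follow KaaSCephOperationRequest OSD removal specification for the authoritative format:

apiVersion: kaas.mirantis.com/v1alpha1  # assumed; verify for your release
kind: KaaSCephOperationRequest
metadata:
  name: remove-osds-node-a-retry        # a new, unique request name
  namespace: managed-ns                 # hypothetical project namespace
spec:
  osdRemove:
    nodes:
      node-a:
        completeCleanUp: true           # list only the nodes and Ceph OSDs that still require removal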