Outbound telemetry¶
Outbound Product Telemetry is a standard architectural component of Mirantis OpenStack for Kubernetes (MOSK) designed to provide Mirantis with the visibility necessary to ensure the health, stability, and ongoing improvement of customer environments. It serves as the primary mechanism for Mirantis to understand product behavior in the field, enabling the team to deliver high-quality service and pursue data-driven product development.
The primary goal of product telemetry is to shift from reactive to proactive support by turning technical data into actionable business value. By analyzing usage patterns, Mirantis can identify declining utilization or resource exhaustion before these issues impact critical workloads and disrupt service. This visibility also allows Mirantis to optimize hardware performance by correlating specific hardware models and sizing with real-world performance, which in turn yields precise architectural guidance for future deployments.
Furthermore, telemetry data is used to inform the product roadmap, allowing the team to prioritize features and updates based on the actual versions and services currently in use across the entire customer base. Ultimately, this system helps sustain customer value by allowing Mirantis to proactively approach cloud operators to solve workload onboarding issues before they can impact the long-term viability of the cloud infrastructure.
Architecture and data flow¶
The telemetry subsystem is designed as a secure, one-way communication channel from the customer environment to Mirantis. This process begins with the StackLight component in the management cluster, which automatically collects health, usage, and performance metrics from both the management cluster and all managed MOSK clusters. Once collected, the data is processed and aggregated according to predefined rules to ensure that only high-level system metadata is prepared for transmission.
Following aggregation, the telemetry data is pushed through the customer firewall or internet proxy to a secure, encrypted Mirantis telemetry endpoint. For this data flow to function correctly, the customer networking team must configure the necessary outbound rules so that traffic can reach the designated Mirantis synchronization endpoint.
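Because the push is ordinary outbound HTTPS, the path can be verified before relying on it. The following is a minimal sketch of such a check; the endpoint URL is a placeholder and the proxy is read from the conventional HTTPS_PROXY environment variable, both assumptions rather than documented values:

```python
import os
import urllib.error
import urllib.request

# Placeholder URL -- substitute the synchronization endpoint designated for
# your environment; this value is an assumption, not the documented endpoint.
TELEMETRY_ENDPOINT = "https://telemetry.example.mirantis.com/"
# Corporate proxy, if any, taken from the conventional environment variable.
proxy = os.environ.get("HTTPS_PROXY")

handlers = [urllib.request.ProxyHandler({"https": proxy})] if proxy else []
opener = urllib.request.build_opener(*handlers)

try:
    opener.open(TELEMETRY_ENDPOINT, timeout=10)
    print("Outbound path is open")
except urllib.error.HTTPError as exc:
    # Any HTTP status proves that TLS traffic traverses the firewall/proxy.
    print(f"Outbound path is open (endpoint answered with HTTP {exc.code})")
except (urllib.error.URLError, OSError) as exc:
    print(f"Outbound path is blocked or unreachable: {exc}")
```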
Data privacy: safe by design¶
Telemetry is explicitly designed to describe the state and performance of the infrastructure, not the content of the data processed within it. No Personally Identifiable Information (PII) or sensitive data is collected.
What is collected¶
The telemetry service collects approximately 150 distinct metrics, categorized as follows (an illustrative sample record appears after the list):
Infrastructure health – names of active firing alerts, node counts, and the availability of core Kubernetes and OpenStack APIs.
Capacity & usage – physical and virtual CPU/RAM capacity, node filesystem size, and total storage requested using Persistent Volume Claims (PVCs).
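To make the two categories concrete, the sketch below shows the kind of high-level record that such aggregation could produce. All field names and values are invented for illustration and do not represent the actual MOSK telemetry schema or wire format:

```python
# Invented example -- not the actual MOSK telemetry schema or wire format.
sample_record = {
    "cluster_id": "a1b2c3d4",              # machine-generated, anonymized
    "infrastructure_health": {
        "alerts_firing": ["KubeAPIDown"],  # alert names only, no payloads
        "nodes_total": 12,
        "kubernetes_api_available": True,
        "openstack_api_available": True,
    },
    "capacity_and_usage": {
        "cpu_cores_physical": 384,
        "memory_bytes_physical": 3_298_534_883_328,    # 3 TiB
        "node_filesystem_bytes": 10_995_116_277_760,   # 10 TiB
        "pvc_requested_bytes": 42_949_672_960,         # 40 GiB
    },
}
```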
What is not collected¶
To protect customer privacy and security, the telemetry subsystem is strictly prohibited from collecting:
Tenant data – no application data, customer database content, or virtual machine file content is ever accessed.
Sensitive information – no secrets, passwords, certificates, or encryption keys are part of the collection schema.
Personal data – no names, email addresses, or user-identifiable records are gathered.
Internal network identifiers – no IP addresses or specific hostnames are transmitted; all identifiers are machine-generated and anonymized.
Support and service delivery¶
To provide proactive care and rapid diagnostics, MOSK telemetry serves as the operational foundation for Mirantis Support and Managed Services. The MOSK Support Services Exhibit and, specifically, Attachment 1 covering OpsCare Plus state that automated telemetry is a core requirement that establishes a data-driven baseline for the technical health of the cloud infrastructure. The data allows Mirantis to perform efficient diagnostics, offer proactive assistance, and verify the environmental conditions necessary for Service Level Agreement (SLA) attainment.
Maintaining an active telemetry collector is a standard operational practice essential for high-quality service delivery: it preserves the visibility into the environment on which uninterrupted support depends. Any modifications to the configuration of the telemetry subsystem must therefore be coordinated with Mirantis to ensure the environment remains aligned with standard support prerequisites.
Outbound metrics¶
General cluster information¶
kaas_cluster_info – cluster metadata: type, Kubernetes release, license, underlay
cluster_alerts_firing – number of active alerts currently firing in the cluster
cluster_capacity_cpu_cores – total CPU cores available to the workloads in the cluster
cluster_capacity_memory_bytes – total RAM (in bytes) available in the cluster
cluster_filesystem_size_bytes – total storage capacity of the local filesystems on cluster nodes
cluster_filesystem_usage_bytes – amount of node disk space currently used
cluster_filesystem_usage_ratio – percentage of node disk space currently consumed
cluster_master_nodes_total – number of control-plane/master nodes
cluster_nodes_total – total number of nodes in the cluster
cluster_persistentvolumeclaim_requests_storage_bytes – total storage space requested by user persistent volumes
cluster_total_alerts_triggered – total count of unique alert events that have fired
cluster_usage_cpu_cores – CPU cores currently used on a node
cluster_usage_memory_bytes – RAM currently used on a node
cluster_usage_per_capacity_cpu_ratio – overall CPU utilization ratio
cluster_usage_per_capacity_memory_ratio – overall RAM utilization ratio
cluster_worker_nodes_total – number of worker nodes available for user apps
cluster_workload_containers_total – total number of individual containers running
cluster_workload_pods_total – total number of Kubernetes Pods running
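These metrics originate in StackLight, so operators can inspect the same series locally before anything is transmitted. The sketch below assumes the StackLight Prometheus API has been exposed at localhost:9090, for example via a port-forward; that address is an assumption, not a fixed product endpoint:

```python
import json
import urllib.parse
import urllib.request

# Assumes the StackLight Prometheus API is reachable here, e.g. after a
# port-forward; adjust the address for your environment.
PROMETHEUS = "http://localhost:9090/api/v1/query"

def instant_query(expr: str) -> list:
    """Run an instant PromQL query and return the result vector."""
    url = f"{PROMETHEUS}?{urllib.parse.urlencode({'query': expr})}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)["data"]["result"]

# Cross-check the reported overall utilization against its raw inputs.
expr = "sum(cluster_usage_cpu_cores) / sum(cluster_capacity_cpu_cores)"
for series in instant_query(expr):
    print("cluster CPU utilization ratio:", series["value"][1])
```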
Cluster lifecycle management¶
kaas_cluster_machines_ready_total – count of ready nodes in a specific cluster
kaas_cluster_machines_requested_total – number of nodes the user has requested to exist
kaas_cluster_manager_machines_total – number of nodes with the role of a Kubernetes manager in the cluster
kaas_cluster_updating – flag showing if a cluster is in the middle of an upgrade
kaas_cluster_worker_machines_total – count of worker nodes within a specific managed cluster
kaas_info – current version and build info of the KaaS service
kaas_license_expiry – the date and time when the cluster software license expires
kaas-events – high-level status and versioning of the KaaS platform
kubernetes_api_availability – health check for the core Kubernetes API server
mcc_cluster_update_plan_status – readiness/success of maintenance and update plans
mcc_collector_error – count of errors within the telemetry collection system itself
hostos_module_usage – tracking of active kernel modules on the host operating system
Hardware information¶
mcc_hw_machine_chassis – the physical form factor, for example, Blade or Rack Mount
mcc_hw_machine_cpu_model – the specific model of processor, for example, Intel Xeon Gold
mcc_hw_machine_cpu_number – number of physical CPU sockets in the server
mcc_hw_machine_nics – details on physical network interface cards
mcc_hw_machine_ram – total physical memory installed in the server hardware
mcc_hw_machine_storage – details of physical local drives (HDD/SSD)
mcc_hw_machine_vendor – the hardware manufacturer, for example, Dell, HP, or Supermicro
mcc_release_controller_state – status of the component managing software releases
Kubernetes underlay (Mirantis Kubernetes Engine)¶
mke_api_availability – availability of the cluster’s underlying Mirantis Kubernetes Engine API
mke_cluster_containers_total – count of containers running in the Kubernetes underlay
mke_cluster_nodes_total – count of nodes in the Kubernetes underlay
mke_cluster_vcpu_free – remaining vCPU capacity in the Kubernetes underlay
mke_cluster_vcpu_used – consumed vCPU capacity in the Kubernetes underlay
mke_cluster_vram_free – remaining RAM capacity in the Kubernetes underlay
mke_cluster_vram_used – consumed RAM capacity in the Kubernetes underlay
mke_cluster_vstorage_free – remaining virtual storage in the Kubernetes underlay
mke_cluster_vstorage_used – consumed virtual storage in the Kubernetes underlay
node_labels – labels assigned to the nodes in the Kubernetes underlay
OpenStack services¶
openstack_cinder_api_latency_90 – response latency of the Block Storage service (OpenStack Cinder) API – 90th percentile
openstack_cinder_api_latency_99 – response latency of the Block Storage service (OpenStack Cinder) API – 99th percentile
openstack_cinder_api_status – current health status of the Block Storage service (OpenStack Cinder) API endpoint
openstack_cinder_availability – ratio of successful (non-5xx) responses of the Block Storage service (OpenStack Cinder) API
openstack_cinder_volumes_total – total number of volumes
openstack_glance_api_status – current health status of the Image service (OpenStack Glance) API
openstack_glance_availability – ratio of successful (non-5xx) responses of the Image service (OpenStack Glance) API
openstack_glance_images_total – total number of images
openstack_glance_snapshots_total – total number of backup snapshots
openstack_heat_availability – ratio of successful (non-5xx) responses of the Orchestration service (OpenStack Heat) API
openstack_heat_stacks_total – total number of orchestration stacks
openstack_instance_availability – ratio of instances in non-error state
openstack_instance_create_end – counter of successful instance creations
openstack_instance_create_error – counter of failed instance creations
openstack_instance_create_start – counter of attempted instance creations
openstack_keystone_api_latency_90 – response latency of the Identity service (OpenStack Keystone) API – 90th percentile
openstack_keystone_api_latency_99 – response latency of the Identity service (OpenStack Keystone) API – 99th percentile
openstack_keystone_api_status – current health status of the Identity service (OpenStack Keystone) API
openstack_keystone_availability – ratio of successful (non-5xx) responses of the Identity service (OpenStack Keystone) API
openstack_keystone_tenants_total – number of projects
openstack_keystone_users_total – total number of registered user accounts
openstack_kpi_provisioning – ratio of successful instance creations
openstack_lbaas_availability – ratio of load balancers in non-error state
openstack_mysql_flow_control – health indicator of the OpenStack database cluster
openstack_neutron_api_latency_90 – response latency of the Network service (OpenStack Neutron) API – 90th percentile
openstack_neutron_api_latency_99 – response latency of the Network service (OpenStack Neutron) API – 99th percentile
openstack_neutron_api_status – current health status of the Network service (OpenStack Neutron) API
openstack_neutron_availability – ratio of successful (non-5xx) responses of the Network service (OpenStack Neutron) API
openstack_neutron_lbaas_loadbalancers_total – total number of load balancers
openstack_neutron_networks_total – total number of networks
openstack_neutron_ports_total – total number of network ports
openstack_neutron_routers_total – total number of routers
openstack_neutron_subnets_total – total number of subnets
openstack_nova_all_compute_cpu_utilisation – global CPU usage percentage of all hypervisors
openstack_nova_all_compute_mem_utilisation – global RAM usage percentage of all hypervisors
openstack_nova_all_computes_total – total number of compute nodes in the cluster
openstack_nova_all_disk_total_gb – global capacity of root/ephemeral storage (GB)
openstack_nova_all_ram_total_gb – global RAM capacity (GB)
openstack_nova_all_used_disk_total_gb – global root/ephemeral storage used (GB)
openstack_nova_all_used_ram_total_gb – global RAM used (GB)
openstack_nova_all_used_vcpus_total – global vCPUs allocated to VMs
openstack_nova_all_vcpus_total – global vCPU capacity
openstack_nova_api_status – current health status of the Compute service (OpenStack Nova) API
openstack_nova_availability – ratio of successful (non-5xx) responses from the Compute service (OpenStack Nova) API
openstack_nova_compute_cpu_utilisation – CPU utilization across all available compute nodes in the cluster
openstack_nova_compute_mem_utilisation – RAM utilization across all available compute nodes in the cluster
openstack_nova_computes_total – number of compute nodes in the cluster
openstack_nova_disk_total_gb – per-server disk capacity
openstack_nova_instances_active_total – total instances currently in the Active state
openstack_nova_ram_total_gb – per-server RAM capacity
openstack_nova_used_disk_total_gb – per-server disk used
openstack_nova_used_ram_total_gb – per-server RAM used
openstack_nova_used_vcpus_total – per-server vCPUs allocated
openstack_nova_vcpus_total – per-server vCPU capacity
openstack_public_api_status – status of the public-facing API endpoints
openstack_quota_instances – maximum allowed VM instances (limit)
openstack_quota_ram_gb – maximum allowed RAM (limit)
openstack_quota_vcpus – maximum allowed vCPUs (limit)
openstack_quota_volume_storage_gb – maximum allowed storage (limit)
openstack_rmq_message_deriv – RabbitMQ message total queue depth, number of ready and acknowledged messages
openstack_usage_instances – current VM instance count vs the limit
openstack_usage_ram_gb – current RAM usage vs the limit
openstack_usage_vcpus – current vCPU count vs the limit
openstack_usage_volume_storage_gb – current storage usage vs the limit
osdpl_aodh_alarms – number of alarms in the Alarming service (OpenStack Aodh)
osdpl_api_success – availability status of all OpenStack API endpoints, including internal and admin
osdpl_cinder_zone_volumes – number of volumes per availability zone in the Block Storage service (OpenStack Cinder)
osdpl_ironic_nodes – number of bare-metal servers managed by the Bare Metal service (OpenStack Ironic)
osdpl_manila_shares – number of shared filesystems managed by the Shared Filesystem service (OpenStack Manila)
osdpl_masakari_hosts – number of hypervisors protected by the Instance HA service (OpenStack Masakari)
osdpl_neutron_availability_zone_info – metadata of availability zones defined in the Networking service (OpenStack Neutron)
osdpl_neutron_zone_routers – number of routers per availability zone in the Networking service (OpenStack Neutron)
osdpl_nova_aggregate_hosts – number of compute nodes per host aggregate
osdpl_nova_audit_orphaned_allocations – number of instance records in the Compute service (OpenStack Nova) DB that might be stuck or orphaned
osdpl_nova_availability_zone_hosts – number of compute nodes per availability zone
osdpl_nova_availability_zone_info – metadata of availability zones defined in the Compute service (OpenStack Nova)
osdpl_nova_availability_zone_instances – number of instances per availability zone
osdpl_version_info – OpenStack version, for example, Antelope
tf_operator_info – OpenSDN/Tungsten Fabric version
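As a final illustration of how such counters are read together, the global Nova capacity and usage metrics yield the remaining compute headroom, the kind of trend that proactive support watches for. A minimal sketch with made-up sample values:

```python
# Made-up sample values standing in for the openstack_nova_all_* metrics.
vcpus_total = 2048      # openstack_nova_all_vcpus_total
vcpus_used = 1536       # openstack_nova_all_used_vcpus_total
ram_total_gb = 8192     # openstack_nova_all_ram_total_gb
ram_used_gb = 6758      # openstack_nova_all_used_ram_total_gb

# Headroom as a fraction of capacity; values trending toward zero signal
# approaching resource exhaustion.
vcpu_headroom = 1 - vcpus_used / vcpus_total    # 0.25
ram_headroom = 1 - ram_used_gb / ram_total_gb   # ~0.175
print(f"vCPU headroom: {vcpu_headroom:.1%}, RAM headroom: {ram_headroom:.1%}")
```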