Manage metrics filtering

Available since 2.24.0 and 2.24.2 for MOSK 23.2

By default, StackLight drops unused metrics to increase Prometheus performance providing better resource utilization and faster query response. The following list contains white-listed scrape jobs grouped by the job name. Prometheus collects metrics from this list by default.

White list of Prometheus scrape jobs
_group-blackbox-metrics:
  - probe_dns_lookup_time_seconds
  - probe_duration_seconds
  - probe_http_content_length
  - probe_http_duration_seconds
  - probe_http_ssl
  - probe_http_uncompressed_body_length
  - probe_ssl_earliest_cert_expiry
  - probe_success
_group-controller-runtime-metrics:
  - workqueue_adds_total
  - workqueue_depth
  - workqueue_queue_duration_seconds_count
  - workqueue_queue_duration_seconds_sum
  - workqueue_retries_total
  - workqueue_work_duration_seconds_count
  - workqueue_work_duration_seconds_sum
_group-etcd-metrics:
  - etcd_cluster_version
  - etcd_debugging_snap_save_total_duration_seconds_sum
  - etcd_disk_backend_commit_duration_seconds_bucket
  - etcd_disk_backend_commit_duration_seconds_count
  - etcd_disk_backend_commit_duration_seconds_sum
  - etcd_disk_backend_snapshot_duration_seconds_count
  - etcd_disk_backend_snapshot_duration_seconds_sum
  - etcd_disk_wal_fsync_duration_seconds_bucket
  - etcd_disk_wal_fsync_duration_seconds_count
  - etcd_disk_wal_fsync_duration_seconds_sum
  - etcd_mvcc_db_total_size_in_bytes
  - etcd_network_client_grpc_received_bytes_total
  - etcd_network_client_grpc_sent_bytes_total
  - etcd_network_peer_received_bytes_total
  - etcd_network_peer_sent_bytes_total
  - etcd_server_go_version
  - etcd_server_has_leader
  - etcd_server_leader_changes_seen_total
  - etcd_server_proposals_applied_total
  - etcd_server_proposals_committed_total
  - etcd_server_proposals_failed_total
  - etcd_server_proposals_pending
  - etcd_server_quota_backend_bytes
  - etcd_server_version
  - grpc_server_handled_total
  - grpc_server_started_total
_group-go-collector-metrics:
  - go_gc_duration_seconds
  - go_gc_duration_seconds_count
  - go_gc_duration_seconds_sum
  - go_goroutines
  - go_info
  - go_memstats_alloc_bytes
  - go_memstats_alloc_bytes_total
  - go_memstats_buck_hash_sys_bytes
  - go_memstats_frees_total
  - go_memstats_gc_sys_bytes
  - go_memstats_heap_alloc_bytes
  - go_memstats_heap_idle_bytes
  - go_memstats_heap_inuse_bytes
  - go_memstats_heap_released_bytes
  - go_memstats_heap_sys_bytes
  - go_memstats_lookups_total
  - go_memstats_mallocs_total
  - go_memstats_mcache_inuse_bytes
  - go_memstats_mcache_sys_bytes
  - go_memstats_mspan_inuse_bytes
  - go_memstats_mspan_sys_bytes
  - go_memstats_next_gc_bytes
  - go_memstats_other_sys_bytes
  - go_memstats_stack_inuse_bytes
  - go_memstats_stack_sys_bytes
  - go_memstats_sys_bytes
  - go_threads
_group-process-collector-metrics:
  - process_cpu_seconds_total
  - process_max_fds
  - process_open_fds
  - process_resident_memory_bytes
  - process_start_time_seconds
  - process_virtual_memory_bytes
_group-rest-client-metrics:
  - rest_client_request_latency_seconds_count
  - rest_client_request_latency_seconds_sum
_group-service-handler-metrics:
  - service_handler_count
  - service_handler_sum
_group-service-http-metrics:
  - service_http_count
  - service_http_sum
_group-service-reconciler-metrics:
  - service_reconciler_count
  - service_reconciler_sum
alertmanager-webhook-servicenow:
  - servicenow_auth_ok
blackbox: []
blackbox-external-endpoint: []
cadvisor:
  - cadvisor_version_info
  - container_cpu_cfs_periods_total
  - container_cpu_cfs_throttled_periods_total
  - container_cpu_usage_seconds_total
  - container_fs_reads_bytes_total
  - container_fs_reads_total
  - container_fs_writes_bytes_total
  - container_fs_writes_total
  - container_memory_usage_bytes
  - container_memory_working_set_bytes
  - container_network_receive_bytes_total
  - container_network_transmit_bytes_total
  - container_scrape_error
  - machine_cpu_cores
calico:
  - felix_active_local_endpoints
  - felix_active_local_policies
  - felix_active_local_selectors
  - felix_active_local_tags
  - felix_cluster_num_host_endpoints
  - felix_cluster_num_hosts
  - felix_cluster_num_workload_endpoints
  - felix_host
  - felix_int_dataplane_addr_msg_batch_size_count
  - felix_int_dataplane_addr_msg_batch_size_sum
  - felix_int_dataplane_failures
  - felix_int_dataplane_iface_msg_batch_size_count
  - felix_int_dataplane_iface_msg_batch_size_sum
  - felix_ipset_errors
  - felix_ipsets_calico
  - felix_iptables_chains
  - felix_iptables_restore_errors
  - felix_iptables_save_errors
  - felix_resyncs_started
etcd-server: []
fluentd:
  - apache_http_request_duration_seconds_bucket
  - apache_http_request_duration_seconds_count
  - docker_networkdb_stats_netmsg
  - docker_networkdb_stats_qlen
  - kernel_io_errors_total # Since 2.27.0 (17.2.0 and 16.2.0)
helm-controller:
  - helmbundle_reconcile_up
  - helmbundle_release_ready
  - helmbundle_release_status
  - helmbundle_release_success
  - rest_client_requests_total
host-os-modules-controller:
  - hostos_module_deprecation_info # Since 2.28.0 (17.3.0 and 16.3.0)
  - hostos_module_usage # Since 2.28.0 (17.3.0 and 16.3.0)
ironic:
  - ironic_driver_metadata
  - ironic_drivers_total
  - ironic_nodes
  - ironic_up
kaas-exporter:
  - kaas_cluster_info
  - kaas_cluster_lcm_healthy # Since 2.28.0 (17.3.0 and 16.3.0)
  - kaas_cluster_ready # Since 2.28.0 (17.3.0 and 16.3.0)
  - kaas_cluster_updating
  - kaas_clusters
  - kaas_info
  - kaas_license_expiry
  - kaas_machine_ready
  - kaas_machines_ready
  - kaas_machines_requested
  - mcc_cluster_pending_update_schedule_time # Since 2.28.0 (17.3.0 and 16.3.0)
  - mcc_cluster_pending_update_status # Since 2.28.0 (17.3.0 and 16.3.0)
  - mcc_cluster_update_plan_status # Since 2.28.0 (17.3.0 and 16.3.0) as TechPreview
  - mcc_cluster_update_plan_step_status # Since 2.28.0 (17.3.0 and 16.3.0) as TechPreview
  - rest_client_requests_total
kubelet:
  - kubelet_running_containers
  - kubelet_running_pods
  - kubelet_volume_stats_available_bytes
  - kubelet_volume_stats_capacity_bytes
  - kubelet_volume_stats_used_bytes # Since 2.26.0 (17.1.0 and 16.1.0)
  - kubernetes_build_info
  - rest_client_requests_total
kubernetes-apiservers:
  - apiserver_client_certificate_expiration_seconds_bucket
  - apiserver_client_certificate_expiration_seconds_count
  - apiserver_request_total
  - kubernetes_build_info
  - rest_client_requests_total
kubernetes-master-api: []
mcc-blackbox: []
mcc-cache: []
mcc-controllers:
  - rest_client_requests_total
mcc-providers:
  - rest_client_requests_total
mke-manager-api: []
mke-metrics-controller:
  - ucp_controller_services
  - ucp_engine_node_health
mke-metrics-engine:
  - ucp_engine_container_cpu_percent
  - ucp_engine_container_cpu_total_time_nanoseconds
  - ucp_engine_container_health
  - ucp_engine_container_memory_usage_bytes
  - ucp_engine_container_network_rx_bytes_total
  - ucp_engine_container_network_tx_bytes_total
  - ucp_engine_container_unhealth
  - ucp_engine_containers
  - ucp_engine_disk_free_bytes
  - ucp_engine_disk_total_bytes
  - ucp_engine_images
  - ucp_engine_memory_total_bytes
  - ucp_engine_num_cpu_cores
msr-api: []
openstack-blackbox-ext: []
openstack-cloudprober: # Since MOSK 24.2
  - cloudprober_success
  - cloudprober_total
openstack-dns-probes: # Since MOSK 24.3
  - probe_dns_duration_seconds
openstack-ingress-controller:
  - nginx_ingress_controller_build_info
  - nginx_ingress_controller_config_hash
  - nginx_ingress_controller_config_last_reload_successful
  - nginx_ingress_controller_nginx_process_connections
  - nginx_ingress_controller_nginx_process_cpu_seconds_total
  - nginx_ingress_controller_nginx_process_resident_memory_bytes
  - nginx_ingress_controller_request_duration_seconds_bucket
  - nginx_ingress_controller_request_size_sum
  - nginx_ingress_controller_requests
  - nginx_ingress_controller_response_size_sum
  - nginx_ingress_controller_ssl_expire_time_seconds
  - nginx_ingress_controller_success
openstack-portprober:
  - portprober_arping_target_success
  - portprober_arping_target_total
openstack-powerdns: # Since MOSK 24.3
  - pdns_auth_backend_latency
  - pdns_auth_backend_queries
  - pdns_auth_cache_latency
  - pdns_auth_corrupt_packets
  - pdns_auth_cpu_iowait
  - pdns_auth_cpu_steal
  - pdns_auth_dnsupdate_answers
  - pdns_auth_dnsupdate_queries
  - pdns_auth_dnsupdate_refused
  - pdns_auth_fd_usage
  - pdns_auth_incoming_notifications
  - pdns_auth_open_tcp_connections
  - pdns_auth_qsize_q
  - pdns_auth_receive_latency
  - pdns_auth_ring_logmessages_capacity
  - pdns_auth_ring_logmessages_size
  - pdns_auth_ring_noerror_queries_capacity
  - pdns_auth_ring_noerror_queries_size
  - pdns_auth_ring_nxdomain_queries_capacity
  - pdns_auth_ring_nxdomain_queries_size
  - pdns_auth_ring_queries_capacity
  - pdns_auth_ring_queries_size
  - pdns_auth_ring_remotes_capacity
  - pdns_auth_ring_remotes_size
  - pdns_auth_ring_remotes_unauth_capacity
  - pdns_auth_ring_remotes_unauth_size
  - pdns_auth_ring_servfail_queries_capacity
  - pdns_auth_ring_servfail_queries_size
  - pdns_auth_ring_unauth_queries_capacity
  - pdns_auth_ring_unauth_queries_size
  - pdns_auth_signatures
  - pdns_auth_sys_msec
  - pdns_auth_tcp4_answers
  - pdns_auth_tcp4_answers_bytes
  - pdns_auth_tcp4_queries
  - pdns_auth_tcp6_answers
  - pdns_auth_tcp6_answers_bytes
  - pdns_auth_tcp6_queries
  - pdns_auth_timedout_packets
  - pdns_auth_udp4_answers
  - pdns_auth_udp4_answers_bytes
  - pdns_auth_udp4_queries
  - pdns_auth_udp6_answers
  - pdns_auth_udp6_answers_bytes
  - pdns_auth_udp6_queries
  - pdns_auth_udp_in_csum_errors
  - pdns_auth_udp_in_errors
  - pdns_auth_udp_noport_errors
  - pdns_auth_udp_recvbuf_errors
  - pdns_auth_udp_sndbuf_errors
  - pdns_auth_uptime
  - pdns_auth_user_msec
osdpl-exporter: # Removed in MOSK 24.1
  - osdpl_aodh_alarms
  - osdpl_certificate_expiry
  - osdpl_cinder_zone_volumes
  - osdpl_neutron_availability_zone_info
  - osdpl_neutron_zone_routers
  - osdpl_nova_aggregate_hosts
  - osdpl_nova_availability_zone_info
  - osdpl_nova_availability_zone_instances
  - osdpl_nova_availability_zone_hosts
  - osdpl_version_info
patroni:
  - patroni_patroni_cluster_unlocked
  - patroni_patroni_info
  - patroni_postgresql_info
  - patroni_replication_info
  - patroni_xlog_location
  - patroni_xlog_paused
  - patroni_xlog_received_location
  - patroni_xlog_replayed_location
  - python_info
postgresql:
  - pg_database_size
  - pg_locks_count
  - pg_stat_activity_count
  - pg_stat_activity_max_tx_duration
  - pg_stat_archiver_failed_count
  - pg_stat_bgwriter_buffers_alloc_total
  - pg_stat_bgwriter_buffers_backend_fsync_total
  - pg_stat_bgwriter_buffers_backend_total
  - pg_stat_bgwriter_buffers_checkpoint_total
  - pg_stat_bgwriter_buffers_clean_total
  - pg_stat_bgwriter_checkpoint_sync_time_total
  - pg_stat_bgwriter_checkpoint_write_time_total
  - pg_stat_database_blks_hit
  - pg_stat_database_blks_read
  - pg_stat_database_checksum_failures
  - pg_stat_database_conflicts
  - pg_stat_database_conflicts_confl_bufferpin
  - pg_stat_database_conflicts_confl_deadlock
  - pg_stat_database_conflicts_confl_lock
  - pg_stat_database_conflicts_confl_snapshot
  - pg_stat_database_conflicts_confl_tablespace
  - pg_stat_database_deadlocks
  - pg_stat_database_temp_bytes
  - pg_stat_database_tup_deleted
  - pg_stat_database_tup_fetched
  - pg_stat_database_tup_inserted
  - pg_stat_database_tup_returned
  - pg_stat_database_tup_updated
  - pg_stat_database_xact_commit
  - pg_stat_database_xact_rollback
  - postgres_exporter_build_info
prometheus-alertmanager:
  - alertmanager_active_alerts
  - alertmanager_active_silences
  - alertmanager_alerts
  - alertmanager_alerts_invalid_total
  - alertmanager_alerts_received_total
  - alertmanager_build_info
  - alertmanager_cluster_failed_peers
  - alertmanager_cluster_health_score
  - alertmanager_cluster_members
  - alertmanager_cluster_messages_pruned_total
  - alertmanager_cluster_messages_queued
  - alertmanager_cluster_messages_received_size_total
  - alertmanager_cluster_messages_received_total
  - alertmanager_cluster_messages_sent_size_total
  - alertmanager_cluster_messages_sent_total
  - alertmanager_cluster_peer_info
  - alertmanager_cluster_peers_joined_total
  - alertmanager_cluster_peers_left_total
  - alertmanager_cluster_reconnections_failed_total
  - alertmanager_cluster_reconnections_total
  - alertmanager_config_last_reload_success_timestamp_seconds
  - alertmanager_config_last_reload_successful
  - alertmanager_nflog_gc_duration_seconds_count
  - alertmanager_nflog_gc_duration_seconds_sum
  - alertmanager_nflog_gossip_messages_propagated_total
  - alertmanager_nflog_queries_total
  - alertmanager_nflog_query_duration_seconds_bucket
  - alertmanager_nflog_query_errors_total
  - alertmanager_nflog_snapshot_duration_seconds_count
  - alertmanager_nflog_snapshot_duration_seconds_sum
  - alertmanager_nflog_snapshot_size_bytes
  - alertmanager_notification_latency_seconds_bucket
  - alertmanager_notifications_failed_total
  - alertmanager_notifications_total
  - alertmanager_oversize_gossip_message_duration_seconds_bucket
  - alertmanager_oversized_gossip_message_dropped_total
  - alertmanager_oversized_gossip_message_failure_total
  - alertmanager_oversized_gossip_message_sent_total
  - alertmanager_partial_state_merges_failed_total
  - alertmanager_partial_state_merges_total
  - alertmanager_silences
  - alertmanager_silences_gc_duration_seconds_count
  - alertmanager_silences_gc_duration_seconds_sum
  - alertmanager_silences_gossip_messages_propagated_total
  - alertmanager_silences_queries_total
  - alertmanager_silences_query_duration_seconds_bucket
  - alertmanager_silences_query_errors_total
  - alertmanager_silences_snapshot_duration_seconds_count
  - alertmanager_silences_snapshot_duration_seconds_sum
  - alertmanager_silences_snapshot_size_bytes
  - alertmanager_state_replication_failed_total
  - alertmanager_state_replication_total
prometheus-elasticsearch-exporter:
  - elasticsearch_breakers_estimated_size_bytes
  - elasticsearch_breakers_limit_size_bytes
  - elasticsearch_breakers_tripped
  - elasticsearch_cluster_health_active_primary_shards
  - elasticsearch_cluster_health_active_shards
  - elasticsearch_cluster_health_delayed_unassigned_shards
  - elasticsearch_cluster_health_initializing_shards
  - elasticsearch_cluster_health_number_of_data_nodes
  - elasticsearch_cluster_health_number_of_nodes
  - elasticsearch_cluster_health_number_of_pending_tasks
  - elasticsearch_cluster_health_relocating_shards
  - elasticsearch_cluster_health_status
  - elasticsearch_cluster_health_unassigned_shards
  - elasticsearch_exporter_build_info
  - elasticsearch_indices_docs
  - elasticsearch_indices_docs_deleted
  - elasticsearch_indices_docs_primary
  - elasticsearch_indices_fielddata_evictions
  - elasticsearch_indices_fielddata_memory_size_bytes
  - elasticsearch_indices_filter_cache_evictions
  - elasticsearch_indices_flush_time_seconds
  - elasticsearch_indices_flush_total
  - elasticsearch_indices_get_exists_time_seconds
  - elasticsearch_indices_get_exists_total
  - elasticsearch_indices_get_missing_time_seconds
  - elasticsearch_indices_get_missing_total
  - elasticsearch_indices_get_time_seconds
  - elasticsearch_indices_get_total
  - elasticsearch_indices_indexing_delete_time_seconds_total
  - elasticsearch_indices_indexing_delete_total
  - elasticsearch_indices_indexing_index_time_seconds_total
  - elasticsearch_indices_indexing_index_total
  - elasticsearch_indices_merges_docs_total
  - elasticsearch_indices_merges_total
  - elasticsearch_indices_merges_total_size_bytes_total
  - elasticsearch_indices_merges_total_time_seconds_total
  - elasticsearch_indices_query_cache_evictions
  - elasticsearch_indices_query_cache_memory_size_bytes
  - elasticsearch_indices_refresh_time_seconds_total
  - elasticsearch_indices_refresh_total
  - elasticsearch_indices_search_fetch_time_seconds
  - elasticsearch_indices_search_fetch_total
  - elasticsearch_indices_search_query_time_seconds
  - elasticsearch_indices_search_query_total
  - elasticsearch_indices_segment_count_primary
  - elasticsearch_indices_segment_count_total
  - elasticsearch_indices_segment_memory_bytes_primary
  - elasticsearch_indices_segment_memory_bytes_total
  - elasticsearch_indices_segments_count
  - elasticsearch_indices_segments_memory_bytes
  - elasticsearch_indices_store_size_bytes
  - elasticsearch_indices_store_size_bytes_primary
  - elasticsearch_indices_store_size_bytes_total
  - elasticsearch_indices_store_throttle_time_seconds_total
  - elasticsearch_indices_translog_operations
  - elasticsearch_indices_translog_size_in_bytes
  - elasticsearch_jvm_gc_collection_seconds_count
  - elasticsearch_jvm_gc_collection_seconds_sum
  - elasticsearch_jvm_memory_committed_bytes
  - elasticsearch_jvm_memory_max_bytes
  - elasticsearch_jvm_memory_pool_peak_used_bytes
  - elasticsearch_jvm_memory_used_bytes
  - elasticsearch_os_load1
  - elasticsearch_os_load15
  - elasticsearch_os_load5
  - elasticsearch_process_cpu_percent
  - elasticsearch_process_cpu_seconds_total
  - elasticsearch_process_cpu_time_seconds_sum
  - elasticsearch_process_open_files_count
  - elasticsearch_thread_pool_active_count
  - elasticsearch_thread_pool_completed_count
  - elasticsearch_thread_pool_queue_count
  - elasticsearch_thread_pool_rejected_count
  - elasticsearch_transport_rx_size_bytes_total
  - elasticsearch_transport_tx_size_bytes_total
prometheus-grafana:
  - grafana_api_dashboard_get_milliseconds
  - grafana_api_dashboard_get_milliseconds_count
  - grafana_api_dashboard_get_milliseconds_sum
  - grafana_api_dashboard_save_milliseconds
  - grafana_api_dashboard_save_milliseconds_count
  - grafana_api_dashboard_save_milliseconds_sum
  - grafana_api_dashboard_search_milliseconds
  - grafana_api_dashboard_search_milliseconds_count
  - grafana_api_dashboard_search_milliseconds_sum
  - grafana_api_dataproxy_request_all_milliseconds
  - grafana_api_dataproxy_request_all_milliseconds_count
  - grafana_api_dataproxy_request_all_milliseconds_sum
  - grafana_api_login_oauth_total
  - grafana_api_login_post_total
  - grafana_api_response_status_total
  - grafana_build_info
  - grafana_feature_toggles_info
  - grafana_http_request_duration_seconds_count
  - grafana_page_response_status_total
  - grafana_plugin_build_info
  - grafana_proxy_response_status_total
  - grafana_stat_total_orgs
  - grafana_stat_total_users
  - grafana_stat_totals_dashboard
prometheus-kube-state-metrics:
  - kube_cronjob_next_schedule_time
  - kube_daemonset_created
  - kube_daemonset_status_current_number_scheduled
  - kube_daemonset_status_desired_number_scheduled
  - kube_daemonset_status_number_available
  - kube_daemonset_status_number_misscheduled
  - kube_daemonset_status_number_ready
  - kube_daemonset_status_number_unavailable
  - kube_daemonset_status_observed_generation
  - kube_daemonset_status_updated_number_scheduled
  - kube_deployment_created
  - kube_deployment_metadata_generation
  - kube_deployment_spec_replicas
  - kube_deployment_status_observed_generation
  - kube_deployment_status_replicas
  - kube_deployment_status_replicas_available
  - kube_deployment_status_replicas_unavailable
  - kube_deployment_status_replicas_updated
  - kube_endpoint_address_available
  - kube_job_status_active
  - kube_job_status_failed
  - kube_job_status_succeeded
  - kube_namespace_created
  - kube_namespace_status_phase
  - kube_node_info
  - kube_node_labels
  - kube_node_role
  - kube_node_spec_taint
  - kube_node_spec_unschedulable
  - kube_node_status_allocatable
  - kube_node_status_capacity
  - kube_node_status_condition
  - kube_persistentvolume_capacity_bytes
  - kube_persistentvolume_status_phase
  - kube_persistentvolumeclaim_resource_requests_storage_bytes
  - kube_pod_container_info
  - kube_pod_container_resource_limits
  - kube_pod_container_resource_requests
  - kube_pod_container_status_restarts_total
  - kube_pod_container_status_running
  - kube_pod_container_status_terminated
  - kube_pod_container_status_waiting
  - kube_pod_info
  - kube_pod_init_container_status_running
  - kube_pod_status_phase
  - kube_service_status_load_balancer_ingress
  - kube_statefulset_created
  - kube_statefulset_metadata_generation
  - kube_statefulset_replicas
  - kube_statefulset_status_current_revision
  - kube_statefulset_status_observed_generation
  - kube_statefulset_status_replicas
  - kube_statefulset_status_replicas_available
  - kube_statefulset_status_replicas_current
  - kube_statefulset_status_replicas_ready
  - kube_statefulset_status_replicas_updated
  - kube_statefulset_status_update_revision
prometheus-libvirt-exporter:
  - libvirt_domain_block_stats_allocation
  - libvirt_domain_block_stats_capacity
  - libvirt_domain_block_stats_physical
  - libvirt_domain_block_stats_read_bytes_total
  - libvirt_domain_block_stats_read_requests_total
  - libvirt_domain_block_stats_write_bytes_total
  - libvirt_domain_block_stats_write_requests_total
  - libvirt_domain_info_cpu_time_seconds_total
  - libvirt_domain_info_maximum_memory_bytes
  - libvirt_domain_info_memory_usage_bytes
  - libvirt_domain_info_state
  - libvirt_domain_info_virtual_cpus
  - libvirt_domain_interface_stats_receive_bytes_total
  - libvirt_domain_interface_stats_receive_drops_total
  - libvirt_domain_interface_stats_receive_errors_total
  - libvirt_domain_interface_stats_receive_packets_total
  - libvirt_domain_interface_stats_transmit_bytes_total
  - libvirt_domain_interface_stats_transmit_drops_total
  - libvirt_domain_interface_stats_transmit_errors_total
  - libvirt_domain_interface_stats_transmit_packets_total
  - libvirt_domain_memory_actual_balloon_bytes
  - libvirt_domain_memory_available_bytes
  - libvirt_domain_memory_rss_bytes
  - libvirt_domain_memory_unused_bytes
  - libvirt_domain_memory_usable_bytes
  - libvirt_up
prometheus-memcached-exporter:
  - memcached_commands_total
  - memcached_current_bytes
  - memcached_current_connections
  - memcached_current_items
  - memcached_exporter_build_info
  - memcached_items_evicted_total
  - memcached_items_reclaimed_total
  - memcached_limit_bytes
  - memcached_read_bytes_total
  - memcached_up
  - memcached_version
  - memcached_written_bytes_total
prometheus-msteams: []
prometheus-mysql-exporter:
  - mysql_global_status_aborted_clients
  - mysql_global_status_aborted_connects
  - mysql_global_status_buffer_pool_pages
  - mysql_global_status_bytes_received
  - mysql_global_status_bytes_sent
  - mysql_global_status_commands_total
  - mysql_global_status_created_tmp_disk_tables
  - mysql_global_status_created_tmp_files
  - mysql_global_status_created_tmp_tables
  - mysql_global_status_handlers_total
  - mysql_global_status_innodb_log_waits
  - mysql_global_status_innodb_num_open_files
  - mysql_global_status_innodb_page_size
  - mysql_global_status_max_used_connections
  - mysql_global_status_open_files
  - mysql_global_status_open_table_definitions
  - mysql_global_status_open_tables
  - mysql_global_status_opened_files
  - mysql_global_status_opened_table_definitions
  - mysql_global_status_opened_tables
  - mysql_global_status_qcache_free_memory
  - mysql_global_status_qcache_hits
  - mysql_global_status_qcache_inserts
  - mysql_global_status_qcache_lowmem_prunes
  - mysql_global_status_qcache_not_cached
  - mysql_global_status_qcache_queries_in_cache
  - mysql_global_status_queries
  - mysql_global_status_questions
  - mysql_global_status_select_full_join
  - mysql_global_status_select_full_range_join
  - mysql_global_status_select_range
  - mysql_global_status_select_range_check
  - mysql_global_status_select_scan
  - mysql_global_status_slow_queries
  - mysql_global_status_sort_merge_passes
  - mysql_global_status_sort_range
  - mysql_global_status_sort_rows
  - mysql_global_status_sort_scan
  - mysql_global_status_table_locks_immediate
  - mysql_global_status_table_locks_waited
  - mysql_global_status_threads_cached
  - mysql_global_status_threads_connected
  - mysql_global_status_threads_created
  - mysql_global_status_threads_running
  - mysql_global_status_wsrep_flow_control_paused
  - mysql_global_status_wsrep_local_recv_queue
  - mysql_global_status_wsrep_local_state
  - mysql_global_status_wsrep_ready
  - mysql_global_variables_innodb_buffer_pool_size
  - mysql_global_variables_innodb_log_buffer_size
  - mysql_global_variables_key_buffer_size
  - mysql_global_variables_max_connections
  - mysql_global_variables_open_files_limit
  - mysql_global_variables_query_cache_size
  - mysql_global_variables_table_definition_cache
  - mysql_global_variables_table_open_cache
  - mysql_global_variables_thread_cache_size
  - mysql_global_variables_wsrep_desync
  - mysql_up
  - mysql_version_info
  - mysqld_exporter_build_info
prometheus-node-exporter:
  - node_arp_entries
  - node_bonding_active
  - node_bonding_slaves
  - node_boot_time_seconds
  - node_context_switches_total
  - node_cpu_seconds_total
  - node_disk_io_now
  - node_disk_io_time_seconds_total
  - node_disk_io_time_weighted_seconds_total
  - node_disk_read_bytes_total
  - node_disk_read_time_seconds_total
  - node_disk_reads_completed_total
  - node_disk_reads_merged_total
  - node_disk_write_time_seconds_total
  - node_disk_writes_completed_total
  - node_disk_writes_merged_total
  - node_disk_written_bytes_total
  - node_entropy_available_bits
  - node_exporter_build_info
  - node_filefd_allocated
  - node_filefd_maximum
  - node_filesystem_avail_bytes
  - node_filesystem_files
  - node_filesystem_files_free
  - node_filesystem_free_bytes
  - node_filesystem_readonly
  - node_filesystem_size_bytes
  - node_forks_total
  - node_hwmon_temp_celsius
  - node_hwmon_temp_crit_alarm_celsius
  - node_hwmon_temp_crit_celsius
  - node_hwmon_temp_crit_hyst_celsius
  - node_hwmon_temp_max_celsius
  - node_intr_total
  - node_load1
  - node_load15
  - node_load5
  - node_memory_Active_anon_bytes
  - node_memory_Active_bytes
  - node_memory_Active_file_bytes
  - node_memory_AnonHugePages_bytes
  - node_memory_AnonPages_bytes
  - node_memory_Bounce_bytes
  - node_memory_Buffers_bytes
  - node_memory_Cached_bytes
  - node_memory_CommitLimit_bytes
  - node_memory_Committed_AS_bytes
  - node_memory_DirectMap1G
  - node_memory_DirectMap2M_bytes
  - node_memory_DirectMap4k_bytes
  - node_memory_Dirty_bytes
  - node_memory_HardwareCorrupted_bytes
  - node_memory_HugePages_Free
  - node_memory_HugePages_Rsvd
  - node_memory_HugePages_Surp
  - node_memory_HugePages_Total
  - node_memory_Hugepagesize_bytes
  - node_memory_Inactive_anon_bytes
  - node_memory_Inactive_bytes
  - node_memory_Inactive_file_bytes
  - node_memory_KernelStack_bytes
  - node_memory_Mapped_bytes
  - node_memory_MemAvailable_bytes
  - node_memory_MemFree_bytes
  - node_memory_MemTotal_bytes
  - node_memory_Mlocked_bytes
  - node_memory_NFS_Unstable_bytes
  - node_memory_PageTables_bytes
  - node_memory_SReclaimable_bytes
  - node_memory_SUnreclaim_bytes
  - node_memory_Shmem_bytes
  - node_memory_Slab_bytes
  - node_memory_SwapCached_bytes
  - node_memory_SwapFree_bytes
  - node_memory_SwapTotal_bytes
  - node_memory_Unevictable_bytes
  - node_memory_VmallocChunk_bytes
  - node_memory_VmallocTotal_bytes
  - node_memory_VmallocUsed_bytes
  - node_memory_WritebackTmp_bytes
  - node_memory_Writeback_bytes
  - node_netstat_TcpExt_TCPSynRetrans
  - node_netstat_Tcp_ActiveOpens
  - node_netstat_Tcp_AttemptFails
  - node_netstat_Tcp_CurrEstab
  - node_netstat_Tcp_EstabResets
  - node_netstat_Tcp_InCsumErrors
  - node_netstat_Tcp_InErrs
  - node_netstat_Tcp_InSegs
  - node_netstat_Tcp_MaxConn
  - node_netstat_Tcp_OutRsts
  - node_netstat_Tcp_OutSegs
  - node_netstat_Tcp_PassiveOpens
  - node_netstat_Tcp_RetransSegs
  - node_netstat_Udp_InCsumErrors
  - node_netstat_Udp_InDatagrams
  - node_netstat_Udp_InErrors
  - node_netstat_Udp_NoPorts
  - node_netstat_Udp_OutDatagrams
  - node_netstat_Udp_RcvbufErrors
  - node_netstat_Udp_SndbufErrors
  - node_network_mtu_bytes
  - node_network_receive_bytes_total
  - node_network_receive_compressed_total
  - node_network_receive_drop_total
  - node_network_receive_errs_total
  - node_network_receive_fifo_total
  - node_network_receive_frame_total
  - node_network_receive_multicast_total
  - node_network_receive_packets_total
  - node_network_transmit_bytes_total
  - node_network_transmit_carrier_total
  - node_network_transmit_colls_total
  - node_network_transmit_compressed_total
  - node_network_transmit_drop_total
  - node_network_transmit_errs_total
  - node_network_transmit_fifo_total
  - node_network_transmit_packets_total
  - node_network_up
  - node_nf_conntrack_entries
  - node_nf_conntrack_entries_limit
  - node_procs_blocked
  - node_procs_running
  - node_scrape_collector_duration_seconds
  - node_scrape_collector_success
  - node_sockstat_FRAG_inuse
  - node_sockstat_FRAG_memory
  - node_sockstat_RAW_inuse
  - node_sockstat_TCP_alloc
  - node_sockstat_TCP_inuse
  - node_sockstat_TCP_mem
  - node_sockstat_TCP_mem_bytes
  - node_sockstat_TCP_orphan
  - node_sockstat_TCP_tw
  - node_sockstat_UDPLITE_inuse
  - node_sockstat_UDP_inuse
  - node_sockstat_UDP_mem
  - node_sockstat_UDP_mem_bytes
  - node_sockstat_sockets_used
  - node_textfile_scrape_error
  - node_time_seconds
  - node_timex_estimated_error_seconds
  - node_timex_frequency_adjustment_ratio
  - node_timex_maxerror_seconds
  - node_timex_offset_seconds
  - node_timex_sync_status
  - node_uname_info
prometheus-rabbitmq-exporter:
  - rabbitmq_channels
  - rabbitmq_connections
  - rabbitmq_consumers
  - rabbitmq_exchanges
  - rabbitmq_exporter_build_info
  - rabbitmq_fd_available
  - rabbitmq_fd_used
  - rabbitmq_node_disk_free
  - rabbitmq_node_disk_free_alarm
  - rabbitmq_node_mem_alarm
  - rabbitmq_node_mem_used
  - rabbitmq_partitions
  - rabbitmq_queue_messages_global
  - rabbitmq_queue_messages_ready_global
  - rabbitmq_queue_messages_unacknowledged_global
  - rabbitmq_queues
  - rabbitmq_sockets_available
  - rabbitmq_sockets_used
  - rabbitmq_up
  - rabbitmq_uptime
  - rabbitmq_version_info
prometheus-relay: []
prometheus-server:
  - prometheus_build_info
  - prometheus_config_last_reload_success_timestamp_seconds
  - prometheus_config_last_reload_successful
  - prometheus_engine_query_duration_seconds
  - prometheus_engine_query_duration_seconds_sum
  - prometheus_http_request_duration_seconds_count
  - prometheus_notifications_alertmanagers_discovered
  - prometheus_notifications_errors_total
  - prometheus_notifications_queue_capacity
  - prometheus_notifications_queue_length
  - prometheus_notifications_sent_total
  - prometheus_rule_evaluation_failures_total
  - prometheus_target_interval_length_seconds
  - prometheus_target_interval_length_seconds_count
  - prometheus_target_scrapes_sample_duplicate_timestamp_total
  - prometheus_tsdb_blocks_loaded
  - prometheus_tsdb_compaction_chunk_range_seconds_count
  - prometheus_tsdb_compaction_chunk_range_seconds_sum
  - prometheus_tsdb_compaction_chunk_samples_count
  - prometheus_tsdb_compaction_chunk_samples_sum
  - prometheus_tsdb_compaction_chunk_size_bytes_sum
  - prometheus_tsdb_compaction_duration_seconds_bucket
  - prometheus_tsdb_compaction_duration_seconds_count
  - prometheus_tsdb_compaction_duration_seconds_sum
  - prometheus_tsdb_compactions_failed_total
  - prometheus_tsdb_compactions_total
  - prometheus_tsdb_compactions_triggered_total
  - prometheus_tsdb_head_active_appenders
  - prometheus_tsdb_head_chunks
  - prometheus_tsdb_head_chunks_created_total
  - prometheus_tsdb_head_chunks_removed_total
  - prometheus_tsdb_head_gc_duration_seconds_sum
  - prometheus_tsdb_head_samples_appended_total
  - prometheus_tsdb_head_series
  - prometheus_tsdb_head_series_created_total
  - prometheus_tsdb_head_series_removed_total
  - prometheus_tsdb_reloads_failures_total
  - prometheus_tsdb_reloads_total
  - prometheus_tsdb_storage_blocks_bytes
  - prometheus_tsdb_wal_corruptions_total
  - prometheus_tsdb_wal_fsync_duration_seconds_count
  - prometheus_tsdb_wal_fsync_duration_seconds_sum
  - prometheus_tsdb_wal_truncations_failed_total
  - prometheus_tsdb_wal_truncations_total
rabbitmq-operator-metrics:
  - rest_client_requests_total
refapp: []
sf-notifier:
  - sf_auth_ok
  - sf_error_count_created
  - sf_error_count_total
  - sf_request_count_created
  - sf_request_count_total
telegraf-docker-swarm:
  - docker_n_containers
  - docker_n_containers_paused
  - docker_n_containers_running
  - docker_n_containers_stopped
  - docker_swarm_node_ready
  - docker_swarm_tasks_desired
  - docker_swarm_tasks_running
  - internal_agent_gather_errors
telemeter-client:
  - federate_errors
  - federate_filtered_samples
  - federate_samples
telemeter-server:
  - telemeter_cleanups_total
  - telemeter_partitions
  - telemeter_samples_total
tf-cassandra-jmx-exporter:
  - cassandra_cache_entries
  - cassandra_cache_estimated_size_bytes
  - cassandra_cache_hits_total
  - cassandra_cache_requests_total
  - cassandra_client_authentication_failures_total
  - cassandra_client_native_connections
  - cassandra_client_request_failures_total
  - cassandra_client_request_latency_seconds_count
  - cassandra_client_request_latency_seconds_sum
  - cassandra_client_request_timeouts_total
  - cassandra_client_request_unavailable_exceptions_total
  - cassandra_client_request_view_write_latency_seconds
  - cassandra_commit_log_pending_tasks
  - cassandra_compaction_bytes_compacted_total
  - cassandra_compaction_completed_total
  - cassandra_dropped_messages_total
  - cassandra_endpoint_connection_timeouts_total
  - cassandra_storage_exceptions_total
  - cassandra_storage_hints_total
  - cassandra_storage_load_bytes
  - cassandra_table_estimated_pending_compactions
  - cassandra_table_repaired_ratio
  - cassandra_table_sstables_per_read_count
  - cassandra_table_tombstones_scanned
  - cassandra_thread_pool_active_tasks
  - cassandra_thread_pool_blocked_tasks
tf-control:
  - tf_controller_sessions
  - tf_controller_up
tf-kafka-jmx:
  - jmx_exporter_build_info
  - kafka_controller_controllerstats_count
  - kafka_controller_controllerstats_oneminuterate
  - kafka_controller_kafkacontroller_value
  - kafka_log_log_value
  - kafka_network_processor_value
  - kafka_network_requestmetrics_99thpercentile
  - kafka_network_requestmetrics_mean
  - kafka_network_requestmetrics_oneminuterate
  - kafka_network_socketserver_value
  - kafka_server_brokertopicmetrics_count
  - kafka_server_brokertopicmetrics_oneminuterate
  - kafka_server_delayedoperationpurgatory_value
  - kafka_server_kafkarequesthandlerpool_oneminuterate
  - kafka_server_replicamanager_oneminuterate
  - kafka_server_replicamanager_value
tf-operator:
  - tf_operator_info # Since MOSK 23.3
tf-redis:
  - redis_commands_duration_seconds_total
  - redis_commands_processed_total
  - redis_commands_total
  - redis_connected_clients
  - redis_connected_slaves
  - redis_db_keys
  - redis_db_keys_expiring
  - redis_evicted_keys_total
  - redis_expired_keys_total
  - redis_exporter_build_info
  - redis_instance_info
  - redis_keyspace_hits_total
  - redis_keyspace_misses_total
  - redis_memory_max_bytes
  - redis_memory_used_bytes
  - redis_net_input_bytes_total
  - redis_net_output_bytes_total
  - redis_rejected_connections_total
  - redis_slave_info
  - redis_up
  - redis_uptime_in_seconds
tf-vrouter:
  - tf_vrouter_ds_discard
  - tf_vrouter_ds_flow_action_drop
  - tf_vrouter_ds_flow_queue_limit_exceeded
  - tf_vrouter_ds_flow_table_full
  - tf_vrouter_ds_frag_err
  - tf_vrouter_ds_invalid_if
  - tf_vrouter_ds_invalid_label
  - tf_vrouter_ds_invalid_nh
  - tf_vrouter_flow_active
  - tf_vrouter_flow_aged
  - tf_vrouter_flow_created
  - tf_vrouter_lls_session_info
  - tf_vrouter_up
  - tf_vrouter_xmpp_connection_state
tf-zookeeper:
  - approximate_data_size
  - bytes_received_count
  - commit_count
  - connection_drop_count
  - connection_rejected
  - connection_request_count
  - dead_watchers_cleaner_latency_sum
  - dead_watchers_cleared
  - dead_watchers_queued
  - digest_mismatches_count
  - election_time_sum
  - ephemerals_count
  - follower_sync_time_count
  - follower_sync_time_sum
  - fsynctime_sum
  - global_sessions
  - jvm_classes_loaded
  - jvm_gc_collection_seconds_sum
  - jvm_info
  - jvm_memory_pool_bytes_used
  - jvm_threads_current
  - jvm_threads_deadlocked
  - jvm_threads_state
  - leader_uptime
  - learner_commit_received_count
  - learner_proposal_received_count
  - learners
  - local_sessions
  - max_file_descriptor_count
  - node_changed_watch_count_sum
  - node_children_watch_count_sum
  - node_created_watch_count_sum
  - node_deleted_watch_count_sum
  - num_alive_connections
  - om_commit_process_time_ms_sum
  - om_proposal_process_time_ms_sum
  - open_file_descriptor_count
  - outstanding_requests
  - packets_received
  - packets_sent
  - pending_syncs
  - proposal_count
  - quorum_size
  - response_packet_cache_hits
  - response_packet_cache_misses
  - response_packet_get_children_cache_hits
  - response_packet_get_children_cache_misses
  - revalidate_count
  - snapshottime_sum
  - stale_sessions_expired
  - synced_followers
  - synced_non_voting_followers
  - synced_observers
  - unrecoverable_error_count
  - uptime
  - watch_count
  - znode_count
ucp-kv: []

Note

The following Prometheus metrics are removed from the list of white-listed scrape jobs in Container Cloud 2.25.0 (Cluster releases 17.0.0, 16.0.0, 14.1.0):

  • The prometheus-kube-state-metrics group:

    • kube_deployment_spec_paused

    • kube_deployment_spec_strategy_rollingupdate_max_unavailable

    • kube_deployment_status_condition

    • kube_deployment_status_replicas_ready

  • The prometheus-coredns job from the go-collector-metrics and process-collector-metrics groups

You can add necessary metrics that are dropped to this white list as described below. It is also possible to disable the filtering feature. However, Mirantis does not recommend disabling the feature to prevent direct impact on the Prometheus index size, which affects query speed. For clusters with extended retention period, performance degradation will be the most noticeable.

Add dropped metrics to the white list

You can expand the default white list of Prometheus metrics using the prometheusServer.metricsFiltering.extraMetricsInclude parameter to enable metrics that are dropped by default. For the parameter description, see Prometheus metrics filtering. For configuration steps, see StackLight configuration procedure.

Example configuration:

prometheusServer:
  metricsFiltering:
    enabled: true
    extraMetricsInclude:
      cadvisor:
        - container_memory_failcnt
        - container_network_transmit_errors_total
      calico:
        - felix_route_table_per_iface_sync_seconds_sum
        - felix_bpf_dataplane_endpoints
      _group-go-collector-metrics:
        - go_gc_heap_goal_bytes
        - go_gc_heap_objects_objects

Disable metrics filtering

Mirantis does not recommend disabling metrics filtering to prevent direct impact on the Prometheus index size, which affects query speed. In clusters with an extended retention period, performance degradation will be the most noticeable. Therefore, the best option is to keep the feature enabled and add the required dropped metrics to the white list as described in Add dropped metrics to the white list.

If disabling of metrics filtering is absolutely necessary, set the prometheusServer.metricsFiltering.enabled parameter to false:

prometheusServer:
  metricsFiltering:
    enabled: false

For configuration steps, see StackLight configuration procedure.