OpenContrail vRouter

This section describes the alerts for the OpenContrail vRouter.


ContrailBGPSessionsNoEstablished

Severity

Warning

Summary

There are no established OpenContrail BGP sessions on the {{ $labels.host }} node for 2 minutes.

Raise condition

max(contrail_bgp_session_count) by (host) == 0

Description

Raises when no BGP sessions in the established state (FSM) exist on a node. The host label in the raised alert contains the host name of the affected node.
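
As a rough illustration of how this raise condition evaluates, the following Python sketch mimics `max(contrail_bgp_session_count) by (host) == 0` over a list of samples. The sample data, the host names, and the `hosts_with_max_zero` helper are hypothetical; Prometheus evaluates the real expression itself.

```python
from collections import defaultdict

def hosts_with_max_zero(samples):
    """Return hosts whose largest sample value is 0, like `max(...) by (host) == 0`."""
    by_host = defaultdict(list)
    for host, value in samples:
        by_host[host].append(value)
    return sorted(h for h, values in by_host.items() if max(values) == 0)

# Hypothetical samples: (host label, metric value) pairs.
samples = [("ntw01", 0), ("ntw01", 0), ("ntw02", 3), ("ntw03", 0), ("ntw03", 1)]
firing = hosts_with_max_zero(samples)
print(firing)
```

Because the aggregation takes the maximum per host, a host is reported only when every series for that host is 0.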

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the bgp_peer module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

Not required

ContrailBGPSessionsNoActive

Severity

Warning

Summary

There are no active OpenContrail BGP sessions on the {{ $labels.host }} node for 2 minutes.

Raise condition

max(contrail_bgp_session_up_count) by (host) == 0

Description

Raises when no BGP sessions in the active state (FSM) exist on a node. The host label in the raised alert contains the host name of the affected node.

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the bgp_peer module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

Not required

ContrailBGPSessionsDown

Severity

Warning

Summary

The OpenContrail BGP sessions on the {{ $labels.host }} node are down for 2 minutes.

Raise condition

min(contrail_bgp_session_down_count) by (host) > 0

Description

Raises when a node has BGP sessions in the down state. The host label in the raised alert contains the host name of the affected node.

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the bgp_peer module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

Not required

ContrailXMPPSessionsMissingEstablished

Severity

Warning

Summary

The OpenContrail XMPP sessions in the established state are missing on the compute cluster for 2 minutes.

Raise condition

count(contrail_vrouter_xmpp) * 2 - sum(contrail_xmpp_session_up_count) > 0

Description

Raises when the compute cluster has fewer OpenContrail XMPP sessions in the established state (FSM) than expected. No assumption is made about an equal distribution of sessions across the cluster: an individual vRouter can have 0 sessions in the working state. However, a properly operating compute cluster must have at least 2 connections per vRouter, so the total number of established sessions is compared with twice the number of vRouters.
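
The arithmetic behind the raise condition can be sketched in Python as follows. This is an illustration under assumptions: `count(contrail_vrouter_xmpp)` is taken to yield one series per vRouter, and the function name and the sample numbers are hypothetical.

```python
def missing_xmpp_sessions(vrouter_count, established_per_controller, sessions_per_vrouter=2):
    """Mimic `count(contrail_vrouter_xmpp) * 2 - sum(contrail_xmpp_session_up_count)`."""
    expected = vrouter_count * sessions_per_vrouter
    return expected - sum(established_per_controller)

# Hypothetical cluster: 3 vRouters, two controllers reporting 3 and 2 established sessions.
missing = missing_xmpp_sessions(3, [3, 2])
print(missing)  # 1, so the alert condition (> 0) holds
```

With 3 vRouters, 6 established sessions are expected in total. Only 5 are reported, so one session is missing regardless of which vRouter lost it.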

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

Not required

ContrailXMPPSessionsMissing

Severity

Warning

Summary

The OpenContrail XMPP sessions are missing on the compute cluster for 2 minutes.

Raise condition

count(contrail_vrouter_xmpp) * 2 - sum(contrail_xmpp_session_count) > 0

Description

Raises when the compute cluster has fewer OpenContrail XMPP sessions, in any state, than expected. The conditions are the same as for the ContrailXMPPSessionsMissingEstablished alert.

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

Not required

ContrailXMPPSessionsDown

Severity

Warning

Summary

The {{ $labels.host }} node contains the OpenContrail XMPP sessions in the down state for 2 minutes.

Raise condition

min(contrail_xmpp_session_down_count) by (host) > 0

Description

Raises when a node has OpenContrail XMPP sessions in the DOWN state. The host label in the raised alert contains the host name of the affected node.

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

Not required

ContrailXMPPSessionsTooHigh

Severity

Warning

Summary

There are more than 500 open OpenContrail XMPP sessions on the {{ $labels.host }} node for 2 minutes.

Raise condition

min(contrail_xmpp_session_count) by (host) >= 500

Description

Raises when the number of OpenContrail XMPP sessions reaches 500 on one node. The host label in the raised alert contains the host name of the affected node.
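
To show how the threshold affects which nodes are reported, here is a minimal Python sketch that mimics `min(contrail_xmpp_session_count) by (host) >= <threshold>`. The host names, the sample values, and the `hosts_at_or_above` helper are hypothetical.

```python
def hosts_at_or_above(samples_by_host, threshold):
    """Return hosts whose smallest sample value is at or above the threshold."""
    return sorted(h for h, values in samples_by_host.items() if min(values) >= threshold)

# Hypothetical per-host samples of the session count metric.
samples_by_host = {"ctl01": [520, 515], "ctl02": [480, 510], "ctl03": [1200]}
default_hits = hosts_at_or_above(samples_by_host, 500)
tuned_hits = hosts_at_or_above(samples_by_host, 1000)
print(default_hits)  # ['ctl01', 'ctl03']
print(tuned_hits)    # ['ctl03']
```

Because the aggregation takes the minimum per host, `ctl02` is not reported at the default threshold: one of its series is still below 500.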

Warning

For production environments, configure the alert after deployment.

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

For example, to change the threshold to 1000 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailXMPPSessionsTooHigh:
              if: >-
                min(contrail_xmpp_session_count) by (host) >= 1000
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.
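
Besides the web UI, the loaded expression can also be checked programmatically. The following is a minimal sketch, assuming the deployed Prometheus version exposes the `/api/v1/rules` HTTP API endpoint and that its response has already been fetched and parsed into a dictionary. The `find_alert_expr` helper, the group name, and the sample payload are hypothetical.

```python
def find_alert_expr(rules_response, alert_name):
    """Return the query of the named alerting rule, or None if it is not loaded."""
    for group in rules_response.get("data", {}).get("groups", []):
        for rule in group.get("rules", []):
            if rule.get("type") == "alerting" and rule.get("name") == alert_name:
                return rule.get("query")
    return None

# Hypothetical, trimmed-down payload in the shape returned by /api/v1/rules.
sample = {
    "status": "success",
    "data": {"groups": [{"name": "alert.rules", "rules": [
        {"type": "alerting", "name": "ContrailXMPPSessionsTooHigh",
         "query": "min(contrail_xmpp_session_count) by (host) >= 1000"},
    ]}]},
}
expr = find_alert_expr(sample, "ContrailXMPPSessionsTooHigh")
print(expr)
```

If the returned expression still shows the old threshold, the Salt state has most likely not been applied to the Prometheus server yet.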

ContrailXMPPSessionsChangesTooHigh

Severity

Warning

Summary

The OpenContrail XMPP sessions on the {{ $labels.host }} node have changed more than 100 times.

Raise condition

abs(delta(contrail_xmpp_session_count[2m])) >= 100

Description

Raises when the number of OpenContrail XMPP session changes reaches 100 on one node, calculated as an absolute difference of the first and last points in a two-minute time frame. The host label in the raised alert contains the host name of the affected node.
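
The calculation can be approximated with the following Python sketch. It is a simplification: the PromQL `delta()` function also extrapolates to the boundaries of the time window, so the value Prometheus computes may differ slightly. The `abs_change` helper and the sample window are hypothetical.

```python
def abs_change(values):
    """Absolute difference between the last and first sample in the window."""
    if len(values) < 2:
        return None  # not enough samples to compute a change
    return abs(values[-1] - values[0])

# Hypothetical samples of the session count scraped within a two-minute window.
window = [420, 410, 350, 300]
change = abs_change(window)
print(change)  # 120, which meets the default threshold of 100
```

Only the first and last points matter, so a session count that drops and fully recovers within the same window produces a small value.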

Warning

For production environments, configure the alert after deployment.

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

For example, to change the threshold to 250 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailXMPPSessionsChangesTooHigh:
              if: >-
                abs(delta(contrail_xmpp_session_count[2m])) >= 250
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

ContrailVrouterXMPPSessionsZero

Severity

Warning

Summary

There are no OpenContrail vRouter XMPP sessions on the {{ $labels.host }} node for 2 minutes.

Raise condition

min(contrail_vrouter_xmpp) by (host) == 0

Description

Raises when a node has no OpenContrail vRouter XMPP sessions. The host label in the raised alert contains the host name of the affected node.

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

Not required

ContrailVrouterXMPPSessionsTooHigh

Severity

Warning

Summary

There are more than 10 open OpenContrail vRouter XMPP sessions on the {{ $labels.host }} node for 2 minutes.

Raise condition

min(contrail_vrouter_xmpp) by (host) >= 10

Description

Raises when the number of OpenContrail vRouter XMPP sessions reaches 10 on one node. The host label in the raised alert contains the host name of the affected node.

Warning

For production environments, configure the alert after deployment.

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

For example, to change the threshold to 20 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailVrouterXMPPSessionsTooHigh:
              if: >-
                min(contrail_vrouter_xmpp) by (host) >= 20
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

ContrailVrouterXMPPSessionsChangesTooHigh

Severity

Warning

Summary

The OpenContrail vRouter XMPP sessions on the {{ $labels.host }} node have changed more than 5 times.

Raise condition

abs(delta(contrail_vrouter_xmpp[2m])) >= 5

Description

Raises when the number of OpenContrail vRouter XMPP session changes reaches 5 on one node, calculated as an absolute difference of the first and last points in a two-minute time frame. The host label in the raised alert contains the host name of the affected node.

Warning

For production environments, configure the alert after deployment.

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > DNS Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

For example, to change the threshold to 10 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailVrouterXMPPSessionsChangesTooHigh:
              if: >-
                abs(delta(contrail_vrouter_xmpp[2m])) >= 10
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

ContrailVrouterDNSXMPPSessionsZero

Severity

Warning

Summary

There are no OpenContrail vRouter DNS-XMPP sessions on the {{ $labels.host }} node for 2 minutes.

Raise condition

min(contrail_vrouter_dns_xmpp) by (host) == 0

Description

Raises when one node has no OpenContrail DNS-XMPP sessions. The host label in the raised alert contains the host name of the affected node.

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > DNS Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

Not required

ContrailVrouterDNSXMPPSessionsTooHigh

Severity

Warning

Summary

More than 10 OpenContrail vRouter DNS-XMPP sessions are open on the {{ $labels.host }} node for 2 minutes.

Raise condition

min(contrail_vrouter_dns_xmpp) by (host) >= 10

Description

Raises when the number of OpenContrail DNS-XMPP sessions reaches 10 on one node. The host label in the raised alert contains the host name of the affected node.

Warning

For production environments, configure the alert after deployment.

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > DNS Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

For example, to change the threshold to 20 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailVrouterDNSXMPPSessionsTooHigh:
              if: >-
                min(contrail_vrouter_dns_xmpp) by (host) >= 20
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

ContrailVrouterDNSXMPPSessionsChangesTooHigh

Severity

Warning

Summary

The OpenContrail vRouter DNS-XMPP sessions on the {{ $labels.host }} node have changed more than 5 times.

Raise condition

abs(delta(contrail_vrouter_dns_xmpp[2m])) >= 5

Description

Raises when the number of OpenContrail DNS-XMPP session changes reaches 5 on one node, calculated as an absolute difference of the first and last points in a two-minute time frame.

Warning

For production environments, configure the alert after deployment.

Troubleshooting

  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.

  2. Navigate to Monitor > DNS Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.

  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.

  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.

Tuning

For example, to change the threshold to 10 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailVrouterDNSXMPPSessionsChangesTooHigh:
              if: >-
                abs(delta(contrail_vrouter_dns_xmpp[2m])) >= 10
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

ContrailVrouterLLSSessionsTooHigh

Severity

Warning

Summary

There are more than 10 open OpenContrail vRouter LLS sessions on the {{ $labels.host }} node for 2 minutes.

Raise condition

min(contrail_vrouter_lls) by (host) >= 10

Description

Raises when the number of OpenContrail vRouter LocalLinkService sessions reaches 10 on one node.

Warning

For production environments, configure the alert after deployment.

Tuning

For example, to change the threshold to 20 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailVrouterLLSSessionsTooHigh:
              if: >-
                min(contrail_vrouter_lls) by (host) >= 20
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

ContrailVrouterLLSSessionsChangesTooHigh

Severity

Warning

Summary

The OpenContrail vRouter LLS sessions on the {{ $labels.host }} node have changed more than 5 times.

Raise condition

abs(delta(contrail_vrouter_lls[2m])) >= 5

Description

Raises when the number of OpenContrail vRouter LLS session changes reaches 5 on one node, calculated as an absolute difference of the first and last points in a two-minute time frame.

Warning

For production environments, configure the alert after deployment.

Tuning

For example, to change the threshold to 10 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailVrouterLLSSessionsChangesTooHigh:
              if: >-
                abs(delta(contrail_vrouter_lls[2m])) >= 10
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

ContrailGlobalVrouterConfigCheckDisabled

Available since 2019.2.4

Severity

Critical

Summary

The OpenContrail global vRouter configuration check is disabled.

Raise condition

absent(contrail_global_vrouter_config_exit_code) == 1

Description

Raises when Prometheus has no metric with the contrail_global_vrouter_config_exit_code name.

Troubleshooting

Inspect the Telegraf logs on the ntw nodes.

Tuning

Not required

ContrailGlobalVrouterConfigCheckFailed

Available since 2019.2.4

Severity

Critical

Summary

The OpenContrail global vRouter configuration check failed on the {{ $labels.host }} node.

Raise condition

contrail_global_vrouter_config_exit_code != 0

Description

Raises when the OpenContrail Virtual Network Controller (VNC) API returns 0 or more than 1 global-vrouter-configs.
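
The logic of the check can be sketched in Python as follows. This is an illustration only and not the actual check script: the shape of the API response (a `global-vrouter-configs` list), the helper name, and the specific non-zero exit code are assumptions.

```python
def global_vrouter_config_exit_code(api_response):
    """Return 0 when exactly one global vRouter config exists, otherwise 1."""
    # Assumed response shape: {"global-vrouter-configs": [ {...}, ... ]}
    configs = api_response.get("global-vrouter-configs", [])
    return 0 if len(configs) == 1 else 1

# Hypothetical responses: none, exactly one, and a duplicated configuration.
none_code = global_vrouter_config_exit_code({"global-vrouter-configs": []})
one_code = global_vrouter_config_exit_code({"global-vrouter-configs": [{"uuid": "a"}]})
two_code = global_vrouter_config_exit_code(
    {"global-vrouter-configs": [{"uuid": "a"}, {"uuid": "b"}]}
)
print(none_code, one_code, two_code)  # 1 0 1
```

Any non-zero value satisfies the raise condition `contrail_global_vrouter_config_exit_code != 0`.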

Troubleshooting

Inspect the output of the contrail-status command on any ntw node.

Tuning

Not required