OpenContrail vRouter

This section describes the alerts for the OpenContrail vRouter.


ContrailBGPSessionsNoEstablished

Severity Warning
Summary There are no established OpenContrail BGP sessions on the {{ $labels.host }} node for 2 minutes.
Raise condition max(contrail_bgp_session_count) by (host) == 0
Description Raises when no BGP sessions in the established state (FSM) exist on a node. The host label in the raised alert contains the host name of the affected node.
Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the bgp_peer module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning Not required
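
As a rough illustration of how a max(...) by (host) == 0 condition behaves, the following Python sketch groups samples by the host label and reports a host only when every series on that host is zero. The sample data and the hosts_with_no_sessions helper are hypothetical and are not part of the alert definition.

```python
from collections import defaultdict


def hosts_with_no_sessions(samples):
    """Return hosts whose maximum sample value is 0 (mimics max(...) by (host) == 0)."""
    per_host = defaultdict(list)
    for host, value in samples:
        per_host[host].append(value)
    return sorted(h for h, values in per_host.items() if max(values) == 0)


# Hypothetical samples: (host label, session count reported by one series)
samples = [("ntw01", 0), ("ntw01", 0), ("ntw02", 0), ("ntw02", 2), ("ntw03", 3)]
print(hosts_with_no_sessions(samples))
```

In this sketch, a host with at least one non-zero series is not reported, because the maximum across its series is greater than zero.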

ContrailBGPSessionsNoActive

Severity Warning
Summary There are no active OpenContrail BGP sessions on the {{ $labels.host }} node for 2 minutes.
Raise condition max(contrail_bgp_session_up_count) by (host) == 0
Description Raises when no BGP sessions in the active state (FSM) exist on a node. The host label in the raised alert contains the host name of the affected node.
Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the bgp_peer module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning Not required

ContrailBGPSessionsDown

Severity Warning
Summary The OpenContrail BGP sessions on the {{ $labels.host }} node are down for 2 minutes.
Raise condition min(contrail_bgp_session_down_count) by (host) > 0
Description Raises when a node has BGP sessions in the down state. The host label in the raised alert contains the host name of the affected node.
Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the bgp_peer module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning Not required

ContrailXMPPSessionsMissingEstablished

Severity Warning
Summary The OpenContrail XMPP sessions in the established state are missing on the compute cluster for 2 minutes.
Raise condition count(contrail_vrouter_xmpp) * 2 - sum(contrail_xmpp_session_up_count) > 0
Description Raises when the compute cluster has fewer OpenContrail XMPP sessions in the established state (FSM) than the expected two per vRouter. No assumption is made about an equal distribution of sessions across the cluster, so an individual vRouter can have 0 sessions in the working state. However, a properly operating compute cluster must have at least 2 connections per vRouter in total.
Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning Not required
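
The arithmetic behind the raise condition of this alert can be sketched in Python as follows. This is a minimal illustration that assumes one contrail_vrouter_xmpp series per vRouter. The missing_xmpp_sessions helper and the sample numbers are hypothetical.

```python
def missing_xmpp_sessions(vrouter_series_count, established_per_node, expected_per_vrouter=2):
    """Mimic count(contrail_vrouter_xmpp) * 2 - sum(contrail_xmpp_session_up_count)."""
    expected = vrouter_series_count * expected_per_vrouter
    return expected - sum(established_per_node)


# Hypothetical cluster: 3 vRouters, nodes reporting 2, 2, and 1 established sessions
missing = missing_xmpp_sessions(3, [2, 2, 1])
print(missing, "alert fires" if missing > 0 else "ok")
```

Because only the cluster-wide total is compared, a distribution such as 3, 3, and 0 established sessions yields no missing sessions in this sketch.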

ContrailXMPPSessionsMissing

Severity Warning
Summary The OpenContrail XMPP sessions are missing on the compute cluster for 2 minutes.
Raise condition count(contrail_vrouter_xmpp) * 2 - sum(contrail_xmpp_session_count) > 0
Description Raises when the compute cluster has fewer OpenContrail XMPP sessions in any state than the expected two per vRouter. The conditions are the same as for the ContrailXMPPSessionsMissingEstablished alert.
Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning Not required

ContrailXMPPSessionsDown

Severity Warning
Summary The {{ $labels.host }} node contains the OpenContrail XMPP sessions in the down state for 2 minutes.
Raise condition min(contrail_xmpp_session_down_count) by (host) > 0
Description Raises when a node has OpenContrail XMPP sessions in the down state. The host label in the raised alert contains the host name of the affected node.
Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning Not required

ContrailXMPPSessionsTooHigh

Severity Warning
Summary There are more than 500 open OpenContrail XMPP sessions on the {{ $labels.host }} node for 2 minutes.
Raise condition min(contrail_xmpp_session_count) by (host) >= 500
Description

Raises when the number of OpenContrail XMPP sessions reaches 500 on one node. The host label in the raised alert contains the host name of the affected node.

Warning

For production environments, configure the alert after deployment.

Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning

For example, to change the threshold to 1000 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailXMPPSessionsTooHigh:
              if: >-
                min(contrail_xmpp_session_count) by (host) >= 1000
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.
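
To get a feel for the effect of a threshold change before applying it, the min(...) by (host) >= N comparison can be sketched in Python. The hosts_at_or_above helper and the sample values are hypothetical and only illustrate the comparison.

```python
def hosts_at_or_above(samples, threshold):
    """Return hosts whose minimum sample value is >= threshold (mimics min(...) by (host) >= N)."""
    per_host = {}
    for host, value in samples:
        per_host[host] = min(value, per_host.get(host, value))
    return sorted(h for h, lowest in per_host.items() if lowest >= threshold)


# Hypothetical samples: (host label, XMPP session count)
samples = [("ntw01", 620), ("ntw02", 1100), ("ntw03", 140)]
print(hosts_at_or_above(samples, 500))   # default threshold
print(hosts_at_or_above(samples, 1000))  # overridden threshold
```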

ContrailXMPPSessionsChangesTooHigh

Severity Warning
Summary The OpenContrail XMPP sessions on the {{ $labels.host }} node have changed more than 100 times.
Raise condition abs(delta(contrail_xmpp_session_count[2m])) >= 100
Description

Raises when the number of OpenContrail XMPP session changes reaches 100 on one node, calculated as an absolute difference of the first and last points in a two-minute time frame. The host label in the raised alert contains the host name of the affected node.

Warning

For production environments, configure the alert after deployment.

Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning

For example, to change the threshold to 250 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailXMPPSessionsChangesTooHigh:
              if: >-
                abs(delta(contrail_xmpp_session_count[2m])) >= 250
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.
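
The first-to-last comparison used by the raise condition of this alert can be sketched in Python as follows. This is a simplified model: the Prometheus delta() function also extrapolates to the boundaries of the time range, which the sketch omits. The abs_first_last_delta helper and the sample window are hypothetical.

```python
def abs_first_last_delta(points):
    """Absolute difference between the last and first values of (timestamp, value) points."""
    if len(points) < 2:
        return None  # not enough samples to compute a difference
    ordered = sorted(points)  # order by timestamp
    return abs(ordered[-1][1] - ordered[0][1])


# Hypothetical two-minute window sampled every 30 seconds
window = [(0, 420), (30, 380), (60, 350), (90, 330), (120, 310)]
change = abs_first_last_delta(window)
print(change, change >= 100)
```

Because only the first and last points are compared, a session count that spikes and returns to its initial value within the window produces a difference of 0 in this sketch.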

ContrailVrouterXMPPSessionsZero

Severity Warning
Summary There are no OpenContrail vRouter XMPP sessions on the {{ $labels.host }} node for 2 minutes.
Raise condition min(contrail_vrouter_xmpp) by (host) == 0
Description Raises when a node has no OpenContrail vRouter XMPP sessions. The host label in the raised alert contains the host name of the affected node.
Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning Not required

ContrailVrouterXMPPSessionsTooHigh

Severity Warning
Summary There are more than 10 open OpenContrail vRouter XMPP sessions on the {{ $labels.host }} node for 2 minutes.
Raise condition min(contrail_vrouter_xmpp) by (host) >= 10
Description

Raises when the number of OpenContrail vRouter XMPP sessions reaches 10 on one node. The host label in the raised alert contains the host name of the affected node.

Warning

For production environments, configure the alert after deployment.

Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning

For example, to change the threshold to 20 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailVrouterXMPPSessionsTooHigh:
              if: >-
                min(contrail_vrouter_xmpp) by (host) >= 20
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

ContrailVrouterXMPPSessionsChangesTooHigh

Severity Warning
Summary The OpenContrail vRouter XMPP sessions on the {{ $labels.host }} node have changed more than 5 times.
Raise condition abs(delta(contrail_vrouter_xmpp[2m])) >= 5
Description

Raises when the number of OpenContrail vRouter XMPP session changes reaches 5 on one node, calculated as an absolute difference of the first and last points in a two-minute time frame. The host label in the raised alert contains the host name of the affected node.

Warning

For production environments, configure the alert after deployment.

Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > Infrastructure > Control Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning

For example, to change the threshold to 10 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailVrouterXMPPSessionsChangesTooHigh:
              if: >-
                abs(delta(contrail_vrouter_xmpp[2m])) >= 10
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

ContrailVrouterDNSXMPPSessionsZero

Severity Warning
Summary There are no OpenContrail vRouter DNS-XMPP sessions on the {{ $labels.host }} node for 2 minutes.
Raise condition min(contrail_vrouter_dns_xmpp) by (host) == 0
Description Raises when one node has no OpenContrail DNS-XMPP sessions. The host label in the raised alert contains the host name of the affected node.
Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > DNS Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning Not required

ContrailVrouterDNSXMPPSessionsTooHigh

Severity Warning
Summary More than 10 OpenContrail vRouter DNS-XMPP sessions are open on the {{ $labels.host }} node for 2 minutes.
Raise condition min(contrail_vrouter_dns_xmpp) by (host) >= 10
Description

Raises when the number of OpenContrail DNS-XMPP sessions reaches 10 on one node. The host label in the raised alert contains the host name of the affected node.

Warning

For production environments, configure the alert after deployment.

Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > DNS Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning

For example, to change the threshold to 20 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailVrouterDNSXMPPSessionsTooHigh:
              if: >-
                min(contrail_vrouter_dns_xmpp) by (host) >= 20
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

ContrailVrouterDNSXMPPSessionsChangesTooHigh

Severity Warning
Summary The OpenContrail vRouter DNS-XMPP sessions on the {{ $labels.host }} node have changed more than 5 times.
Raise condition abs(delta(contrail_vrouter_dns_xmpp[2m])) >= 5
Description

Raises when the number of OpenContrail DNS-XMPP session changes reaches 5 on one node, calculated as an absolute difference of the first and last points in a two-minute time frame.

Warning

For production environments, configure the alert after deployment.

Troubleshooting
  1. Log in to the OpenContrail web UI using the credentials from /etc/contrail/contrail-webui-userauth.js on the network nodes.
  2. Navigate to Monitor > DNS Nodes and select the affected node to inspect the analytics data of the OpenContrail controller nodes.
  3. In Introspect, inspect the introspection data filtered by request type. Select the xmpp_server module.
  4. Verify the BGP routers configuration in Configure > Infrastructure > BGP Routers.
Tuning

For example, to change the threshold to 10 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailVrouterDNSXMPPSessionsChangesTooHigh:
              if: >-
                abs(delta(contrail_vrouter_dns_xmpp[2m])) >= 10
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

ContrailVrouterLLSSessionsTooHigh

Severity Warning
Summary There are more than 10 open OpenContrail vRouter LLS sessions on the {{ $labels.host }} node for 2 minutes.
Raise condition min(contrail_vrouter_lls) by (host) >= 10
Description

Raises when the number of OpenContrail vRouter LocalLinkService sessions reaches 10 on one node.

Warning

For production environments, configure the alert after deployment.

Tuning

For example, to change the threshold to 20 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailVrouterLLSSessionsTooHigh:
              if: >-
                min(contrail_vrouter_lls) by (host) >= 20
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

ContrailVrouterLLSSessionsChangesTooHigh

Severity Warning
Summary The OpenContrail vRouter LLS sessions on the {{ $labels.host }} node have changed more than 5 times.
Raise condition abs(delta(contrail_vrouter_lls[2m])) >= 5
Description

Raises when the number of OpenContrail vRouter LLS session changes reaches 5 on one node, calculated as an absolute difference of the first and last points in a two-minute time frame.

Warning

For production environments, configure the alert after deployment.

Tuning

For example, to change the threshold to 10 sessions:

  1. On the cluster level of the Reclass model, create a common file for all alert customizations. Skip this step to use an existing defined file.

    1. Create a file for alert customizations:

      touch cluster/<cluster_name>/stacklight/custom/alerts.yml
      
    2. Define the new file in cluster/<cluster_name>/stacklight/server.yml:

      classes:
      - cluster.<cluster_name>.stacklight.custom.alerts
      ...
      
  2. In the defined alert customizations file, modify the alert threshold by overriding the if parameter:

    parameters:
      prometheus:
        server:
          alert:
            ContrailVrouterLLSSessionsChangesTooHigh:
              if: >-
                abs(delta(contrail_vrouter_lls[2m])) >= 10
    
  3. From the Salt Master node, apply the changes:

    salt 'I@prometheus:server' state.sls prometheus.server
    
  4. Verify the updated alert definition in the Prometheus web UI.

ContrailGlobalVrouterConfigCheckDisabled

Available since 2019.2.4

Severity Critical
Summary The OpenContrail global vRouter configuration check is disabled.
Raise condition absent(contrail_global_vrouter_config_exit_code) == 1
Description Raises when Prometheus has no metric with the contrail_global_vrouter_config_exit_code name.
Troubleshooting Inspect the Telegraf logs on the ntw nodes.
Tuning Not required
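
The behavior of the absent() function in the raise condition of this alert can be sketched in Python as follows. The absent helper below is a simplified stand-in that works on a set of metric names, and the sets of scraped metrics are hypothetical.

```python
def absent(metric_name, scraped_metric_names):
    """Return 1 when no series with the given name exists, otherwise None (an empty result)."""
    return None if metric_name in scraped_metric_names else 1


METRIC = "contrail_global_vrouter_config_exit_code"

# Hypothetical sets of metric names currently known to Prometheus
print(absent(METRIC, {"contrail_vrouter_xmpp", "contrail_vrouter_lls"}))
print(absent(METRIC, {"contrail_vrouter_xmpp", METRIC}))
```

The alert condition holds only in the first case, where the metric is missing and the result equals 1.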

ContrailGlobalVrouterConfigCheckFailed

Available since 2019.2.4

Severity Critical
Summary The OpenContrail global vRouter configuration check failed on the {{ $labels.host }} node.
Raise condition contrail_global_vrouter_config_exit_code != 0
Description Raises when the OpenContrail Virtual Network Controller (VNC) API returns 0 or more than 1 global-vrouter-configs.
Troubleshooting Inspect the output of the contrail-status command on any ntw node.
Tuning Not required
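
The logic of such a check can be sketched in Python as follows. This is an illustration only and not the actual check implementation. The global_vrouter_config_exit_code helper, the exit code value 1 for a failure, and the configuration names are assumptions.

```python
def global_vrouter_config_exit_code(configs):
    """Return 0 when exactly one global-vrouter-config exists, 1 otherwise (assumed codes)."""
    return 0 if len(configs) == 1 else 1


# Hypothetical API results: no configuration, exactly one, and two configurations
for configs in ([], ["example-global-vrouter-config"], ["config-a", "config-b"]):
    code = global_vrouter_config_exit_code(configs)
    print(len(configs), code, "alert fires" if code != 0 else "ok")
```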