Troubleshoot a VM network outage

This section explains how to troubleshoot a VM network outage.

To troubleshoot a VM network outage:

  1. Verify the disk space, CPU load, and RAM on the VM in question.

  2. Verify that the VM is enabled and has all interfaces up. You can do this using the Horizon Dashboard in the Admin > Instances tab or using the CLI:

    # Get VM status
    nova list --all-tenants | grep <vmname>
    
    # Get hypervisor name
    nova show <vmname>
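
    For admin users, the nova show output includes the hypervisor name in the OS-EXT-SRV-ATTR:hypervisor_hostname row. As a minimal sketch, the helper below extracts that value from the table output. The name hypervisor_of is hypothetical and is not part of the nova CLI:

    ```shell
    # hypervisor_of: hypothetical helper, reads `nova show` table output on stdin
    # and prints the value of the OS-EXT-SRV-ATTR:hypervisor_hostname row.
    hypervisor_of() {
        awk -F'|' '$2 ~ /OS-EXT-SRV-ATTR:hypervisor_hostname/ {
            gsub(/^[ \t]+|[ \t]+$/, "", $3)   # trim padding around the value
            print $3
        }'
    }

    # Usage (requires admin credentials to be sourced):
    #   nova show <vmname> | hypervisor_of
    ```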
    
  3. Use ip r and ip a to find the default gateway and the VM addresses, then ping the default gateway and other VMs on the same network to identify whether the problem is global, VM-related, or hypervisor-related.

    Each VM has a virtual gateway, usually at the first address of the subnet. A successful ping of the virtual gateway means that the network connection between the VM and the hypervisor vRouter is not broken. A failed ping can indicate a broken network connection inside the VM.

    If you can ping the default gateway but cannot reach anything outside, or you cannot ping other VMs inside the virtual network, the issue may be hypervisor-related. In this case, follow the steps below:

    1. Log in to Horizon.

    2. Identify the OpenContrail controller node that runs the VM hypervisor.

    3. Log in to that OpenContrail controller node.

    4. Verify the status of the supervisor-vrouter service using the contrail-status command.

    5. If the supervisor-vrouter status is inactive:

      1. Restart supervisor-vrouter.

      2. Inspect the /var/log/contrail/contrail-vrouter* logs.

  4. Verify that the VM unavailability is not caused by the firewall rules set in Security Groups or Network Policies in Horizon. Verify the security groups associated with the VM and the network policies attached to the virtual network.

  5. Verify the peering status in the OpenContrail web UI by navigating to Monitor > Control Nodes > (choose one of the nodes) > Peers. The status should be Established, in sync. If this is not the case, select from the following options:

    • Verify the availability of network devices. If the network devices are in the expected status (active by default), you can restart the OpenContrail controller nodes one at a time, in sequence. Before each restart, verify that at least two OpenContrail controller nodes are up and in a correct state. Never restart all nodes at once.

    • Restart the supervisor-control service and verify whether it is in the active status using the contrail-status command.

  6. Verify the OpenContrail status on the OpenContrail node that runs the hypervisor with the VM.

    If the contrail-status output lists services that are not in the active status, restart the supervisor-vrouter process and verify the status again.
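
    The contrail-status check can be scripted to list only the services that need attention. This is a minimal sketch that assumes the plain-text layout of "== Section ==" header lines followed by one service name and status per line. That layout may differ between OpenContrail releases. The name not_active is hypothetical:

    ```shell
    # not_active: hypothetical helper, reads contrail-status output on stdin and
    # prints every service whose status column is not "active".
    # Assumes "== Section ==" header lines and "<service>[:] <status>" rows.
    not_active() {
        awk 'NF >= 2 && $1 !~ /^==/ && $2 != "active" {
            sub(/:$/, "", $1)    # drop a trailing colon, if any
            print $1 " " $2
        }'
    }

    # Usage on the node in question:
    #   contrail-status | not_active
    ```

    An empty result means that every listed service reports the active status.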

  7. To determine whether the problem is with the vRouter on the hypervisor or with a VM inside the network setup, ping another VM on the same hypervisor or ping the link-local IP address of the VM.

    To get the link-local IP address, select from the following options:

    • Using the OpenContrail web UI, find the VM details on the Contrail Dashboard -> vRouter with VM page.

    • Using the CLI, run the following command on the OpenStack compute node in question:

      netstat -r
      

      Example of system response:

      Kernel IP routing table
      Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
      default         10.0.106.1      0.0.0.0         UG        0 0          0 vhost0
      localnet        *               255.255.255.0   U         0 0          0 vhost0
      169.254.0.3     *               255.255.255.255 UH        0 0          0 vhost0
      169.254.0.4     *               255.255.255.255 UH        0 0          0 vhost0
      169.254.0.5     *               255.255.255.255 UH        0 0          0 vhost0
      
    • Using ssh to the VM from the hypervisor, run the following command:

      vif --list
      

      Example of system response:

      Vrouter Interface Table
      
      Flags: P=Policy, X=Cross Connect, S=Service Chain, Mr=Receive Mirror
             Mt=Transmit Mirror, Tc=Transmit Checksum Offload, L3=Layer 3, L2=Layer 2
             D=DHCP, Vp=Vhost Physical, Pr=Promiscuous, Vnt=Native Vlan Tagged
             Mnp=No MAC Proxy
      
      vif0/0      OS: bond0.3034 (Speed 20000, Duplex 1)
                  Type:Physical HWaddr:a4:1f:72:0a:93:8c IPaddr:0
                  Vrf:0 Flags:TcL3L2Vp MTU:1514 Ref:22
                  RX packets:9294622  bytes:1402159738 errors:0
                  TX packets:14035541  bytes:10121866276 errors:0
      
      vif0/1      OS: vhost0
                  Type:Host HWaddr:a4:1f:72:0a:93:8c IPaddr:0
                  Vrf:0 Flags:L3L2 MTU:1514 Ref:3
                  RX packets:9649123  bytes:9390647810 errors:0
                  TX packets:5671040  bytes:993417128 errors:0
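
    The link-local addresses from the netstat -r response can be picked out and pinged in a loop. This is a minimal sketch that assumes the addresses fall in the 169.254.0.0/16 range, as in the example response. The name link_local_addrs is hypothetical:

    ```shell
    # link_local_addrs: hypothetical helper, reads `netstat -r` output on stdin
    # and prints destinations in the 169.254.0.0/16 link-local range.
    link_local_addrs() {
        awk '$1 ~ /^169\.254\.[0-9]+\.[0-9]+$/ { print $1 }'
    }

    # Usage on the compute node: ping each link-local address once.
    #   netstat -r | link_local_addrs | while read -r addr; do
    #       ping -c 1 -W 2 "$addr" >/dev/null && echo "$addr reachable" || echo "$addr unreachable"
    #   done
    ```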
      
  8. Inspect the OpenContrail log files in /var/log/contrail/*. To debug the vRouter in particular, inspect the /var/log/contrail/contrail-vrouter*.log files.

  9. If the problem is related to the networking inside the VM, verify the interface configuration, the DNS settings in the resolv.conf file, the routing table, and so on.
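
    The checks inside the VM (gateway reachability, DNS settings) can be collected in a short script. This is a minimal sketch that assumes a Linux guest with the ip utility and a standard /etc/resolv.conf file. The names default_gateway and nameservers are hypothetical:

    ```shell
    # default_gateway: hypothetical helper, reads `ip r` output on stdin and
    # prints the next hop of the first default route.
    default_gateway() {
        awk '$1 == "default" && $2 == "via" { print $3; exit }'
    }

    # nameservers: hypothetical helper, reads resolv.conf content on stdin and
    # prints the configured DNS server addresses.
    nameservers() {
        awk '$1 == "nameserver" { print $2 }'
    }

    # Usage inside the VM:
    #   gw=$(ip r | default_gateway)
    #   ping -c 3 "$gw"
    #   nameservers < /etc/resolv.conf
    ```

    If default_gateway prints nothing, the VM has no default route, which points to a problem with the interface configuration or DHCP inside the VM.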