Perform initial troubleshooting

This section describes basic troubleshooting steps for the OpenContrail-related services.

To perform initial troubleshooting:

  1. Verify the NTP peers on every node of your MCP cluster:

    ntpq -p
    

    Example of system response:

         remote           refid      st t when poll reach   delay   offset  jitter
    ==============================================================================
    +tik.cesnet.cz   195.113.144.238  2 u  728 1024  377    4.645   -0.199   0.545
    *netopyr.hanacke .GPS.            1 u 1604 1024  276   14.931   -0.021   0.373
    

    If at least one of the peers has an asterisk (*) before its name, the time is synchronized. Otherwise, inspect the /etc/ntp.conf file.

    Example of an ntp.conf file:

    # Associate to cloud NTP pool servers
    server ntp.cesnet.cz iburst
    server pool.ntp.org
    
    # Only allow read-only access from localhost
    restrict default noquery nopeer
    restrict 127.0.0.1
    restrict ::1
    
    # Location of drift file
    driftfile /var/lib/ntp/ntp.drift
    logfile /var/log/ntp.log
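
The asterisk check from this step can be scripted so that it is easy to repeat on every node. The following is a minimal sketch, not part of the original procedure. The function name ntp_synced is hypothetical, and the sketch assumes the standard ntpq -p layout, in which the tally character is the first character of each peer line.

```shell
#!/bin/sh
# ntp_synced (hypothetical helper): reads `ntpq -p` output on stdin and
# exits with 0 if at least one peer line starts with "*", that is, a peer
# is currently selected for synchronization. Exits with 1 otherwise.
ntp_synced() {
    awk '/^\*/ { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage on a node (assumes ntpq is installed):
#   ntpq -p | ntp_synced && echo "time is synchronized" \
#                        || echo "no selected peer, inspect /etc/ntp.conf"
```

The function only reads stdin, so it can also be run against saved ntpq output collected from several nodes.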
    
  2. Verify the disk space, inode, RAM, and CPU usage on every OpenContrail node. The usage of each resource in the output must not exceed 90%.

    • To verify the disk space:

      df -h
      

      Example of system response:

      Filesystem      Size  Used Avail Use% Mounted on
      udev            3.9G   12K  3.9G   1% /dev
      tmpfs           799M  380K  798M   1% /run
      /dev/vda1        48G  5.7G   41G  13% /
      none            4.0K     0  4.0K   0% /sys/fs/cgroup
      none            5.0M     0  5.0M   0% /run/lock
      none            3.9G   12K  3.9G   1% /run/shm
      none            100M     0  100M   0% /run/user
      
    • To verify the inode usage:

      df -i
      

      Example of system response:

      Filesystem       Inodes   IUsed    IFree IUse% Mounted on
      udev            2032563     533  2032030    1% /dev
      tmpfs           2037690     781  2036909    1% /run
      /dev/sda1       6250496 1396006  4854490   23% /
      tmpfs           2037690     304  2037386    1% /dev/shm
      tmpfs           2037690       6  2037684    1% /run/lock
      tmpfs           2037690      18  2037672    1% /sys/fs/cgroup
      /dev/sda6      53821440  731583 53089857    2% /home
      cgmfs           2037690      14  2037676    1% /run/cgmanager/fs
      tmpfs           2037690      44  2037646    1% /run/user/1000
      
    • To verify RAM usage:

      free -h
      

      Example of system response:

                   total       used       free     shared    buffers     cached
      Mem:          7.8G       7.3G       501M       416K       239M       2.6G
      -/+ buffers/cache:       4.5G       3.3G
      Swap:           0B         0B         0B
      
    • To verify CPU usage:

      awk '/^cpu/ {total=$2+$3+$4+$5+$6+$7+$8+$9; print $1 "\tidle: " $5*100/total "%"}' /proc/stat
      

      Example of system response:

      cpu     idle: 94.1113%
      cpu0    idle: 94.3852%
      cpu1    idle: 92.851%
      cpu2    idle: 94.0428%
      cpu3    idle: 94.1673%
      cpu4    idle: 94.2658%
      cpu5    idle: 94.3526%
      cpu6    idle: 94.4082%
      cpu7    idle: 94.4092%
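
The 90% limit for disk space and inodes can be checked automatically instead of reading the df output by eye. The sketch below is an illustration only. The function name over_limit is hypothetical, and it assumes the POSIX output format of df (the -P option), in which the usage percentage is the fifth column and the mount point is the sixth. Mount points that contain spaces are not handled.

```shell
#!/bin/sh
# over_limit (hypothetical helper): reads `df -P` or `df -Pi` output on
# stdin and prints "<mount point> <usage>%" for every filesystem whose
# usage is above the limit passed as $1 (default: 90).
over_limit() {
    awk -v limit="${1:-90}" '
        NR == 1 { next }                      # skip the header line
        {
            pct = $5                          # Capacity or IUse% column
            sub(/%$/, "", pct)                # drop the trailing percent sign
            if (pct + 0 > limit + 0) print $6 " " pct "%"
        }'
}

# Usage on a node:
#   df -P  | over_limit 90    # disk space
#   df -Pi | over_limit 90    # inodes
```

An empty result means that no filesystem exceeds the limit.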
      
  3. Verify the MTU and the status of the interfaces on all OpenContrail nodes:

    ip link
    

    Example of system response:

    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
        link/ether ac:de:48:b0:2d:3e brd ff:ff:ff:ff:ff:ff
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
        link/ether ac:de:48:a8:7a:09 brd ff:ff:ff:ff:ff:ff
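
To compare interfaces across nodes, it can help to reduce the ip link output to the name, MTU, and state of each interface. The following sketch is one possible way to do that. The function name link_summary is hypothetical, and it expects the one-line-per-interface output produced by the -o option of ip rather than the multi-line output shown above.

```shell
#!/bin/sh
# link_summary (hypothetical helper): reads `ip -o link` output on stdin
# and prints "<name> <mtu> <state>" for every interface.
link_summary() {
    awk '{
        name = $2
        sub(/:$/, "", name)                 # "eth0:" -> "eth0"
        mtu = state = "?"                   # placeholders if a keyword is missing
        for (i = 3; i < NF; i++) {
            if ($i == "mtu")   mtu = $(i + 1)
            if ($i == "state") state = $(i + 1)
        }
        print name, mtu, state
    }'
}

# Usage on a node:
#   ip -o link | link_summary
```

Interfaces whose state is not UP, or whose MTU differs from the value used by the rest of the cluster, are the ones to inspect first.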
    
  4. Verify that the number of file handles currently allocated by the Linux kernel does not exceed the limit:

    cat /proc/sys/fs/file-nr
    

    Example of system response:

    17736    0    1609849
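
The three fields of /proc/sys/fs/file-nr are the number of allocated file handles, the number of allocated but unused file handles, and the system-wide maximum. The sketch below turns them into a usage percentage so that the margin is easier to judge. The function name file_handle_usage is hypothetical and is not part of any OpenContrail tooling.

```shell
#!/bin/sh
# file_handle_usage (hypothetical helper): reads the contents of
# /proc/sys/fs/file-nr on stdin and prints the allocated handles, the
# system-wide maximum, and the allocated share as a percentage.
file_handle_usage() {
    awk '{ printf "allocated=%d max=%d used=%.2f%%\n", $1, $3, 100 * $1 / $3 }'
}

# Usage on a node:
#   file_handle_usage < /proc/sys/fs/file-nr
```

For the example response above, the function prints allocated=17736 max=1609849 used=1.10%, which is far below the limit.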