Use a load balancer

With a load balancer, users can access MSR using a single domain name.

Once you have achieved high availability by joining multiple MSR replica nodes, you can configure a load balancer to balance user requests across those replicas. The load balancer detects when a replica fails and immediately stops forwarding requests to it, thus ensuring that the failure goes unnoticed by users.

MSR does not provide a load balancing service. You must use either an on-premises or cloud-based load balancer to balance requests across multiple MSR replicas.

Important

Additional steps are needed to use the same load balancer with both MSR and MKE. For more information, refer to Configure a load balancer in the MKE documentation.

Verify cluster health

MSR exposes several endpoints that you can use to assess the health of an MSR replica:

/_ping

Verifies whether the MSR replica is healthy. This is useful for load balancing and other automated health check tasks. This endpoint is unauthenticated.

/nginx_status

Returns the number of connections handled by the MSR NGINX front end.

/api/v0/meta/cluster_status

Returns detailed information about all MSR replicas.

You can use the unauthenticated /_ping endpoint on each MSR replica to check the health status of the replica and determine whether it should remain in the load balancing pool.

The /_ping endpoint returns a JSON object for the replica being queried that takes the following form:

{
  "Error": "<error-message>",
  "Healthy": true
}

A response of "Healthy": true, accompanied by an HTTP 200 status code, indicates that the replica is suitable for taking requests.

An unhealthy replica returns a 503 status code and populates "Error" with details on whichever of the following services is failing:

  • Storage container (MSR)

  • Authorization (Garant)

  • Metadata persistence (RethinkDB)

  • Content trust (Notary)

Note that the purpose of the /_ping endpoint is to check the health of a single replica. To obtain the health of every replica in a cluster, you must individually query each replica.
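
For reference, the following shell sketch polls /_ping on every replica and reports whether each one is fit to remain in the load balancing pool. The replica addresses are placeholders matching those used in the configuration examples below, and the --insecure flag assumes the replicas present self-signed certificates:

# Check the /_ping endpoint on each MSR replica.
# Substitute the placeholder addresses with the IPs of your own replicas.
for replica in <MSR_REPLICA_1_IP> <MSR_REPLICA_2_IP> <MSR_REPLICA_N_IP>; do
   # --insecure skips certificate verification; remove it if your replicas
   # present certificates that the client already trusts.
   status=$(curl --insecure --silent --output /dev/null \
      --write-out '%{http_code}' "https://${replica}/_ping")
   if [ "${status}" = "200" ]; then
      echo "${replica}: healthy"
   else
      echo "${replica}: unhealthy (HTTP ${status})"
   fi
done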

Load balance MSR

  1. Configure your load balancer for MSR, using the pertinent example below. The examples are presented in the following order: NGINX (nginx.conf), HAProxy (haproxy.cfg), and AWS ELB:

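       # NGINX configuration (nginx.conf)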
       user  nginx;
       worker_processes  1;
    
       error_log  /var/log/nginx/error.log warn;
       pid        /var/run/nginx.pid;
    
       events {
          worker_connections  1024;
       }
    
       stream {
          upstream dtr_80 {
             server <MSR_REPLICA_1_IP>:80  max_fails=2 fail_timeout=30s;
             server <MSR_REPLICA_2_IP>:80  max_fails=2 fail_timeout=30s;
             server <MSR_REPLICA_N_IP>:80   max_fails=2 fail_timeout=30s;
          }
          upstream dtr_443 {
             server <MSR_REPLICA_1_IP>:443 max_fails=2 fail_timeout=30s;
             server <MSR_REPLICA_2_IP>:443 max_fails=2 fail_timeout=30s;
             server <MSR_REPLICA_N_IP>:443  max_fails=2 fail_timeout=30s;
          }
          server {
             listen 443;
             proxy_pass dtr_443;
          }
    
          server {
             listen 80;
             proxy_pass dtr_80;
          }
       }
    
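       # HAProxy configuration (haproxy.cfg)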
       global
          log /dev/log    local0
          log /dev/log    local1 notice
    
       defaults
             mode    tcp
             option  dontlognull
             timeout connect 5s
             timeout client 50s
             timeout server 50s
             timeout tunnel 1h
             timeout client-fin 50s
       ### frontends
       # Optional HAProxy Stats Page accessible at http://<host-ip>:8181/haproxy?stats
       frontend dtr_stats
             mode http
             bind 0.0.0.0:8181
             default_backend dtr_stats
       frontend dtr_80
             mode tcp
             bind 0.0.0.0:80
             default_backend dtr_upstream_servers_80
       frontend dtr_443
             mode tcp
             bind 0.0.0.0:443
             default_backend dtr_upstream_servers_443
       ### backends
       backend dtr_stats
             mode http
             option httplog
             stats enable
             stats admin if TRUE
             stats refresh 5m
       backend dtr_upstream_servers_80
             mode tcp
             option httpchk GET /_ping HTTP/1.1\r\nHost:\ <MSR_FQDN>
             server node01 <MSR_REPLICA_1_IP>:80 check weight 100
             server node02 <MSR_REPLICA_2_IP>:80 check weight 100
             server node03 <MSR_REPLICA_N_IP>:80 check weight 100
       backend dtr_upstream_servers_443
             mode tcp
             option httpchk GET /_ping HTTP/1.1\r\nHost:\ <MSR_FQDN>
             server node01 <MSR_REPLICA_1_IP>:443 weight 100 check check-ssl verify none
             server node02 <MSR_REPLICA_2_IP>:443 weight 100 check check-ssl verify none
             server node03 <MSR_REPLICA_N_IP>:443 weight 100 check check-ssl verify none
    
       {
          "Subnets": [
             "subnet-XXXXXXXX",
             "subnet-YYYYYYYY",
             "subnet-ZZZZZZZZ"
          ],
          "CanonicalHostedZoneNameID": "XXXXXXXXXXX",
          "CanonicalHostedZoneName": "XXXXXXXXX.us-west-XXX.elb.amazonaws.com",
          "ListenerDescriptions": [
             {
                   "Listener": {
                      "InstancePort": 443,
                      "LoadBalancerPort": 443,
                      "Protocol": "TCP",
                      "InstanceProtocol": "TCP"
                   },
                   "PolicyNames": []
             }
          ],
          "HealthCheck": {
             "HealthyThreshold": 2,
             "Interval": 10,
             "Target": "HTTPS:443/_ping",
             "Timeout": 2,
             "UnhealthyThreshold": 4
          },
          "VPCId": "vpc-XXXXXX",
          "BackendServerDescriptions": [],
          "Instances": [
             {
                   "InstanceId": "i-XXXXXXXXX"
             },
             {
                   "InstanceId": "i-XXXXXXXXX"
             },
             {
                   "InstanceId": "i-XXXXXXXXX"
             }
          ],
          "DNSName": "XXXXXXXXXXXX.us-west-2.elb.amazonaws.com",
          "SecurityGroups": [
             "sg-XXXXXXXXX"
          ],
          "Policies": {
             "LBCookieStickinessPolicies": [],
             "AppCookieStickinessPolicies": [],
             "OtherPolicies": []
          },
          "LoadBalancerName": "ELB-MSR",
          "CreatedTime": "2017-02-13T21:40:15.400Z",
          "AvailabilityZones": [
             "us-west-2c",
             "us-west-2a",
             "us-west-2b"
          ],
          "Scheme": "internet-facing",
          "SourceSecurityGroup": {
             "OwnerAlias": "XXXXXXXXXXXX",
             "GroupName":  "XXXXXXXXXXXX"
          }
       }
    
  2. Deploy your load balancer. For NGINX or HAProxy, use the corresponding command below:

    # Create the nginx.conf file, then
    # deploy the load balancer
    
    docker run --detach \
    --name dtr-lb \
    --restart=unless-stopped \
    --publish 80:80 \
    --publish 443:443 \
    --volume ${PWD}/nginx.conf:/etc/nginx/nginx.conf:ro \
    nginx:stable-alpine
    
    # Create the haproxy.cfg file, then
    # deploy the load balancer
    
    docker run --detach \
    --name dtr-lb \
    --publish 443:443 \
    --publish 80:80 \
    --publish 8181:8181 \
    --restart=unless-stopped \
    --volume ${PWD}/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro \
    haproxy:1.7-alpine haproxy -d -f /usr/local/etc/haproxy/haproxy.cfg
    

Whichever load balancer you choose, configure it to:

  • Load balance TCP traffic on ports 80 and 443.

  • Not terminate HTTPS connections.

  • Not buffer requests.

  • Correctly forward the Host HTTP header.

  • Not include a timeout for idle connections, or set the timeout to more than 10 minutes.
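
Once your load balancer is deployed and configured, a quick spot check is to query the /_ping endpoint through the load balancer itself. The sketch below assumes that <MSR_FQDN> resolves to your load balancer and that the replicas use self-signed certificates (hence --insecure):

# Query MSR through the load balancer. A healthy deployment returns
# HTTP 200 and a JSON body with "Healthy": true.
curl --insecure https://<MSR_FQDN>/_ping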