Configure BGP announcement for cluster API LB address

TechPreview Available since 2.24.4

When you create a bare metal managed cluster with the multi-rack topology, where Kubernetes masters are distributed across multiple racks without an L2 layer extension between them, you must configure BGP announcement of the cluster API load balancer address.

For clusters where Kubernetes masters reside in the same rack or are connected through an L2 layer extension, you can configure either BGP or L2 (ARP) announcement of the cluster API load balancer address. The L2 (ARP) announcement is used by default, and its configuration is covered in Create a cluster using web UI.

Caution

Create the Rack and MultiRackCluster objects, which are described in the procedure below, before you start provisioning master nodes. This ensures that both the BGP and netplan configurations are applied simultaneously during provisioning.

To enable the use of BGP announcement for the cluster API LB address:

  1. In the Cluster object, set the useBGPAnnouncement parameter to true:

    spec:
      providerSpec:
        value:
          useBGPAnnouncement: true
    
  2. Create the MultiRackCluster object, which is mandatory when configuring BGP announcement for the cluster API LB address. This object enables you to set cluster-wide parameters for the BGP announcement configuration.

    In this scenario, the MultiRackCluster object must be bound to the corresponding Cluster object using the cluster.sigs.k8s.io/cluster-name label.

    Container Cloud uses the bird BGP daemon for announcement of the cluster API LB address. For this reason, set the corresponding bgpdConfigFileName and bgpdConfigFilePath parameters in the MultiRackCluster object, so that bird can locate the configuration file. For details, see the configuration example below.

    The bgpdConfigTemplate parameter contains the default configuration file template for the bird BGP daemon, which you can override in Rack objects.

    The defaultPeer parameter contains default parameters of the BGP connection from master nodes to infrastructure BGP peers, which you can override in Rack objects.

    Configuration example for MultiRackCluster
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: MultiRackCluster
    metadata:
      name: multirack-test-cluster
      namespace: managed-ns
      labels:
        cluster.sigs.k8s.io/cluster-name: test-cluster
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
    spec:
      bgpdConfigFileName: bird.conf
      bgpdConfigFilePath: /etc/bird
      bgpdConfigTemplate: |
        ...
      defaultPeer:
        localASN: 65101
        neighborASN: 65100
        neighborIP: ""
        password: deadbeef
    

    For the object description, see API Reference: MultiRackCluster resource.

    Note

    The kaas.mirantis.com/region label is removed from all Container Cloud objects in 2.26.0 (Cluster releases 17.1.0 and 16.1.0). Therefore, do not add the label starting with these releases. On existing clusters updated to these releases, or if the label is added manually, Container Cloud ignores it.
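
    As an illustration of the note above, the metadata of the same MultiRackCluster object on Container Cloud 2.26.0 or later could look as follows. This is a minimal sketch that reuses the names from the configuration example above and only drops the region label; the spec section stays unchanged.

    ```yaml
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: MultiRackCluster
    metadata:
      name: multirack-test-cluster
      namespace: managed-ns
      labels:
        cluster.sigs.k8s.io/cluster-name: test-cluster
        kaas.mirantis.com/provider: baremetal
        # kaas.mirantis.com/region is intentionally omitted (2.26.0 and later)
    ```

    The same omission applies to the labels of the Rack, Machine, and L2Template objects shown in this procedure.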

  3. Create the Rack object(s). This object is mandatory when configuring BGP announcement for the cluster API LB address and it allows you to configure BGP announcement parameters for each rack.

    In this scenario, Rack objects must be bound to Machine objects corresponding to master nodes of the cluster. Each Rack object describes the configuration for the bird BGP daemon used to announce the cluster API LB address from a particular master node or from several master nodes in the same rack.

    The Machine object can optionally define the rack-id node label. This label is not used for BGP announcement of the cluster API LB IP, but MetalLB can use it. The label is required for MetalLB node selectors when MetalLB announces LB IP addresses from nodes that are distributed across multiple racks. In this scenario, the L2 (ARP) announcement mode cannot be used for MetalLB because master nodes are in different L2 segments. Therefore, MetalLB must use the BGP announcement mode, and node selectors are required to properly configure BGP connections from each node. See Configure MetalLB for details.
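
    To show how the rack-id node label can be consumed by a node selector, the following sketch uses the upstream MetalLB BGPPeer resource. It is an assumption-laden illustration, not the Container Cloud procedure: the object name, the peer address, and the reuse of the ASN values from the MultiRackCluster example are hypothetical, and Container Cloud may require wrapping these settings in its own objects as described in Configure MetalLB.

    ```yaml
    # Upstream MetalLB resource, shown for illustration only.
    apiVersion: metallb.io/v1beta2
    kind: BGPPeer
    metadata:
      name: rack-master-1-peer     # hypothetical name
      namespace: metallb-system
    spec:
      myASN: 65101                 # assumed, taken from the defaultPeer example
      peerASN: 65100               # assumed, taken from the defaultPeer example
      peerAddress: 10.77.41.2      # hypothetical peer in the rack's external subnet
      nodeSelectors:
      - matchLabels:
          rack-id: rack-master-1   # matches the node label set in the Machine object
    ```

    With one such peer per rack, each node establishes BGP sessions only with the peers reachable from its own rack.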

    The L2Template object includes the lo interface configuration to set the IP address for the bird BGP daemon that will be advertised as the cluster API LB address. The {{ cluster_api_lb_ip }} function is used in npTemplate to obtain the cluster API LB address value.

    Configuration example for Rack
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Rack
    metadata:
      name: rack-master-1
      namespace: managed-ns
      labels:
        cluster.sigs.k8s.io/cluster-name: test-cluster
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
    spec:
      bgpdConfigTemplate: |  # optional
        ...
      peeringMap:
        lcm-rack-control-1:
          peers:
          - neighborIP: 10.77.31.2  # "localASN" & "neighborASN" are taken from
          - neighborIP: 10.77.31.3  # "MultiRackCluster.spec.defaultPeer" if
                                    # not set here
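
    When a rack peers with a different autonomous system than the cluster-wide default, the peer parameters can be set explicitly in the Rack object. The following sketch of a second rack assumes that peers entries accept the same fields as MultiRackCluster.spec.defaultPeer. The rack name, subnet name, ASN values, and addresses are hypothetical; verify the field names against API Reference: Rack resource.

    ```yaml
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: Rack
    metadata:
      name: rack-master-2               # hypothetical second rack
      namespace: managed-ns
      labels:
        cluster.sigs.k8s.io/cluster-name: test-cluster
        kaas.mirantis.com/provider: baremetal
    spec:
      peeringMap:
        lcm-rack-control-2:             # hypothetical LCM subnet of the second rack
          peers:
          - neighborIP: 10.77.32.2      # hypothetical peer address
            localASN: 65102             # overrides defaultPeer.localASN
            neighborASN: 65100          # set explicitly instead of inheriting
    ```

    A Machine object placed in this rack would then carry the ipam/RackRef: rack-master-2 label.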
    
    Configuration example for Machine
    apiVersion: cluster.k8s.io/v1alpha1
    kind: Machine
    metadata:
      name: test-cluster-master-1
      namespace: managed-ns
      annotations:
        metal3.io/BareMetalHost: managed-ns/test-cluster-master-1
      labels:
        cluster.sigs.k8s.io/cluster-name: test-cluster
        cluster.sigs.k8s.io/control-plane: controlplane
        hostlabel.bm.kaas.mirantis.com/controlplane: controlplane
        ipam/RackRef: rack-master-1  # reference to the "rack-master-1" Rack
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
    spec:
      providerSpec:
        value:
          kind: BareMetalMachineProviderSpec
          apiVersion: baremetal.k8s.io/v1alpha1
          hostSelector:
            matchLabels:
              kaas.mirantis.com/baremetalhost-id: test-cluster-master-1
          l2TemplateSelector:
            name: test-cluster-master-1
          nodeLabels:            # optional. it is not used for BGP announcement
          - key: rack-id         # of the cluster API LB IP but it can be used
            value: rack-master-1 # for MetalLB if "nodeSelectors" are required
      ...
    
    Configuration example for L2Template
    apiVersion: ipam.mirantis.com/v1alpha1
    kind: L2Template
    metadata:
      labels:
        cluster.sigs.k8s.io/cluster-name: test-cluster
        kaas.mirantis.com/provider: baremetal
        kaas.mirantis.com/region: region-one
      name: test-cluster-master-1
      namespace: managed-ns
    spec:
      ...
      l3Layout:
        - subnetName: lcm-rack-control-1  # this network is referenced
          scope:      namespace           # in the "rack-master-1" Rack
        - subnetName: ext-rack-control-1  # optional. this network is used
          scope:      namespace           # for k8s services traffic and
                                          # MetalLB BGP connections
      ...
      npTemplate: |
        ...
        ethernets:
          lo:
            addresses:
              - {{ cluster_api_lb_ip }}  # function for cluster API LB IP
            dhcp4: false
            dhcp6: false
        ...
    

    The Rack object fields are described in API Reference: Rack resource.

    The configuration example for the scenario where Kubernetes masters are in the same rack or with an L2 layer extension between masters is described in Single rack configuration example.

    The configuration example for the scenario where Kubernetes masters are distributed across multiple racks without an L2 layer extension between them is described in Multiple rack configuration example.