OpenStack known issues

This section lists the OpenStack known issues with workarounds for the Mirantis OpenStack for Kubernetes release 22.2.


[26278] ‘l3-agent’ gets stuck during Neutron restart

Fixed in MOSK 22.4

During the l3-agent restart, routers may not be initialized properly due to erroneous logic in the Neutron code, which causes the l3-agent to get stuck in the Not ready state. The readiness probe reports that one of the routers is not ready because its keepalived process has not started.

Example output of the kubectl -n openstack describe pod <neutron-l3-agent-pod-name> command:

Warning  Unhealthy  109s (x476 over 120m)  kubelet, ti-rs-nhmrmiuyqzxl-2-2obcnor6vt24-server-tmtr5ajqjflf \
Readiness probe failed: /tmp/health-probe.py:259: \
ERROR:/tmp/health-probe.py:The router: 66a885b7-0c7c-463a-a574-bdb19733baf3 is not initialized.
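
To find the l3-agent that currently hosts the affected router for the workaround below, you can, for example, pass the router ID from the probe message to the Neutron CLI. This is a sketch that assumes the neutron and openstack command-line clients are available:

# List all l3-agents in the environment
openstack network agent list --agent-type l3
# Show the l3-agent that hosts the affected router
neutron l3-agent-list-hosting-router <router-id>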

Workaround:

  1. Remove the router from the l3-agent:

    neutron l3-agent-router-remove <l3-agent-id> <router-id>
    
  2. Wait up to one minute.

  3. Add the router back to the l3-agent:

    neutron l3-agent-router-add <l3-agent-id> <router-id>
    

[25594] Security groups shared through RBAC cannot be used to create instances

Fixed in MOSK 22.5 for Yoga

It is not possible to create an instance that uses a security group shared through role-based access control (RBAC) by specifying only the network ID when calling Nova. In this case, before creating a port in the given network, Nova verifies that the given security group exists in Neutron. However, Nova requests only the security groups filtered by project_id and therefore does not get the shared security group back from the Neutron API. For details, see the OpenStack known issue #1942615.
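
For reference, a security group is typically shared with another project through an RBAC policy similar to the following sketch, where the target project and security group IDs are placeholders:

openstack network rbac create --target-project <TARGET_PROJECT_ID> --action access_as_shared --type security_group <SG_ID>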

Workaround:

  1. Create a port in Neutron:

    openstack port create --network <NET> --security-group <SG_ID> shared-sg-port
    
  2. Pass the created port to Nova:

    openstack server create --image <IMAGE> --flavor <FLAVOR> --port shared-sg-port vm-with-shared-sg
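
To verify that the port has the shared security group attached, you can, for example, inspect the port created in the first step:

openstack port show shared-sg-port -c security_group_ids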
    

Note

If security groups shared through RBAC are used, apply them to ports only, not to instances directly.


[23985] Federated authorization fails after updating Keycloak URL

Fixed in MOSK 22.3

After updating the Keycloak URL in the OpenStackDeployment resource through the spec.features.keystone.keycloak.url or spec.features.keystone.keycloak.oidc.OIDCProviderMetadataURL fields, federated authentication to Keystone through OpenID Connect with Keycloak stops working and returns HTTP 403 on authentication attempts.

The failure occurs because such a change is not automatically propagated to the corresponding Keycloak identity provider, which was created automatically in Keystone during the initial deployment.

The workaround is to manually update the identity provider’s remote_ids attribute:

  1. Compare the Keycloak URL set in the OpenStackDeployment resource with the one set in Keystone identity provider:

    kubectl -n openstack get osdpl -ojsonpath='{.items[].spec.features.keystone.keycloak}'
    # vs
    openstack identity provider show keycloak -f value -c remote_ids
    
  2. If the URLs do not match, update the identity provider in OpenStack with the correct URL, keeping the auth/realms/iam part as shown below. Otherwise, the problem is caused by something else, and you need to proceed with further debugging.

    openstack identity provider set keycloak --remote-id <new-correct-URL>/auth/realms/iam
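
After the update, you can re-check the remote_ids attribute with the same command as in the first step; it should now contain the new URL:

openstack identity provider show keycloak -f value -c remote_ids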
    

[22930] Octavia load balancers provisioning gets stuck

Fixed in MOSK 22.4

The provisioning_status of Octavia load balancers may get stuck in the ERROR, PENDING_UPDATE, PENDING_CREATE, or PENDING_DELETE state. Occasionally, the listeners or pools associated with these load balancers may also get stuck in the same state.
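
To identify the affected load balancers, you can, for example, filter them by provisioning status. This is a sketch that assumes a python-octaviaclient version supporting the --provisioning-status filter:

# Repeat for PENDING_UPDATE, PENDING_CREATE, and PENDING_DELETE
openstack loadbalancer list --provisioning-status ERROR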

Workaround:

  • For administrative users who have access to the keystone-client pod:

    1. Log in to a keystone-client pod.

    2. Delete the affected load balancer:

      openstack loadbalancer delete <load_balancer_id> --force
      
  • For non-administrative users, access the Octavia API directly and delete the affected load balancer using the "force": true argument in the delete request:

    1. Access the Octavia API.

    2. Obtain the token:

      TOKEN=$(openstack token issue -f value -c id)
      
    3. Obtain the endpoint:

      ENDPOINT=$(openstack versions show --service load-balancer --interface public --status CURRENT -f value -c Endpoint)
      
    4. Delete the affected load balancers:

      curl -H "X-Auth-Token: $TOKEN" -d '{"force": true}' -X DELETE $ENDPOINT/loadbalancers/<load_balancer_id>
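
To confirm the deletion, you can, for example, query the same endpoint again; a successfully removed load balancer returns HTTP 404:

curl -H "X-Auth-Token: $TOKEN" $ENDPOINT/loadbalancers/<load_balancer_id>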
      

[19065] Octavia load balancers lose Amphora VMs after failover

Fixed in MOSK 22.3

If an Amphora VM does not respond to heartbeat requests or takes too long to respond, the Octavia load balancer automatically initiates a failover process after 60 seconds of unsuccessful attempts. Long response times of an Amphora VM may be caused by various events, such as a high load on the OpenStack compute node that hosts the Amphora VM, network issues, system service updates, and so on. After a failover, the Amphora VMs may be missing from the load balancer for which the failover has completed.

Workaround:

  • If your deployment is already affected, manually restore the load balancer by recreating the Amphora VM:

    1. Obtain the load balancer ID:

      openstack loadbalancer amphora list --column loadbalancer_id --format value --status ERROR
      
    2. Start the load balancer failover:

      openstack loadbalancer failover <load_balancer_id>
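
    The failover recreates the Amphora VMs. To verify, you can, for example, list the amphorae of the load balancer, assuming a python-octaviaclient version that supports the --loadbalancer filter:

      openstack loadbalancer amphora list --loadbalancer <load_balancer_id>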
      
  • To avoid an automatic failover that may cause the issue, set the heartbeat_timeout parameter in the OpenStackDeployment CR to a large value in seconds. The default is 60 seconds. For example:

    spec:
      services:
        load-balancer:
          octavia:
            values:
              conf:
                octavia:
                  health_manager:
                    heartbeat_timeout: 31536000
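
A possible way to apply this change is to edit the OpenStackDeployment resource directly, where <osdpl-name> is a placeholder for the name of your resource:

kubectl -n openstack edit osdpl <osdpl-name>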
    

[6912] Octavia load balancers may not work properly with DVR

Limitation

When Neutron is deployed in the DVR mode, Octavia load balancers may not work correctly. The symptoms include both failure to properly balance traffic and failure to perform an amphora failover. For details, see DVR incompatibility with ARP announcements and VRRP.