RHEL-29445

[NetworkManager] A few pings are lost after the mlx VF migration finishes

    • Major
    • sst_virtualization_networking
    • ssg_virtualization
    • QE ack
    • False
    • Red Hat Enterprise Linux
    • x86_64
    • Linux

      What were you trying to do that didn't work?
      A few pings are lost after the mlx VF migration finishes.

      Please provide the package NVR for which bug is seen:
      host:
      qemu-kvm-8.2.0-7.el9_4.x86_64
      VM:
      NetworkManager-1.45.91-1.el9.x86_64
      5.14.0-425.el9.x86_64

      How reproducible:
      100%

      Steps to reproduce
      1. set up the mlx PF in switchdev mode

      # devlink dev eswitch set pci/0000:e1:00.0 mode switchdev
      # devlink dev eswitch show pci/0000:e1:00.0
      

      2. create a mlx VF

      # echo 2 > /sys/bus/pci/devices/0000\:e1\:00.0/sriov_numvfs
      

      3. unbind the VF

      # echo 0000:e1:00.1 > /sys/bus/pci/drivers/mlx5_core/unbind
      

      4. enable the mlx VF's migration function

      # devlink port function set pci/0000:e1:00.0/1 migratable enable
      # devlink port show pci/0000:e1:00.0/1
      

      5. bind the mlx VF to the mlx5_vfio_pci driver

      # virsh nodedev-detach pci_0000_e1_00_1
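Before starting the VM it may be worth confirming which driver the VF actually ended up on after the detach. The helper below is only a sketch: the function name pci_driver and its optional second argument (an alternative sysfs root) are made up for illustration and are not part of the original reproducer. It reads the standard sysfs "driver" symlink of the device.

```shell
# Print the name of the kernel driver bound to a PCI device, or "none".
# $1: PCI address, e.g. 0000:e1:00.1
# $2: sysfs root (hypothetical knob, defaults to /sys) so the helper can be
#     exercised against a fake tree without the hardware
pci_driver() {
    link="${2:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The symlink points at .../bus/pci/drivers/<name>; keep the last component.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}
```

On the host this would be called as "pci_driver 0000:e1:00.1"; if the result is not the VFIO driver named in step 5, the later migration steps are probably not testing what they are meant to.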
      

      6. start a VM with the mlx VF (managed='no')

          <interface type='hostdev'>
            <mac address='52:54:00:56:8c:f7'/>
            <driver name='vfio'/>
            <source>
              <address type='pci' domain='0x0000' bus='0xe1' slot='0x00' function='0x1'/>
            </source>
          </interface>
      

      7. configure an IP address for the mlx VF in the VM
      In the VM

      # nmcli connection add type ethernet ifname "$ifname" con-name "$con_name" ipv4.method manual ipv4.addresses 192.168.150.100/24
      

      8. keep pinging the VM
      In another VM with a mlx VF from the same PF:

      # ping 192.168.150.100
      

      9. migrate the VM

      # /bin/virsh migrate --live --domain rhel94 --verbose --desturi qemu+ssh://10.73.212.98/system
      Migration: [100.00 %]
      

      10. check the ping statistics

      # ping -c 70 192.168.150.100
      PING 192.168.150.100 (192.168.150.100) 56(84) bytes of data.
      64 bytes from 192.168.150.100: icmp_seq=1 ttl=64 time=0.157 ms
      64 bytes from 192.168.150.100: icmp_seq=2 ttl=64 time=0.170 ms
      64 bytes from 192.168.150.100: icmp_seq=3 ttl=64 time=0.176 ms
      64 bytes from 192.168.150.100: icmp_seq=4 ttl=64 time=0.174 ms
      64 bytes from 192.168.150.100: icmp_seq=5 ttl=64 time=0.194 ms
      64 bytes from 192.168.150.100: icmp_seq=6 ttl=64 time=0.112 ms
      64 bytes from 192.168.150.100: icmp_seq=7 ttl=64 time=0.163 ms
      64 bytes from 192.168.150.100: icmp_seq=8 ttl=64 time=0.175 ms
      64 bytes from 192.168.150.100: icmp_seq=9 ttl=64 time=0.238 ms
      64 bytes from 192.168.150.100: icmp_seq=10 ttl=64 time=0.188 ms
      64 bytes from 192.168.150.100: icmp_seq=11 ttl=64 time=0.185 ms
      64 bytes from 192.168.150.100: icmp_seq=12 ttl=64 time=0.224 ms
      64 bytes from 192.168.150.100: icmp_seq=13 ttl=64 time=0.195 ms <-- the mlx vf migration finishes at this time
      64 bytes from 192.168.150.100: icmp_seq=19 ttl=64 time=0.413 ms <-- it takes 6 seconds for the ping to recover in this test
      64 bytes from 192.168.150.100: icmp_seq=20 ttl=64 time=0.396 ms
      64 bytes from 192.168.150.100: icmp_seq=21 ttl=64 time=0.263 ms
      64 bytes from 192.168.150.100: icmp_seq=22 ttl=64 time=0.208 ms
      64 bytes from 192.168.150.100: icmp_seq=23 ttl=64 time=0.370 ms
      64 bytes from 192.168.150.100: icmp_seq=24 ttl=64 time=0.485 ms
      64 bytes from 192.168.150.100: icmp_seq=25 ttl=64 time=0.386 ms
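To make the gap in the log above easier to compare between runs, the missing sequence numbers can be extracted automatically. This is a rough sketch rather than part of the reproducer: missing_seqs is a hypothetical helper name, and it assumes replies arrive in order and that nothing was lost before the first reply.

```shell
# Print every icmp_seq number that is missing from ping output read on stdin.
missing_seqs() {
    awk '
        match($0, /icmp_seq=[0-9]+/) {
            # "icmp_seq=" is 9 characters long; the rest of the match is the number.
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            # Every number skipped since the previous reply is a lost ping.
            if (seen) for (i = prev + 1; i < seq; i++) print i
            prev = seq
            seen = 1
        }
    '
}
```

Saving the ping output to a file (ping.log here is just an example name) and running "missing_seqs < ping.log" on the log above should list 14 through 18, i.e. five lost pings.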
      

      Expected results
      No ping loss

      Actual results
      A few pings are lost after the mlx VF migration finishes

      Additional info:
      (1) If I restart the NetworkManager service in the VM immediately after the mlx VF migration finishes, ping recovers immediately as well.

      In the VM:

      [root@localhost ~]# systemctl restart NetworkManager 
      

      (2) The time required for ping to recover differs from test to test
      (so the number of lost pings differs from test to test as well)
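Since the recovery time varies, it may help to measure it the same way in every run instead of counting sequence numbers by eye. The sketch below assumes the pinging VM uses iputils ping, whose -D option prefixes each line with a [seconds.microseconds] timestamp; max_reply_gap is a hypothetical helper name, not something from the original test.

```shell
# Print the largest gap, in seconds, between consecutive replies
# in `ping -D` output read on stdin.
max_reply_gap() {
    awk '
        # Only timestamped reply lines count; other lines are ignored.
        /^\[[0-9.]+\]/ && / bytes from / {
            # $1 looks like "[1700000000.123456]"; strip the brackets.
            ts = substr($1, 2, length($1) - 2) + 0
            if (seen && ts - prev > max) max = ts - prev
            prev = ts
            seen = 1
        }
        END { printf "%.1f\n", max }
    '
}
```

For example, run "ping -D 192.168.150.100 | tee ping.log" during the migration and then "max_reply_gap < ping.log" afterwards (ping.log is an example file name). With the default one-second interval a result close to 1.0 means no loss, and anything larger approximates the outage.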

            lvivier@redhat.com Laurent Vivier
            yanghliu@redhat.com YangHang Liu
            virt-maint virt-maint
            YangHang Liu YangHang Liu