Bug
Resolution: Unresolved
Major
None
rhel-9.5
Major
sst_virtualization_networking
ssg_virtualization
QE ack
False
Red Hat Enterprise Linux
Automated
x86_64
Linux
What were you trying to do that didn't work?
A few pings are lost after the mlx VF migration finishes.
Please provide the package NVR for which the bug is seen:
host:
qemu-kvm-8.2.0-7.el9_4.x86_64
VM:
NetworkManager-1.45.91-1.el9.x86_64
5.14.0-425.el9.x86_64
How reproducible:
100%
Steps to reproduce
1. set up the mlx PF in switchdev mode
# devlink dev eswitch set pci/0000:e1:00.0 mode switchdev
# devlink dev eswitch show pci/0000:e1:00.0
2. create a mlx VF
# echo 2 > /sys/bus/pci/devices/0000\:e1\:00.0/sriov_numvfs
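(Optional check, not part of the original steps: the newly created VFs should appear as extra PCI functions; the device string matched below is an assumption about how they are listed on this host.)
# lspci -D | grep -i 'Virtual Function'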
3. unbind the VF
# echo 0000:e1:00.1 > /sys/bus/pci/drivers/mlx5_core/unbind
4. enable the mlx VF's migration function
# devlink port function set pci/0000:e1:00.0/1 migratable enable
# devlink port show pci/0000:e1:00.0/1
5. bind the mlx VF to the mlx_vfio_pci driver
# virsh nodedev-detach pci_0000_e1_00_1
6. start a VM with the mlx VF (managed='no')
<interface type='hostdev'>
  <mac address='52:54:00:56:8c:f7'/>
  <driver name='vfio'/>
  <source>
    <address type='pci' domain='0x0000' bus='0xe1' slot='0x00' function='0x1'/>
  </source>
</interface>
7. configure an IP address for the mlx VF in the VM
In the VM
# nmcli connection add type ethernet ifname $ifname con-name $con_name ipv4.method manual ipv4.addresses 192.168.150.100/24
8. keep pinging the VM
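(For illustration only, with assumed values for the placeholders: the interface name enp1s0 and connection name mlx-vf are assumptions about the guest.)
# nmcli connection add type ethernet ifname enp1s0 con-name mlx-vf ipv4.method manual ipv4.addresses 192.168.150.100/24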
In another VM with a mlx VF from the same PF:
# ping 192.168.150.100
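(Optionally, printing a timestamp per reply makes it easier to measure how long the outage lasts; -D is a standard iputils ping option and the log path is arbitrary.)
# ping -D 192.168.150.100 | tee /tmp/ping.log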
9. migrate the VM
# /bin/virsh migrate --live --domain rhel94 --verbose --desturi qemu+ssh://10.73.212.98/system
Migration: [100.00 %]
10. check the ping statistics
# ping -c 70 192.168.150.100
PING 192.168.150.100 (192.168.150.100) 56(84) bytes of data.
64 bytes from 192.168.150.100: icmp_seq=1 ttl=64 time=0.157 ms
64 bytes from 192.168.150.100: icmp_seq=2 ttl=64 time=0.170 ms
64 bytes from 192.168.150.100: icmp_seq=3 ttl=64 time=0.176 ms
64 bytes from 192.168.150.100: icmp_seq=4 ttl=64 time=0.174 ms
64 bytes from 192.168.150.100: icmp_seq=5 ttl=64 time=0.194 ms
64 bytes from 192.168.150.100: icmp_seq=6 ttl=64 time=0.112 ms
64 bytes from 192.168.150.100: icmp_seq=7 ttl=64 time=0.163 ms
64 bytes from 192.168.150.100: icmp_seq=8 ttl=64 time=0.175 ms
64 bytes from 192.168.150.100: icmp_seq=9 ttl=64 time=0.238 ms
64 bytes from 192.168.150.100: icmp_seq=10 ttl=64 time=0.188 ms
64 bytes from 192.168.150.100: icmp_seq=11 ttl=64 time=0.185 ms
64 bytes from 192.168.150.100: icmp_seq=12 ttl=64 time=0.224 ms
64 bytes from 192.168.150.100: icmp_seq=13 ttl=64 time=0.195 ms   <-- the mlx vf migration finishes at this time
64 bytes from 192.168.150.100: icmp_seq=19 ttl=64 time=0.413 ms   <-- it takes 6 seconds for the ping to recover in this test (icmp_seq 14-18 are lost)
64 bytes from 192.168.150.100: icmp_seq=20 ttl=64 time=0.396 ms
64 bytes from 192.168.150.100: icmp_seq=21 ttl=64 time=0.263 ms
64 bytes from 192.168.150.100: icmp_seq=22 ttl=64 time=0.208 ms
64 bytes from 192.168.150.100: icmp_seq=23 ttl=64 time=0.370 ms
64 bytes from 192.168.150.100: icmp_seq=24 ttl=64 time=0.485 ms
64 bytes from 192.168.150.100: icmp_seq=25 ttl=64 time=0.386 ms
Expected results
No ping loss
Actual results
A few pings are lost after the mlx VF migration finishes.
Additional info:
(1) If I restart the NetworkManager service in the VM immediately after the mlx VF migration finishes, the ping recovers immediately as well.
In the VM:
[root@localhost ~]# systemctl restart NetworkManager
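(As an additional, untested data point: if the loss were caused by stale ARP/forwarding state on the destination side, a gratuitous ARP sent from the guest right after migration might recover the path in the same way the NetworkManager restart does; arping is provided by iputils, and the interface name enp1s0 is an assumption.)
# arping -c 3 -U -I enp1s0 192.168.150.100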
(2) The time required for the ping to recover differs from test to test
(the number of lost pings differs from test to test)
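A rough way to quantify the per-test loss (a sketch, assuming the ping output was saved to /tmp/ping.log as suggested in step 8) is to look for gaps in the icmp_seq numbers:
# grep -o 'icmp_seq=[0-9]*' /tmp/ping.log | cut -d= -f2 | awk 'NR>1 && $1>prev+1 {print "lost", $1-prev-1, "packets between seq", prev, "and", $1} {prev=$1}'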