Detect Proxmox hardware unit hang

VM connectivity was somehow blocked


Recently I had reliability problems with local network connections to my Proxmox host and the services running on it. They showed up as application crashes on a Windows 11 VM, where explorer.exe restarted every time I connected via RDP, and as Plex buffering streams at random moments. It turned out that the built-in network card on the Proxmox host itself was resetting.

Detect the problem

In the Proxmox syslog I saw a lot of messages like these:

 

Jan 05 12:31:21 pve kernel: e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
  TDH                  <c9>
  TDT                  <10>
  next_to_use          <10>
  next_to_clean        <c8>
buffer_info[next_to_clean]:
  time_stamp           <11d57416d>
  next_to_watch        <c9>
  jiffies              <11d574588>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
Jan 05 12:31:22 pve kernel: e1000e 0000:00:1f.6 eno1: Reset adapter unexpectedly
Jan 05 12:31:22 pve kernel: vmbr0: port 1(eno1) entered disabled state
Jan 05 12:31:26 pve kernel: e1000e 0000:00:1f.6 eno1: NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
Jan 05 12:31:26 pve kernel: vmbr0: port 1(eno1) entered blocking state
Jan 05 12:31:26 pve kernel: vmbr0: port 1(eno1) entered forwarding state
Jan 05 12:31:28 pve pvestatd[1119]: status update time (6.257 seconds)
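To get a feeling for how often the adapter hangs, the kernel log can be filtered for the message shown above. This is a minimal sketch, not part of my original setup: the function name count_unit_hangs is made up for illustration, and it assumes the log text is piped in on stdin.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): counts how many
# "Detected Hardware Unit Hang" lines appear in the log text read from stdin.
count_unit_hangs() {
    grep -c 'Detected Hardware Unit Hang'
}

# Example usage on the Proxmox host (kernel messages of the current boot):
#   journalctl -k -b | count_unit_hangs
```

A count that keeps growing while the host is under network load suggests the resets are still happening.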

I have an Intel I219-LM onboard chip, and after searching around it looks like this is a well-known problem.

root@pve:~# lspci -v | grep Ethernet
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
        DeviceName: Onboard - Ethernet
        Subsystem: Lenovo Ethernet Connection (7) I219-LM

There are multiple variations of this problem with different NICs, but for me and many others, disabling offloading on the NIC solved it.
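Before touching any config file, the same offload settings can be switched off at runtime to see whether the hangs stop. The following is only a sketch: disable_offloads and the DRY_RUN variable are names I made up for illustration, and the interface names eno1 and vmbr0 are the ones from my host. Settings applied this way are lost on the next reboot.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): turns off gso, tso, sg and gro
# on every interface passed as an argument.
# With DRY_RUN=1 it only prints the ethtool commands instead of running them.
disable_offloads() {
    for ifname in "$@"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ethtool --offload $ifname gso off tso off sg off gro off"
        else
            ethtool --offload "$ifname" gso off tso off sg off gro off
        fi
    done
}

# Example usage as root on the Proxmox host:
#   disable_offloads eno1 vmbr0
```

Running it with DRY_RUN=1 first is a cheap way to check the interface names before anything is changed.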

Resolve the problem

Open /etc/network/interfaces and add the last two lines (the pre-up entries) to the bridge definition. Check that the interface names match your system, for example via “ip a”.

auto vmbr0
iface vmbr0 inet static
        address 10.10.5.5/24
        gateway 10.10.5.xxx
        bridge-ports eno1
        bridge-stp off
        bridge-fd 0
        pre-up /sbin/ethtool --offload vmbr0 gso off tso off sg off gro off
        pre-up /sbin/ethtool --offload eno1 gso off tso off sg off gro off

In my case I have the loopback device (lo), the physical interface (eno1) and the bridge (vmbr0) for Proxmox. You need to reference your own bridge and interface names in the config.

root@pve:~# ip address
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether f8:75:a4:20:10:79 brd ff:ff:ff:ff:ff:ff
    altname enp0s31f6
3: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether f8:75:a4:20:10:79 brd ff:ff:ff:ff:ff:ff
    inet 10.10.5.5/24 scope global vmbr0
       valid_lft forever preferred_lft forever
    inet6 fe80::fa75:a4ff:fe20:1079/64 scope link 
       valid_lft forever preferred_lft forever

After applying the change and rebooting, my network interface is now stable and works as expected.
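To confirm that the settings really survived the reboot, the offload state reported by ethtool can be checked. This is a rough sketch under the assumption that the output of "ethtool --show-offload" is piped in on stdin; the function name offloads_still_on is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): reads "ethtool --show-offload"
# output from stdin and prints those of the four disabled features that are
# still reported as "on". It prints nothing when all four are off.
offloads_still_on() {
    grep -E '^(scatter-gather|tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload): on'
}

# Example usage on the Proxmox host:
#   ethtool --show-offload eno1 | offloads_still_on
#   ethtool --show-offload vmbr0 | offloads_still_on
```

If either command prints a line, the pre-up entries were not applied to that interface and the names in /etc/network/interfaces are worth a second look.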
