NIC goes dark when Proxmox kernel loads after GPU install (works again if GPU removed)
Like the title says. I installed a GPU, everything posts and boots fine. The lights on the Ethernet port are lit up and will stay lit up indefinitely (I assume) if I leave it at the kernel select screen.
But as soon as I load a kernel, the lights go dark. It also is not shown as an active client on my gateway, so it’s not working at all.
I’ve tried lots of commands I’ve found to force it up. It looks to me like the NIC assigned to vmbr0 is correct. Etc. I just can’t get it to work.
If I remove the GPU, it immediately works again. NIC stays up after the kernel loads and I can access the web UI as normal.
root@prox:~# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host noprefixroute
       valid_lft forever preferred_lft forever
enpsso: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a8:a1:59:be:f2:33 brd ff:ff:ff:ff:ff:ff
enp0s31f6: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast master vmbr0 state DOWN group default qlen 1000
    link/ether a8:a1:59:be:f2:32 brd ff:ff:ff:ff:ff:ff
vmbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether a8:a1:59:be:f2:32 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.3/24 scope global vmbr0
       valid_lft forever preferred_lft forever
root@prox:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback

iface enp0s31f6 inet manual

auto vmbr0
iface vmbr0 inet static
        address 192.168.1.3/24
        gateway 192.168.1.1
        bridge-ports enp0s31f6
        bridge-stp off
        bridge-fd 0

iface enps0 inet manual

source /etc/network/interfaces.d/*
root@prox:~# service network restart
Failed to restart network.service: Unit network.service not found.
Possibly your motherboard has PCIe lanes that are dedicated to the GPU when one is slotted, and that can otherwise be used for other devices?
For example, here’s a post about M.2 slots that, when used, affect a PCIe slot on a particular board. It may be worth checking your board’s manual to see if there’s something similar.
The answer not only seemed a HUGE disappointment, but a bit baffling. The PDF manual says if you occupy that 5th M.2 slot, which is the Gen 5 one, the PCIe 1 slot is automatically downgraded to x8. This I thought would be unacceptable if running a behemoth like the RTX 4090 I eventually plan to get, as it requires a lot of power and bandwidth.
It’s late. I’ll have to pull the card and rerun it tomorrow. But here’s the output with the GPU in:
It’s an i7-14700 on an ASRock Z690 Extreme. I’m actually hoping to put a second GPU in the last PCIe slot so I can let Proxmox use the iGPU, pass the 3060 into a Unix Moonlight gaming VM, and pass an RX590 into a Hackintosh VM.
I had an issue with an ASRock Taichi where, if I enabled virtualization, the network would disappear entirely. You may want to check for FW updates for your board. I had nothing but issues with the shitty BIOS, and I even had to upgrade my CPU sooner than I wanted just to be able to do the update.
Make sure your CPU is still supported by the update.
There are generally one or two slots connected directly to the CPU, running at x16 with a single card or at x8 each if there are two and both are populated. Another 4 lanes link the CPU to the chipset, and the rest of the slots connect to the chipset and share that same x4 link. If your CPU has 24 lanes (Ryzen does, or did a few years ago; Intel might, but didn't a few years ago), the remaining 4 lanes usually go to an NVMe slot.
I had a stock Debian install actually rename the device for my NIC when I changed GPUs. You should double-check whether your NIC shows up under the same interface name (in ip a or under /sys/class/net) with and without the GPU. After I changed the name in some config files, the NIC worked fine with the GPU in. It could be as easy as that.
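One way to check for the rename described above is to compare the interface names that /etc/network/interfaces refers to with the ones the kernel currently exposes. Network interfaces do not get device nodes of their own, so the place to look is /sys/class/net. This is only a minimal sketch, assuming the stock ifupdown file layout shown in the original post; the helper name missing_ifaces is made up for illustration and is not a Proxmox or Debian tool.

```shell
#!/bin/sh
# missing_ifaces CONFIG SYSDIR   (hypothetical helper)
# Print every interface named on an "iface" or "bridge-ports" line in CONFIG
# that has no matching entry in SYSDIR (normally /sys/class/net).
missing_ifaces() {
    awk '
        $1 == "iface"        { print $2 }
        $1 == "bridge-ports" { for (i = 2; i <= NF; i++) print $i }
    ' "$1" | sort -u | while read -r name; do
        # "bridge-ports none" is a keyword, not a device
        [ "$name" = "none" ] && continue
        [ -e "$2/$name" ] || echo "$name"
    done
}

# Typical call on the host:
#   missing_ifaces /etc/network/interfaces /sys/class/net
```

Any name it prints is configured but not present under that name. If that happens only with the GPU installed, the fix would be to update the name in /etc/network/interfaces (including the bridge-ports line if that is the NIC in use) and reload the network configuration.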
Like others have said, you may be running out of PCIe lanes. If that isn't the problem and this is a software bug, you could try blocklisting the GPU kernel module.
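On Debian-based systems such as Proxmox, blocklisting is usually done with a small file under /etc/modprobe.d/. The sketch below assumes the card is the 3060 mentioned earlier in the thread and that it is bound to the in-kernel nouveau module; that is an assumption, so check what lspci -k reports as the kernel driver in use and substitute that name. The helper name write_blocklist and the file name it produces are made up for illustration.

```shell
#!/bin/sh
# write_blocklist MODULE [DIR]   (hypothetical helper)
# Write a modprobe.d file that stops MODULE from being loaded automatically.
# DIR defaults to /etc/modprobe.d; pass another directory to try it out safely.
write_blocklist() {
    module=$1
    dir=${2:-/etc/modprobe.d}
    printf 'blacklist %s\n' "$module" > "$dir/blocklist-$module.conf"
}

# On the host, as root (nouveau is an assumption, see above), then rebuild
# the initramfs so the blocklist also applies during early boot:
#   write_blocklist nouveau && update-initramfs -u
```

If the NIC comes up after a reboot with the module blocklisted, that points at the GPU driver rather than at PCIe lane sharing.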
dmesg | less lets you scroll through the output. In less, type a forward slash followed by the device name and hit Enter to search, then see whether the modules are being loaded or whether there are any errors.
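As an alternative to searching interactively in less, the kernel log can be filtered down to the lines most likely to matter here. This is a rough sketch: the driver names in the pattern (e1000e, igc, r8169) are common Intel and Realtek NIC drivers and are a guess about this board, and nic_lines is a made-up helper name.

```shell
#!/bin/sh
# nic_lines   (hypothetical helper)
# Read kernel log text on stdin and keep only lines that mention a common
# NIC driver, an interface rename, or a link state change.
nic_lines() {
    grep -iE 'e1000e|igc|r8169|renamed from|link is (up|down)'
}

# Typical call on the host:
#   dmesg | nic_lines
```

Comparing the output with and without the GPU installed should show whether the NIC driver still loads and whether the interface is being renamed at boot.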