AMD Radeon RX550 GPU not working

There is a pull request with a fix for the PCIe driver in the 6.1.15 bianbu kernel:
PCIe PR

I built that kernel, copied the firmware files over to /lib/firmware/amdgpu and booted the system.
The driver starts to load but then throws this error:

[ 2.584592] [drm] Detected VRAM RAM=4096M, BAR=0M
[ 2.589346] [drm] RAM width 128bits GDDR5
[ 2.593507] [drm] amdgpu: 4096M of VRAM memory ready
[ 2.598535] [drm] amdgpu: 7938M of GTT memory ready.
[ 2.603637] [drm] GART: num cpu pages 65536, num gpu pages 65536
[ 2.609732] amdgpu 0002:01:00.0: amdgpu: (-22) kernel bo map failed
[ 2.616086] [drm:amdgpu_device_init] ERROR sw_init of IP block <gmc_v8_0> failed -22
[ 2.624117] amdgpu 0002:01:00.0: amdgpu: amdgpu_device_ip_init failed
[ 2.630630] amdgpu 0002:01:00.0: amdgpu: Fatal error during GPU init
[ 2.637057] amdgpu 0002:01:00.0: amdgpu: amdgpu: finishing device.
[ 2.643385] amdgpu: probe of 0002:01:00.0 failed with error -22
[ 2.678480] [drm] init_lt8911exb()

I found some statements saying that this error means the firmware is not loaded, but as I said, the firmware files are there.

Any hints, or has anybody found another way to get it to run?
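
One quick sanity check for the firmware question is to list which blobs are actually present for the chip. The sketch below is only an illustration: the function name is made up, and the `polaris12` prefix in the usage note is an assumption (RX 550 boards are usually Polaris 12, but the chip name printed by the driver in dmesg is authoritative).

```sh
# List firmware blobs in a directory whose file name starts with a prefix.
# Prints the matching file names; returns non-zero if nothing matches.
# (list_amdgpu_firmware is a hypothetical helper, not an existing tool.)
list_amdgpu_firmware() {
    fw_dir="$1"     # e.g. /lib/firmware/amdgpu
    prefix="$2"     # chip prefix, e.g. polaris12 (assumption, check dmesg)
    found=1
    for f in "$fw_dir"/"$prefix"*; do
        [ -e "$f" ] || continue   # glob did not match anything
        basename "$f"
        found=0
    done
    return "$found"
}
```

Usage would be something like `list_amdgpu_firmware /lib/firmware/amdgpu polaris12`, combined with `dmesg | grep -i firmware` to see whether the kernel complained about a missing file at all.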

OK, going by the hints I got, the real problem is that the PCIe interface is 32-bit only, so GPUs beyond 4 GB don't work.
I then tried an old GeForce 7300 GT, and it worked up to the login screen.
But USB stopped working as soon as the GPU was initialized by its driver, with both the 6.1 kernel and 6.6.
Now I'm stuck again… any hints?
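
The log above reports BAR=0M, so it may help to look at which memory regions the kernel actually assigned to the card. A minimal sketch, assuming the standard sysfs layout where each line of a device's `resource` file holds start, end and flags in hex; the helper name is hypothetical and the device address is taken from the log:

```sh
# Print the size of every assigned region listed in a PCI "resource" file.
# Each input line has the form: <start> <end> <flags> (hex, 0x-prefixed).
# (pci_region_sizes is a hypothetical helper name.)
pci_region_sizes() {
    res_file="$1"
    idx=0
    while read -r start end flags; do
        # Unassigned regions are reported as all zeros; skip them.
        if [ -n "$end" ] && [ "$(( $end ))" -ne 0 ]; then
            echo "region $idx: $(( ($end - $start + 1) / 1024 )) KiB"
        fi
        idx=$(( idx + 1 ))
    done < "$res_file"
}
```

For the card in this thread that would be `pci_region_sizes /sys/bus/pci/devices/0002:01:00.0/resource`. If the large VRAM region is missing from the output, the address window offered by the host bridge is probably too small for it.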

It should be possible, see: Opvolger/milkVjupiter/OpenSUSEATIRadeonR9_290.md at master · Opvolger/Opvolger · GitHub via: https://www.reddit.com/r/RISCV/comments/1g634s0/milkv_jupiter_with_external_gpu_playing_factorio/

good luck - hexdump

Thank you for that link, very insightful.

What I don’t understand is why you need a converter in the first place. As far as I know, a card running on fewer lanes should work without one, just slower.
I mean, PCIe x1 is the short section at the front of the connector, and everything else is just optional for more speed, right?

The main problem for me is still that PCIe seems to interfere with the USB 2 and USB 3 ports.

And, as I said, the GeForce did work without an adapter; it just interfered with USB, so I couldn’t type anything at the login screen.

dmesg and journalctl didn’t show anything either (I was logged in via the serial console).

Any more insights on how to debug such stuff?

Thanks!
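
One low-tech way to look for this kind of interference, when dmesg stays silent, is to compare interrupt counters before and after the GPU driver loads. This is only a sketch: the helper name is made up, and the pattern `usb|xhci|pci` is a guess at how the relevant lines are labelled on this board.

```sh
# Print the lines of an interrupts table that match a pattern
# (case-insensitive). Defaults to the live /proc/interrupts.
# (irq_snapshot is a hypothetical helper name.)
irq_snapshot() {
    pattern="$1"
    src="${2:-/proc/interrupts}"
    grep -i -E "$pattern" "$src"
}

# Possible use from the serial console, with the GPU module not yet loaded:
#   irq_snapshot 'usb|xhci|pci' > /tmp/irq.before
#   modprobe nouveau
#   sleep 5
#   irq_snapshot 'usb|xhci|pci' > /tmp/irq.after
#   diff /tmp/irq.before /tmp/irq.after
```

If the USB controller's counters stop increasing once the GPU driver is active while keys are being pressed, that points at interrupt delivery rather than at the USB driver itself.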

There are I/O problems with this SoC that I don’t really understand. I have an RX 590 working without a converter using this kernel: Commits · StackDoubleFlow/spacemit-k1-linux-6.1 · GitHub

I never had that exact error, but I had a similar one that ended in amdgpu: (-14) failed to allocate kernel bo
It was caused by a "swiotlb buffer is full" message earlier in the log. The buffer wasn’t really full, though; it was just an attempt to allocate a block larger than the maximum size, and that’s what led me to change IO_TLB_SEGSIZE.
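
To make the size limit concrete: in mainline kernels the largest single swiotlb mapping is IO_TLB_SEGSIZE slots of 2^IO_TLB_SHIFT bytes each. The sketch below just does that arithmetic; the defaults 128 and 11 are the mainline values as far as I know, the function name is made up, and the value used by the patched kernel is not stated in this thread.

```sh
# Largest contiguous swiotlb mapping in bytes:
#   slots per segment (IO_TLB_SEGSIZE) * bytes per slot (1 << IO_TLB_SHIFT)
# (swiotlb_max_mapping is a hypothetical helper name.)
swiotlb_max_mapping() {
    segsize="${1:-128}"     # IO_TLB_SEGSIZE, mainline default (assumption)
    shift_bits="${2:-11}"   # IO_TLB_SHIFT, mainline default (assumption)
    echo $(( segsize * (1 << shift_bits) ))
}

swiotlb_max_mapping          # prints 262144, i.e. 256 KiB
```

Any single bounce-buffer request above that size fails with the "buffer is full" message even when plenty of slots are free, which matches the behaviour described above. `dmesg | grep -i 'swiotlb buffer is full'` shows the requested size.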

If you try again, let me know how that goes, I’m curious to see if the RX 550 will run more smoothly, since I still have crashes with the RX 590 after ~20-30 minutes of use.

Sorry for the late answer…I’ve been on vacation :wink:
That’s a nice catch…I’ll build a new kernel later and report back!
Cheers!

IO_TLB_SEGSIZE

I can’t find such a config option in the kernel you mentioned:

~/repositories/spacemit-k1-linux-6.1$ grep TLB .config
CONFIG_CGROUP_HUGETLB=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
# CONFIG_BT_HCIBTUSB_RTLBTUSB is not set
CONFIG_ARCH_SUPPORTS_HUGETLBFS=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y
CONFIG_SWIOTLB=y

.config is a copy of arch/riscv/configs/k1_extern_gpu_defconfig

It’s not a config option. It’s this code patch in include/linux/swiotlb.h. If you’re using my kernel, then it’s already there.
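
Since the value is a preprocessor macro and not a Kconfig symbol, grepping `.config` can never find it; the header has to be searched instead. A small sketch with a made-up helper name, assuming it is run against a kernel source tree:

```sh
# Show the IO_TLB_SEGSIZE definition in a kernel source tree.
# (show_segsize is a hypothetical helper name.)
show_segsize() {
    tree="${1:-.}"   # path to the kernel source tree, default: current dir
    grep -n -E '^[[:space:]]*#[[:space:]]*define[[:space:]]+IO_TLB_SEGSIZE' \
        "$tree/include/linux/swiotlb.h"
}
```

For example, `show_segsize ~/repositories/spacemit-k1-linux-6.1` prints the line number and the current definition.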

Understood… I was already wondering why there is no CONFIG_ in front of that setting :wink:
I built the kernel yesterday and I’ll try it out today with my RX 550.

Hmm… my kernel build from yesterday didn’t produce an image; I must have overlooked an error. Today I rebuilt it after ‘make distclean’ and the build stopped while compiling the radeon module. After configuring radeon out, the same thing happened with the nouveau driver.
Did you build it on stock Bianbu 1.0.15 or on a different system?

Did a clean build again:
make distclean
echo ARCH=riscv
make k1_extern_gpu_defconfig
make menuconfig (disabled radeon and nouveau)
make -j8

build error:

drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c: In function ‘dc_stream_remove_writeback’:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:531:55: warning: array subscript -1 is below array bounds of ‘struct dc_writeback_info[1]’ [-Warray-bounds=]
  531 |                                 stream->writeback_info[j] = stream->writeback_info[i];
      |                                 ~~~~~~~~~~~~~~~~~~~~~~^~~
In file included from ./drivers/gpu/drm/amd/amdgpu/../display/dc/dc.h:1272,
                 from ./drivers/gpu/drm/amd/amdgpu/../display/dc/inc/core_types.h:29,
                 from ./drivers/gpu/drm/amd/amdgpu/../display/dc/basics/dc_common.h:29,
                 from drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:27:
./drivers/gpu/drm/amd/amdgpu/../display/dc/dc_stream.h:241:34: note: while referencing ‘writeback_info’
  241 |         struct dc_writeback_info writeback_info[MAX_DWB_PIPES];
      |                                  ^~~~~~~~~~~~~~
  CC      drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_vm_helper.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dc/dc_helper.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dc/dc_edid_parser.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/modules/freesync/freesync.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/modules/color/color_gamma.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/modules/color/color_table.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/modules/info_packet/info_packet.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/modules/power/power_helpers.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_srv.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_srv_stat.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_reg.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn20.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn21.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn30.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn301.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn302.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn303.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn31.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn315.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn316.o
  CC      drivers/gpu/drm/amd/amdgpu/../display/dmub/src/dmub_dcn32.o
  AR      drivers/gpu/drm/amd/amdgpu/built-in.a
  AR      drivers/gpu/drm/built-in.a
  AR      drivers/gpu/built-in.a
make[1]: *** [scripts/Makefile.build:500: drivers] Error 2
make: *** [Makefile:2007: .] Error 2
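
The excerpt above contains only a -Warray-bounds warning, which by itself does not stop the build, so the message that made make give up is probably further up in the output (with -j8 the jobs interleave). A sketch for finding it in a saved log; the helper name is made up, and the `Fehler:` pattern is only there to catch German-locale compiler output:

```sh
# Print the first lines of a saved build log that look like real errors,
# skipping plain warnings. Second argument: how many lines to show.
# (first_build_errors is a hypothetical helper name.)
first_build_errors() {
    log="$1"
    grep -n -E 'error:|Fehler:|\*\*\* ' "$log" | head -n "${2:-5}"
}

# Possible use (LC_ALL=C forces English messages):
#   LC_ALL=C make -j8 2>&1 | tee build.log
#   first_build_errors build.log
```

Re-running the failing step with `make -j1` is another option, since the error then appears directly above the final "Error 2" lines.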

Is echo ARCH=riscv a command you ran? That does not set or display any actual environment variable.

The way I compile it (cross-compiling from x86_64 Arch Linux) is like this:

make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- k1_extern_gpu_defconfig
make ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- -j 32

The cross-compile toolchain is from the riscv64-linux-gnu-gcc package on Arch.
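
The same two commands can be made to work both on an x86_64 host and natively on the board by choosing the make arguments from the machine type. A minimal sketch; the function name is made up, and it assumes `uname -m` reports `riscv64` on the board:

```sh
# Print the make variables for a K1 kernel build.
# On a riscv64 machine no cross compiler prefix is needed.
# (kernel_make_args is a hypothetical helper name.)
kernel_make_args() {
    machine="${1:-$(uname -m)}"   # optional override, mainly for testing
    if [ "$machine" = "riscv64" ]; then
        echo "ARCH=riscv"
    else
        echo "ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-"
    fi
}

# Possible use:
#   make $(kernel_make_args) k1_extern_gpu_defconfig
#   make $(kernel_make_args) -j"$(nproc)"
```

Passing the variables on the make command line like this avoids depending on what happens to be exported in the current shell.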

No, I just copied the wrong line. I had checked whether the variable was set before building and copied that line instead of the corresponding “export” line.

So the difference seems to be that I am building the kernel on the Jupiter itself, and Bianbu created their own build chain.

I’ll try building it under Arch Linux in the next few days.

To check whether the variable “ARCH” is set, you would need to type “echo $ARCH”. In your reply you forgot the dollar sign. And to set the variable ARCH to the value “riscv”, you would type “export ARCH=riscv”
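
The difference between the three forms, as a short runnable illustration:

```sh
unset ARCH
echo ARCH=riscv              # only prints the text "ARCH=riscv"; nothing is set
echo "ARCH is '${ARCH:-}'"   # prints: ARCH is ''
export ARCH=riscv            # sets ARCH and exports it to child processes such as make
echo "$ARCH"                 # prints: riscv
sh -c 'echo "$ARCH"'         # a child process sees it too: riscv
```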

Yes, I know. As I said, I copied the wrong line, but that is not the issue in this thread. The issue is that building natively on the board itself isn’t working.
