After facing increasing difficulty building the Sophgo 6.6 fork with a modern toolchain, I decided to try the latest upstream effort from @unicornx, seeing as the final major piece, PCIe, is supposedly ready to go. I built the kernel using the defconfig and enabled all the SG2042-related drivers I could find:
Using the latest Sophgo ZSBL, upstream OpenSBI, and sg2042-milkv-pioneer.dtb from the Linux sources, the kernel boots but consistently hangs while initializing the third PCIe RC. It always hangs after “Link up” and does not respond to SysRq triggers or to any of the watchdog settings I applied. If I disable that specific controller in the device tree, it completes booting, but of course there’s no NVMe or any of the other devices that sit behind that controller. I’ve exhausted all of my debugging experience, so I just wanted to see if anyone here has gotten it to work.
Thank you. I didn’t see anything in Revy’s fork that wasn’t in your 6.18-rc branch, but I tried it just in case, and it hangs at the exact same spot, immediately after initializing the last PCIe controller:
[ 2.457755] sg2042-pcie 7062800000.pcie: host bridge /soc/pcie@7062800000 ranges:
[ 2.465438] sg2042-pcie 7062800000.pcie: IO 0x4cc0c00000..0x4cc0ffffff -> 0x0000000000
[ 2.474041] sg2042-pcie 7062800000.pcie: MEM 0x4cf8000000..0x4cfbffffff -> 0x00f8000000
[ 2.482615] sg2042-pcie 7062800000.pcie: MEM 0x4cfc000000..0x4cffffffff -> 0x00fc000000
[ 2.491178] sg2042-pcie 7062800000.pcie: MEM 0x4e00000000..0x4fffffffff -> 0x4e00000000
[ 2.499739] sg2042-pcie 7062800000.pcie: MEM 0x4d00000000..0x4dffffffff -> 0x4d00000000
[ 2.508314] sg2042-pcie 7062800000.pcie: Memory resource size exceeds max for 32 bits
[ 2.516289] sg2042-pcie 7062800000.pcie: no "phy-names" property found; PHY will not be initialized
[ 2.525509] sg2042-pcie 7062800000.pcie: Link up
<hang>
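As a side check on the log above, the size of each MEM window can be worked out from its start and end addresses. The sketch below copies the four MEM ranges from the log and flags the ones whose size does not fit in 32 bits. It assumes that the "Memory resource size exceeds max for 32 bits" line is a warning about window size only, not the cause of the hang; the helper names (window_size, exceeds_32_bits) are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* One CPU-side address window, inclusive on both ends, as printed in dmesg. */
struct window {
    uint64_t start;
    uint64_t end;
};

/* The four MEM ranges reported for 7062800000.pcie in the log above. */
const struct window rc_mem_windows[4] = {
    { 0x4cf8000000ULL, 0x4cfbffffffULL },
    { 0x4cfc000000ULL, 0x4cffffffffULL },
    { 0x4e00000000ULL, 0x4fffffffffULL },
    { 0x4d00000000ULL, 0x4dffffffffULL },
};

/* Size in bytes of an inclusive [start, end] window. */
uint64_t window_size(struct window w)
{
    assert(w.end >= w.start);
    return w.end - w.start + 1;
}

/* Non-zero when the size needs more than 32 bits to represent. */
int exceeds_32_bits(struct window w)
{
    return (window_size(w) >> 32) != 0;
}
```

With these numbers the first two windows are 64 MiB each, the third is 8 GiB and the fourth is 4 GiB, so only the last two would trip a 32-bit size check.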
I did a little more debugging and found that the hang happens in cdns_pcie_host_init_address_translation() (drivers/pci/controller/cadence/pcie-cadence-host.c) at:
Interestingly, this function has not changed between 6.6 (working) and 6.18 (not working) so I’m digging through other changes to try to understand what is influencing this new, faulty behavior.
Update on upstream PCIe (Linux 6.18) on Milk-V Pioneer
I spent quite a bit of time trying to get the upstream pcie-sg2042 / Cadence-based driver working on Pioneer under Linux 6.18. Short version: I made real progress, identified a concrete upstream bug/quirk, but ultimately did not get a fully working setup and am reverting to 6.12 + the downstream Sophgo driver for now.
Here’s where things stand.
What was broken initially
On 6.18 with the upstream driver:
rc0 and rc2 would mostly come up
rc3 consistently hung shortly after “Link up”, during address translation (AT) programming
The hang was a hard lockup on readl()/writel() into Cadence AT registers
Key discovery
After instrumenting pcie-cadence-host.c heavily, it turned out that on SG2042:
AT (Address Translation) registers are only accessible via the RP aperture
Accessing AT registers via the “raw” Cadence base (as upstream does) causes a bus hang
This exactly matches behavior seen in the downstream pcie-cadence-sophgo driver, which implicitly routes AT accesses differently
By adding a quirk so that AT register reads/writes are redirected through the RP base, the hard hangs were resolved. Even with that fix in place, though, several problems remain:
Devices behind rc3 enumerate unreliably or with invalid config space
Example: endpoints show up as 17cd:0100 with a broken header
BAR sizing and downstream enumeration do not behave correctly
rc0 GPU also fails to initialize properly under the upstream driver
At this point enumeration is partially working but clearly incorrect, and fixing it would require deeper investigation into:
cfg window routing
outbound region sizing/count (cdns,max-outbound-regions exists downstream but not upstream)
possible SG2042-specific assumptions still missing from mainline
Conclusion (for now)
The upstream driver is not usable on Milk-V Pioneer as-is, at least for my setup
There is a real, demonstrable SG2042 quirk around AT register access that upstream does not currently handle
Downstream 6.12 + pcie-cadence-sophgo continues to work perfectly, so I’m sticking with that for now
I’m parking this here rather than continuing to patch around it. If/when this comes back up, the next step is likely a proper upstreamable quirk for SG2042 Cadence AT access, followed by a careful comparison against the downstream driver’s outbound region handling.
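To make the idea of an AT-access quirk concrete, here is a minimal user-space sketch of the routing decision, not the actual patch and not kernel code. It models the two register apertures as plain arrays instead of MMIO, and every name in it (sim_pcie, raw_window, rp_window, at_via_rp_quirk, the sim_at_* accessors, WINDOW_WORDS) is hypothetical. The only thing it is meant to show is that all AT reads and writes go through one accessor, and that a per-SoC flag decides which aperture that accessor uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW_WORDS 16 /* size of each simulated register window, in 32-bit words */

/* Simulated controller: two apertures that a driver could reach AT registers through. */
struct sim_pcie {
    uint32_t *raw_window;  /* stands in for the "raw" Cadence base */
    uint32_t *rp_window;   /* stands in for the RP aperture */
    bool at_via_rp_quirk;  /* hypothetical per-SoC flag, set for SG2042 */
};

/* Single place that decides which aperture AT accesses use. */
uint32_t *sim_at_window(const struct sim_pcie *p)
{
    return p->at_via_rp_quirk ? p->rp_window : p->raw_window;
}

/* AT register write, routed according to the quirk flag. */
void sim_at_writel(struct sim_pcie *p, size_t word, uint32_t val)
{
    assert(word < WINDOW_WORDS);
    sim_at_window(p)[word] = val;
}

/* AT register read, routed according to the quirk flag. */
uint32_t sim_at_readl(const struct sim_pcie *p, size_t word)
{
    assert(word < WINDOW_WORDS);
    return sim_at_window(p)[word];
}
```

Keeping the decision in one accessor means the address translation setup code itself would not need SG2042-specific branches; only the flag would be set from the SoC-specific glue. Whether an upstream quirk should take this shape is an open question.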
Hopefully this saves someone else a few days of head-scratching.