Just an FYI for folks here, since recent LLVM/clang versions support specifying profiles in the -march option, and it looks like gcc plans to follow suit at some point.
The X60 core in these SoCs does not support hardware misaligned access for vector instructions, and there is currently no support for emulating misaligned vector accesses in the Linux kernel or OpenSBI. As a result, these SoCs do not support Zicclsm, the profile extension mandated by RVA22U64 that indicates support for misaligned access in both scalar and vector instructions.
In practice, this means that applications compiled with -march=rva22u64_v, or otherwise built to run on RVA22-compatible platforms with the optional vector extension enabled, may crash if their code includes vector instructions that perform misaligned accesses.
It’s possible that future versions of OpenSBI or the Linux kernel might implement emulation for misaligned vector access, which would be sufficient to fix compatibility for user-space applications (it would be slow, though!), but I’m not aware of any work being done towards this yet.
Not sure, but here’s what I use… and it might change in the future.
I’m currently using gcc 14, and what I ended up doing was specifying every extension that is supported by the CPU, the 6.6 kernel, and gcc, and that is part of RVA22, except the vector extensions: -march=rv64gc_zicntr_ziccif_ziccrse_ziccamoa_za64rs_zba_zbb_zbs_zihpm_zihintpause_zic64b_zicbom_zicboz_zkt_zfh
(Note that my understanding is that ziccif, ziccrse, ziccamoa, za64rs, and zic64b describe hardware behaviour, not instruction sets.)
The reason I’ve disabled the vector extensions is that gcc 14 has an autovectorization issue where it generates vector instructions that do misaligned accesses even when the CPU doesn’t support them. This is apparently fixed in gcc 15. I haven’t tried clang/LLVM yet.
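If you do turn the vector extension back on, one way to avoid silently hitting the gcc 14 issue is a compile-time guard in a project header. This is a hypothetical sketch, not something from gcc itself: it assumes the `__riscv_vector` test macro from the RISC-V C API, and the macro name `AUTOVEC_MISALIGN_RISK` and function `autovec_misalign_risk` are made up for illustration. The check is `!defined(__clang__)` because clang also defines `__GNUC__`.

```c
/* Hypothetical guard. Per this thread, gcc 14's autovectorizer may emit
   misaligned vector accesses, which can crash on the X60 because nothing
   emulates them; gcc 15 reportedly fixes this. */
#if defined(__riscv_vector) && defined(__GNUC__) && !defined(__clang__) && __GNUC__ < 15
#warning "gcc < 15 with RVV enabled: autovectorized code may do misaligned vector accesses"
#define AUTOVEC_MISALIGN_RISK 1
#else
#define AUTOVEC_MISALIGN_RISK 0
#endif

/* Exposes the result of the compile-time check so code can query it. */
static int autovec_misalign_risk(void) {
    return AUTOVEC_MISALIGN_RISK;
}
```

On a non-RISC-V host, or when building without V, the guard is silent and the function returns 0.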
If you want, you could also add zicond (integer conditional operations) and zbc (carry-less multiplication), which aren’t part of RVA22 but are supported by the X60.
I didn’t include zicbop (cache-block prefetch) since the 6.6 kernel doesn’t report its presence. It’s probably OK to use, but I left it out since I wasn’t sure.
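To double-check which extensions a given -march string actually enables, one option is to look at the compiler’s predefined macros. This sketch assumes the `__riscv_<extension>` test macros described in the RISC-V C API specification (recent gcc and clang define these for enabled extensions, but verify against your toolchain); `report_extensions` is a hypothetical helper, and the list of extensions is just the ones mentioned above.

```c
#include <stdio.h>

/* Prints which of a hand-picked set of extensions the compiler was allowed
   to use for this translation unit and returns how many there were.
   Relies on the __riscv_<ext> test macros; on a non-RISC-V host none of
   them are defined, so it prints an empty line and returns 0. */
static int report_extensions(FILE *out) {
    int n = 0;
#ifdef __riscv_zba
    fputs("zba ", out); n++;
#endif
#ifdef __riscv_zbb
    fputs("zbb ", out); n++;
#endif
#ifdef __riscv_zbs
    fputs("zbs ", out); n++;
#endif
#ifdef __riscv_zfh
    fputs("zfh ", out); n++;
#endif
#ifdef __riscv_zicbop
    fputs("zicbop ", out); n++;
#endif
#ifdef __riscv_zicond
    fputs("zicond ", out); n++;
#endif
#ifdef __riscv_zbc
    fputs("zbc ", out); n++;
#endif
#ifdef __riscv_vector
    fputs("vector ", out); n++;
#endif
    fputc('\n', out);
    return n;
}
```

Compile a small program that calls `report_extensions(stdout)` with the same -march flags as your project; if "vector" is missing from the output, the build is not using V.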
Are you using these CFLAGS for building the kernel? My understanding is that we should remove any floating-point-related flags, e.g., zfh, when building the kernel.
This would explain why I could never get my code using vector intrinsics running on the Jupiter… Do you know off the top of your head what the alignment for vector instructions has to be? Does it need to be VLEN aligned or just 64-bit aligned?
Are you using these CFLAGS for building the kernel?
I haven’t tried building the kernel with anything other than the default CFLAGS. I wouldn’t expect a whole lot of difference, but maybe letting it use the bit-manipulation extensions (zba, zbb, zbs) would help?
Do you know off the top of your head what the alignment for vector instructions has to be?
Unfortunately, no idea. I’d expect it to require VLEN alignment. If you try it out, let us know.
I’d expect it to require VLEN alignment. If you try it out, let us know.
"Absolutely not. That would be totally against the RVV spec. The maximum hardware alignment allowed by the V ISA extension is the vector element size, not the entire vector register (or group).
RVA22U64 says unaligned accesses (at the element level) must work, but that’s not a property of a core alone but of the core + the M-mode software (e.g. SBI) to emulate unaligned accesses if necessary."
I read the spec more closely (and fixed my code): without support for misaligned accesses, RVV’s memory instructions must be element-aligned, i.e. aligned to the element width (SEW, or more precisely the EEW encoded in the load/store instruction). So the data just has to be aligned to whatever element size you’re working with. Thinking about it, anything stricter makes no sense: it would mean that once you start changing the vector length (an integral feature of RVV), you’d end up misaligned rather trivially.
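To illustrate why element alignment is the natural rule, here is a plain-C sketch (no RVV intrinsics, so it runs on any host) that walks an int32_t array in strips of at most VLMAX elements, the way a vsetvli-driven loop advances its pointer, and checks each strip’s start address. VLEN = 256 bits and LMUL = 1 are assumptions for illustration, and `check_strips` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for illustration: VLEN = 256 bits (32 bytes), LMUL = 1. */
#define VLEN_BYTES 32u

/* Returns 1 if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, size_t align) {
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Mimics a strip-mined loop over n int32_t elements starting at a: each
   iteration handles vl <= VLMAX elements, then advances by vl. Reports
   whether every strip start is element-aligned (4 bytes) and whether every
   strip start would also satisfy a full-VLEN alignment rule. */
static void check_strips(const int32_t *a, size_t n,
                         int *all_elem_aligned, int *all_vlen_aligned) {
    const size_t vlmax = VLEN_BYTES / sizeof(int32_t); /* SEW = 32 */
    *all_elem_aligned = 1;
    *all_vlen_aligned = 1;
    for (size_t i = 0; i < n;) {
        size_t vl = (n - i < vlmax) ? n - i : vlmax; /* like vsetvli */
        const int32_t *strip = a + i;
        if (!is_aligned(strip, sizeof(int32_t)))
            *all_elem_aligned = 0;
        if (!is_aligned(strip, VLEN_BYTES))
            *all_vlen_aligned = 0;
        i += vl;
    }
}
```

For example, starting at `data + 1` of a 32-byte-aligned `int32_t data[64]` with n = 20, every strip start is 4-byte aligned but none is 32-byte aligned, so a VLEN-alignment rule would reject perfectly ordinary int32 data.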
Yep, and a particularly interesting consequence is that the minimum element size is 1 byte, so you can easily write an optimized memcpy using the vector extension that works with data of arbitrary length and alignment. (As noted in the reddit thread, this is also a great way to turn misaligned data into properly aligned data.)
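A plain-C sketch of the byte-element idea, again assuming VLEN = 256 bits purely for illustration (so VLMAX = 32 for SEW = 8, LMUL = 1). The inner scalar loop only stands in for a vle8.v/vse8.v pair, and `stripmined_memcpy` is a hypothetical name, not a library function.

```c
#include <stddef.h>
#include <stdint.h>

#define VLEN_BYTES 32u /* assumption: VLEN = 256 bits, LMUL = 1 */

/* Strip-mined byte copy in the style of an RVV memcpy with SEW = 8.
   Each outer iteration stands in for: vl = vsetvli(n, e8); vle8.v; vse8.v.
   Because every element is a single byte, no access is ever misaligned,
   whatever the alignment of dst or src and whatever the length n. */
static void *stripmined_memcpy(void *dst, const void *src, size_t n) {
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n > 0) {
        size_t vl = n < VLEN_BYTES ? n : VLEN_BYTES;
        for (size_t k = 0; k < vl; k++) /* stands in for vle8.v + vse8.v */
            d[k] = s[k];
        d += vl;
        s += vl;
        n -= vl;
    }
    return dst;
}
```

Copying misaligned bytes (say, starting at `raw + 1`) into a `uint32_t` array this way yields properly aligned words that can then be processed with SEW = 32 loads.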