Vector computing on Duo 256M

Lesept · October 16, 2025, 10:52am

I tried with vl = 32 and the m8 variants of the intrinsics; this gives a slightly higher speedup.

emeb · October 21, 2025, 2:58pm

You can’t use arbitrary values for vl. It is capped at the max number of lanes available in hardware, and in these SoCs that is 4 lanes. In the earlier code you’ll see this call:

/* how many lanes to use this pass */
size_t vl = vsetvl_e32m1(len - i);

that takes the min() of the argument and the max HW lanes (4) so you always use as many as possible.
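To make the min() behaviour concrete, here is a rough plain-C model of the strip-mining loop. It is only a sketch: model_vsetvl() and model_vec_add() are hypothetical stand-ins, not the real intrinsics, and the hardware limit is passed in as vlmax (4 in the case described above) instead of being read from the core.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for vsetvl_e32m1(): returns min(requested, vlmax).
 * On real hardware vlmax is fixed by the core; here it is a parameter. */
static size_t model_vsetvl(size_t requested, size_t vlmax)
{
    assert(vlmax > 0);
    return requested < vlmax ? requested : vlmax;
}

/* Strip-mined c[i] = a[i] + b[i]. Returns the number of passes taken. */
static size_t model_vec_add(const float *a, const float *b, float *c,
                            size_t len, size_t vlmax)
{
    size_t passes = 0;
    for (size_t i = 0; i < len; ) {
        /* how many lanes to use this pass */
        size_t vl = model_vsetvl(len - i, vlmax);

        /* scalar loop standing in for one vector load/add/store */
        for (size_t lane = 0; lane < vl; ++lane)
            c[i + lane] = a[i + lane] + b[i + lane];

        i += vl;   /* advance by what was actually granted */
        ++passes;
    }
    return passes;
}
```

With len = 10 and vlmax = 4 this model takes three passes of 4, 4 and 2 elements, which is the point of advancing the index by the returned vl and not by a hard-coded number.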

Lesept · October 21, 2025, 6:49pm

Thanks for the advice, I’m new to this.

However, are you 100% sure about this? When I force vl to 32, it also works and I get the same results.

emeb · October 28, 2025, 9:10pm

You can certainly ask for 32 lanes, but the hardware only has 4 if you’re doing single-precision floating point, so you won’t get more than that. The purpose of the vsetvl_e32m1() function is to tell you what the hardware can do so you can index through your data vector effectively. If you don’t care then I guess you can YOLO it and hope for the best.
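For reference, a small plain-C sketch of the arithmetic involved. It assumes VLEN = 128 bits, which is what 4 single-precision lanes at m1 implies, and uses the relation VLMAX = (VLEN / SEW) * LMUL from the RISC-V vector spec for integer LMUL. Both function names are hypothetical helpers, and the second one only models a loop that steps by a fixed amount while each pass handles at most vlmax elements; whether that matches any particular test code is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* VLMAX = (VLEN / SEW) * LMUL for integer LMUL (1, 2, 4, 8). */
static size_t model_vlmax(size_t vlen_bits, size_t sew_bits, size_t lmul)
{
    assert(sew_bits > 0 && vlen_bits % sew_bits == 0);
    return (vlen_bits / sew_bits) * lmul;
}

/* Elements actually handled when the index advances by `step` each pass
 * but a pass can only cover min(step, remaining, vlmax) elements. */
static size_t model_elements_processed(size_t len, size_t step, size_t vlmax)
{
    assert(step > 0);
    size_t done = 0;
    for (size_t i = 0; i < len; i += step) {
        size_t remaining = len - i;
        size_t want = step < remaining ? step : remaining;
        done += want < vlmax ? want : vlmax;
    }
    return done;
}
```

Under these assumptions model_vlmax(128, 32, 1) is 4 and model_vlmax(128, 32, 8) is 32, so a request for 32 elements can be granted in full with m8 register grouping but not with m1. In the model, stepping by 32 over 64 elements with vlmax = 4 covers only 8 of them, while the same step with vlmax = 32 covers all 64.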