Yep, something is obviously extremely wrong. I just ran a speed test doing some math: the Teensy took 60 ms, while the MilkV took 8000 ms.
Is this with the cache explicitly enabled or not? You pretty much need it when dealing with DRAM. The same goes for XIP flash: I ran some benchmarks on the STM32H503 and they ran twice as slow without the cache.
Also, did you use the same optimization level?
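For a like-for-like comparison it helps to run exactly the same timed kernel on both boards. Below is a minimal sketch of one, not the test that was actually run: `math_workload`, `time_workload_ms` and `BUF_WORDS` are made-up names, and the arithmetic is arbitrary. It walks a buffer so that memory (and therefore the cache) is exercised, and it returns a checksum so the optimizer cannot drop the loop.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical buffer size; pick one that fits in RAM on both boards. */
#define BUF_WORDS 4096u

static uint32_t buf[BUF_WORDS];

/* Arbitrary integer math over the buffer. The buffer is re-seeded on
   every call, so the returned checksum depends only on `passes`. */
uint32_t math_workload(uint32_t passes)
{
    uint32_t acc = 0;

    for (uint32_t i = 0; i < BUF_WORDS; i++)
        buf[i] = i * 2654435761u;

    for (uint32_t p = 0; p < passes; p++) {
        for (uint32_t i = 0; i < BUF_WORDS; i++) {
            buf[i] = buf[i] * 1664525u + 1013904223u;
            acc ^= buf[i];
        }
    }
    return acc;
}

/* Times one run in milliseconds using the standard C clock().
   The checksum is handed back so the caller can print or compare it. */
double time_workload_ms(uint32_t passes, uint32_t *checksum_out)
{
    clock_t start = clock();
    *checksum_out = math_workload(passes);
    clock_t end = clock();

    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

In an Arduino sketch you would probably take the timestamps with `millis()` instead of `clock()` and print the result over `Serial`. If the checksums match on both boards, both really did the same work.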
You can use the following code to enable the cache (in the setup function, maybe also add a delay after it) if you want to give it a go to see the improvement (credit to someone on the forum, can't find it right now unfortunately). Keep in mind that the cache has its own complications, but for the benchmark it shouldn't matter.
Also make sure to optimize for size (i.e. "Smallest size") on the Teensy for a closer apples-to-apples comparison, since that's the default on the Duo.
void enable_cache()
{
    asm volatile(
        /* The C906 invalidates the whole I-cache automatically on reset, */
        /* but you can invalidate it yourself if necessary. */
        /* Invalidate I-cache */
        "li t0, 0x33 \n"
        "csrc 0x7c2, t0 \n"
        "li t0, 0x11 \n"
        "csrs 0x7c2, t0 \n"
        // The icache instructions can replace the sequence above if THEADISAEE is enabled:
        // icache.iall
        // sync.is
        /* Enable I-cache */
        "li t0, 0x1 \n"
        "csrs 0x7c1, t0 \n"
        /* The C906 invalidates the whole D-cache automatically on reset, */
        /* but you can invalidate it yourself if necessary. */
        /* Invalidate D-cache */
        "li t0, 0x33 \n"
        "csrc 0x7c2, t0 \n"
        "li t0, 0x12 \n"
        "csrs 0x7c2, t0 \n"
        // The dcache instructions can replace the sequence above if THEADISAEE is enabled:
        // dcache.iall
        // sync.is
        /* Enable D-cache */
        "li t0, 0x2 \n"
        "csrs 0x7c1, t0 \n"
        : /* no outputs */
        : /* no inputs */
        /* t0 is used as scratch instead of x3, because x3 is gp (the global */
        /* pointer) in the standard RISC-V ABI and must not be overwritten. */
        : "t0", "memory"
    );
}
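In case the magic numbers look arbitrary, here is how I read them. This is a sketch based on my understanding of the C906 manual, where CSR 0x7c2 is MCOR and CSR 0x7c1 is MHCR. The macro and constant names are my own labels, not official ones, so check them against the manual before relying on them.

```c
#include <stdint.h>

/* Assumed MCOR (CSR 0x7c2) fields: which cache to act on, and the action. */
#define MCOR_SEL_ICACHE (1u << 0)  /* select the I-cache */
#define MCOR_SEL_DCACHE (1u << 1)  /* select the D-cache */
#define MCOR_INV        (1u << 4)  /* invalidate the selected cache */
#define MCOR_CLR        (1u << 5)  /* clear (write back) dirty lines */

/* Assumed MHCR (CSR 0x7c1) enable bits. */
#define MHCR_IE (1u << 0)  /* I-cache enable */
#define MHCR_DE (1u << 1)  /* D-cache enable */

/* csrc 0x7c2 with this mask clears the select and action bits first. */
static const uint32_t mcor_clear_mask =
    MCOR_SEL_ICACHE | MCOR_SEL_DCACHE | MCOR_INV | MCOR_CLR;

/* csrs 0x7c2 with these values then starts the invalidation. */
static const uint32_t mcor_inv_icache = MCOR_SEL_ICACHE | MCOR_INV;
static const uint32_t mcor_inv_dcache = MCOR_SEL_DCACHE | MCOR_INV;
```

With these definitions, `mcor_clear_mask` is 0x33, `mcor_inv_icache` is 0x11 and `mcor_inv_dcache` is 0x12, which are the immediates loaded in `enable_cache()`. `MHCR_IE` and `MHCR_DE` are the 0x1 and 0x2 written to 0x7c1.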