Slow speed on Duo (Arduino)?

Yep, something is obviously extremely wrong. I just ran a speed test on the Teensy and it took 60 ms, while the Milk-V took about 8000 ms running the same bit of math.

/*                Results
   -----------------------------------------
  |         Board           |   Time (ms)   |
  |-----------------------------------------|
  | Teensy 4.0 (600 MHz)    |       60      |
  | ESP32 Dev (240 MHz)     |      183      |
  | Nano Connect (150 MHz)  |      334      |
  | Raspi Pico (150 MHz)    |      335      |
  | Raspi Pico W (150 MHz)  |      335      |
  | Nano Connect (133 MHz)  |      377      |
  | Raspi Pico (133 MHz)    |      378      |
  | Raspi Pico W (133 MHz)  |      378      |
  | Nano Every (16 MHz)     |     7412      |
  | Milk-V Duo 64 (700 MHz) |     8073      |
   -----------------------------------------
*/
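
For anyone who wants to reproduce this kind of comparison, here is a minimal timing sketch. The workload below is a hypothetical stand-in (the actual benchmark is not shown in this thread), and it uses std::chrono so it also builds on a PC; in an Arduino sketch you would more likely take millis() before and after the loop.

```cpp
#include <chrono>
#include <cmath>
#include <cstdint>

// Hypothetical stand-in workload; NOT the benchmark used for the table above.
double math_workload(std::uint32_t iterations) {
  volatile double acc = 0.0;  // volatile keeps the loop from being optimized away
  for (std::uint32_t i = 1; i <= iterations; ++i) {
    acc = acc + std::sqrt(static_cast<double>(i)) / static_cast<double>(i);
  }
  return acc;
}

// Runs the workload once and returns the elapsed wall-clock time in milliseconds.
double time_workload_ms(std::uint32_t iterations) {
  const auto start = std::chrono::steady_clock::now();
  const double result = math_workload(iterations);
  const auto stop = std::chrono::steady_clock::now();
  (void)result;  // the value itself is not needed, only the time taken
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Running the same function with the same iteration count on each board keeps the numbers comparable.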

Is this with the cache explicitly enabled or not? You pretty much need it when running from DRAM. The same goes for XIP flash: I ran some benchmarks on the STM32H503, and it ran twice as slow without the cache.
Also, did you use the same optimization level?


No, I kept it vanilla as I am already well out of my comfort zone.


You can use the following code to enable the cache (in the setup function, maybe also add a delay after it) if you want to give it a go to see the improvement (credit to someone on the forum, can’t find it right now unfortunately). Keep in mind that the cache has its own complications, but for the benchmark it shouldn’t matter.

Also make sure to optimize for size (i.e. “Smallest size”) on the Teensy for a closer apples to apples comparison, since that’s the default on the Duo.

void enable_cache()
{
  // Note: x3 is gp in the standard RISC-V ABI and is overwritten here without being restored.
  asm volatile(
    /* The C906 invalidates the whole I-cache automatically on reset, */
    /* but you can invalidate it yourself if necessary. */
    /* Invalidate I-cache */
    "li x3, 0x33 \n"
    "csrc 0x7c2, x3 \n"
    "li x3, 0x11 \n"
    "csrs 0x7c2, x3 \n"
    // The icache instructions can replace the invalidate sequence above if theadisaee is enabled:
    // icache.iall
    // sync.is
    /* Enable I-cache */
    "li x3, 0x1 \n"
    "csrs 0x7c1, x3 \n"
    /* The C906 invalidates the whole D-cache automatically on reset, */
    /* but you can invalidate it yourself if necessary. */
    /* Invalidate D-cache */
    "li x3, 0x33 \n"
    "csrc 0x7c2, x3 \n"
    "li x3, 0x12 \n"
    "csrs 0x7c2, x3 \n"
    // The dcache instructions can replace the invalidate sequence above if theadisaee is enabled:
    // dcache.iall
    // sync.is
    /* Enable D-cache */
    "li x3, 0x2 \n"
    "csrs 0x7c1, x3 \n"
  );
}
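
If you want to confirm the caches actually came on, you could read CSR 0x7c1 back and check the two bits the routine sets. A rough sketch follows; the struct and function names are made up for illustration, the bit meanings are taken only from the enable sequence above (0x1 for I-cache, 0x2 for D-cache), and the read assumes the sketch runs in a privilege mode that is allowed to access that CSR.

```cpp
#include <cstdint>

// Hypothetical helper type: which caches the CSR value reports as enabled.
struct CacheState {
  bool icache_enabled;
  bool dcache_enabled;
};

// Decodes a raw value of CSR 0x7c1, using the same bits the enable
// sequence sets: bit 0 for the I-cache, bit 1 for the D-cache.
constexpr CacheState decode_cache_state(std::uint64_t csr_0x7c1) {
  return CacheState{(csr_0x7c1 & 0x1u) != 0, (csr_0x7c1 & 0x2u) != 0};
}

#if defined(__riscv)
// Reads CSR 0x7c1 on the RISC-V build only. Assumption: the code runs with
// enough privilege to read this CSR; otherwise the instruction traps.
inline std::uint64_t read_csr_0x7c1() {
  unsigned long value;
  asm volatile("csrr %0, 0x7c1" : "=r"(value));
  return value;
}
#endif
```

On the Duo you would call decode_cache_state(read_csr_0x7c1()) after enable_cache() and print both flags over serial.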

I am using the Arduino IDE, which gives me loads of errors:

C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:57: Error: bad instruction `li x3,0x33'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:58: Error: bad instruction `csrc 0x7c2,x3'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:59: Error: bad instruction `li x3,0x11'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:60: Error: bad instruction `csrs 0x7c2,x3'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:61: Error: bad instruction `li x3,0x1'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:62: Error: bad instruction `csrs 0x7c1,x3'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:63: Error: bad instruction `li x3,0x33'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:64: Error: bad instruction `csrc 0x7c2,x3'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:65: Error: bad instruction `li x3,0x12'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:66: Error: bad instruction `csrs 0x7c2,x3'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:67: Error: bad instruction `li x3,0x2'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:68: Error: bad instruction `csrs 0x7c1,x3'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:259: Error: bad instruction `li x3,0x33'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:260: Error: bad instruction `csrc 0x7c2,x3'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:261: Error: bad instruction `li x3,0x11'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:262: Error: bad instruction `csrs 0x7c2,x3'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:263: Error: bad instruction `li x3,0x1'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:264: Error: bad instruction `csrs 0x7c1,x3'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:265: Error: bad instruction `li x3,0x33'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:266: Error: bad instruction `csrc 0x7c2,x3'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:267: Error: bad instruction `li x3,0x12'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:268: Error: bad instruction `csrs 0x7c2,x3'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:269: Error: bad instruction `li x3,0x2'
C:\Users\atorq\AppData\Local\Temp\cctxlAli.s:270: Error: bad instruction `csrs 0x7c1,x3'

Just to confirm: you added the code only to the Milk-V build, right? It uses RISC-V assembly instructions that the assembler for the Teensy (an ARM board) doesn’t understand, which is what "bad instruction" errors like these point to.
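
If you want one sketch that builds for every board, a preprocessor guard can keep the cache code out of the non-RISC-V builds. A minimal sketch, with some assumptions: __riscv is the macro RISC-V GCC predefines, so it matches any RISC-V target and not just the Duo (a board-specific macro would be tighter, but I don't know which one the Duo core defines), and enable_cache_if_riscv is a made-up wrapper name.

```cpp
#if defined(__riscv)
constexpr bool kTargetIsRiscV = true;
void enable_cache();  // the routine posted earlier in this thread
#else
constexpr bool kTargetIsRiscV = false;
#endif

// Calls enable_cache() only when compiling for a RISC-V target.
// Returns true if the cache routine was compiled in and called.
bool enable_cache_if_riscv() {
#if defined(__riscv)
  enable_cache();
  return true;
#else
  return false;  // non-RISC-V targets such as the ARM-based Teensy skip it
#endif
}
```

Calling enable_cache_if_riscv() at the top of setup() then does nothing on the Teensy and runs the cache routine on the Duo.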
