CV1800B, Baremetal

Opus · July 15, 2024, 6:03am

Hello, I’ve done some work on going baremetal with the CV1800B (on a milkV duo).

My starting point was this thread: “Using OpenSBI to boot your own operating system” (使用 Opensbi 引导自己的操作系统)
After that, the source code in the official repo was also useful.

I can now run code on both C906 cores. I use the method described in the post above, so that means the FSBL gets loaded first, and in turn configures the hardware and loads your own code. Clocks and the DDR controller get configured for you. Actually, you have access to SBI (OpenSBI) calls, but I haven’t used that yet.

On the main core, your code will run in the S-mode. On the second core, it will run in the M-mode.

Here are a few remarks:

In this setup, the CPU clock for the main core is 850 MHz, and 594 MHz for the second core. You can change that using clock management registers.
The CLINT registers are not accessible from S-mode, because they are protected with PMP. This kind of bites, as the C906 defines a set of CLINT registers specific to S-mode, but the whole CLINT area is protected. Probably for simplicity. That means you can’t configure the CLINT timer from S-mode. But there’s an SBI call for that. It’s just not very efficient.
Interrupts and exceptions are delegated to the S-mode on the main core. That works fine.
The second core doesn’t seem to be able to access RAM outside of its “allocated” 768KB, which is placed at the very top of the RAM. I’ve tried sharing some RAM at a lower address, between the two cores, but it doesn’t seem to work. The odd thing is that accesses from the second core, outside of these upper 768KB, do not trigger any exception, but they just seem to have zero effect. A bit odd. And pretty inconvenient.
The CV1800B datasheet (preliminary) is good to have, but it lacks a lot of information. There’s nothing about the Mailbox and nothing about the Pinmux.
I’ve downloaded datasheets for the SG2000 and SG2002, which are very similar to the CV1800B, and these do add some doc about the Pinmux. So that helps. But, still nothing about the Mailbox, apparently. So all you can do is dig into the source code in the repo.
Regarding sharing RAM and the Mailbox, you may rightly object that the Mailbox is made for sharing data between cores. But as far as I understand it, there are only 8 bytes of data in the Mailbox, so that’s very limited.

So, a few questions now:

Is there any hope of getting updated datasheets with the missing information, at some point in the future?
Is there, in particular, any documentation about how memory is mapped for the second core, which looks very opaque to me? Can anyone give me more info about why only the upper 768KB of RAM are accessible by the second core, if there’s any means to access any other area in RAM? And/or is there any way of using more RAM for the second core?

hannahKobain · July 15, 2024, 10:24am

Duo SDK FAQ #2 refers to mem config. Particularly, for Duo 64M you may want to change it here.

Opus · July 15, 2024, 10:11pm

For people using the provided Linux+FreeRTOS, that should be it.

Keep in mind I’m dealing with baremetal development though, and not using Linux. For now, the “only” thing I use from the SDK is the FSBL and the fiptool script.

The RAM dedicated to ION is thus not relevant to my use case, although I’ll have to check exactly where in the boot process memory is allocated to ION.

Digging further into the FSBL code though, I think I found more info about how exactly RAM is split between the main core and second core.

The source file of interest is: fsbl/plat/cv180x/bl2/bl2_opt.c

One can see how the .bin files (for the main core and for the second core) are loaded, checked and copied to DDR RAM. A number of ROM functions are used.

The key that made it clearer for me was the lines where they read and write the ‘AXI_SRAM_RTOS_BASE’ register. From what I understand now, the SoC implements this register to define the address at which the second core accesses memory. From that and from my experiments, the areas before and after this address appear to be completely segregated at the AXI bus level, so there’s probably NO way for the two cores to share any RAM. Bummer. Sure, the good thing is that neither core can alter memory used by the other, but that could have been achieved with the PMP (memory protection). I’m guessing this has been done to simplify the design of the SoC.

That also means that the “separation line” in RAM can be freely modified. I’ll have to test that. From bl2_opt.c, I think the only thing to do in my case (baremetal) is to pass the corresponding address to fiptool:

The default I used is:
--BLCP_2ND_RUNADDR="0x83F40000"

but changing it to anything lower should work as well, giving more RAM to the second core, and less to the first. Of course, the linker script for each core should be modified accordingly. I’ll test that.
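To make the split concrete, here is a small sketch of the arithmetic behind that address. It assumes DDR starts at 0x80000000 and that the board has 64 MiB of it (as on the Duo 64M); neither value comes from the datasheet, and the function names are made up for illustration. With the default run address it gives back the 768 KB mentioned earlier.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions (not taken from the datasheet): DDR starts at 0x80000000
 * and the board has 64 MiB of it, as on the Duo 64M. */
#define DDR_BASE UINT64_C(0x80000000)
#define DDR_SIZE (UINT64_C(64) * 1024 * 1024)
#define DDR_END  (DDR_BASE + DDR_SIZE)

/* Bytes left to the second core: from its run address to the end of DDR. */
static uint64_t second_core_ram_bytes(uint64_t run_addr)
{
    assert(run_addr >= DDR_BASE && run_addr < DDR_END);
    return DDR_END - run_addr;
}

/* Bytes left to the main core: from the start of DDR up to the split. */
static uint64_t main_core_ram_bytes(uint64_t run_addr)
{
    assert(run_addr >= DDR_BASE && run_addr < DDR_END);
    return run_addr - DDR_BASE;
}
```

The same two numbers are what the LENGTH values in each core's linker script would have to match.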

So, the only way seems again to use the Mailbox. Which is rather limited. I wish the Mailbox was documented. The source code using it in the SDK is not very self-documenting. But it’s all we got at this point, I’m afraid. If anyone can point us to more documentation…

It seems that a more recent patch for the Linux kernel (cv1800b-mailbox.c source file) implements the Mailbox in a way that looks a bit clearer to reverse-engineer than the previous code in the SDK, so that should help. Still hoping that it’ll get documented at some point.

adailtonjn68 · July 16, 2024, 5:54pm

Thanks for sharing. Really valuable information.

Below my hate speech against MilkV:
I hate the documentation. There is no clear information about anything.
There is no clear information about the loader, the second loader, how to make the initial configuration, or even the TPU which is the reason most people would buy this chip. If the user wants something, they have to reverse engineer those binary files provided, and the fiptool.
The milkv company should be ashamed.

Opus · July 16, 2024, 10:25pm

There’s a lot of information missing, but I wouldn’t be so harsh.

They provide a comprehensive SDK, and the chips are supposed to be used with the provided tools, that is Linux on the main core and FreeRTOS on the second.

For people who want to use the chips “baremetal”, that sure isn’t enough. But the datasheets have the benefit of existing. Some other vendors of “similar” chips don’t even provide real datasheets, sometimes these are more like product briefs. So, here, it’s still a much better situation. And even if it requires some digging and “reverse-engineering”, there is a lot of open source code to help.

And to be fair, there may be nothing (much) milkV can do about it. The chips are from Sophgo - which I’m assuming is another company, although I don’t know what links there are with milkV. If this is the same host company, then forget about this point.

We can still hope that Sophgo will improve their datasheets over time. I’d be happy to help by listing precisely what kind of missing information there is in more details.

There is one thing I’m wondering about: the CV1800B, which is the cheapest SoC of the bunch (but still quite useful), is branded “CVITEK”, and I don’t know what links that company has with Sophgo. Was it the same company that just changed names, or did Sophgo buy the CV1800B SoC and then release improved versions of it later (under the SG2xxx names)? TL;DR: does the CV1800B have a future?

Back to more technical points: I managed to more or less figure out the Mailbox from the following source code in the SDK: cvi_mailbox.h, comm_main.c and cvi_spinlock.c / cvi_spinlock.h, all in the freertos directory of the SDK. Also, as I mentioned before, the cv1800b-mailbox.c source (for newer Linux kernels) helped clear up some points.

One other thing that is not clearly documented in the CV1800B datasheet, but that is in the SG2000 one, and that I’ll have to assume is the same, is the interrupt numbering for each core. Apparently, it’s not the same numbering for the first, and second core, and the second core doesn’t support all interrupts either. This isn’t seen in the CV1800B datasheet. It’s also not completely clear how the PLIC is shared between the two cores. For the latter point, so far I’m assuming that this is the same PLIC, but we have to use the “Hart1” (H1) registers instead of the “Hart0” (H0) registers. That would look reasonable, but the source code in the SDK doesn’t seem to show that. I haven’t tested yet, but will soon.

adailtonjn68 · July 17, 2024, 12:02pm

I guess you are right. Most of the things I complained about are not entirely MilkV’s fault. It’s down to the chip vendor itself.
The SDK is really well organized. I only wish it was better documented for hobbyists like us.
For example, the generation of the fip.bin file is not well documented by MilkV or Sophgo. To know what happens in there, one has to open the fiptool and investigate it. Sophgo has a small document covering it, but it is very poor in details.

Good point also.
I suggest Sophgo improve their documentation. They would only benefit, with even more people coming to their products.

The other points I am not able to comment since I didn’t investigate that deep.

Right now, I am trying to reverse-engineer the loader.

hannahKobain · July 17, 2024, 5:34pm

You forgot to mention our tiny community, lol. ;D

cleverca22 · July 17, 2024, 9:20pm

ive recently started working on the milk-v duo as well, and have done some of my own RE work

github.com/cleverca22/duo-nix/blob/master/opensbi.patch#L10-L18

          +        for (uint64_t addr = 0x04400000; addr < (0x04400000 + (64*1024)); addr += 16) {
          +          sbi_printf("%08lx  ", addr);
          +          for (uint64_t offset = 0; offset < 16; offset += 1) {
          +            uint8_t byte = *(uint8_t*)(addr + offset);
          +            sbi_printf("%02x ", byte);
          +          }
          +          sbi_printf("\n");
          +        }
           }

this patch modifies opensbi so it will dump the bootrom to the uart on startup
you can then use xxd to turn it back into a binary and load it into any decompiler that supports rv64
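For anyone who would rather do that conversion in C, here is a rough sketch of a parser for one line of that dump. It assumes the exact format printed by the patch above (a hex address, then up to 16 hex bytes separated by spaces); the function name is made up, and boot-log lines that are not part of the dump simply yield zero bytes.

```c
#include <stdint.h>
#include <stdlib.h>

#define DUMP_BYTES_PER_LINE 16

/* Parse one line of the form "ADDR  b0 b1 ... b15".
 * Stores the address in *addr and the bytes in out[].
 * Returns the number of bytes decoded (0 if the line is not a dump line). */
static int parse_dump_line(const char *line, uint64_t *addr,
                           uint8_t out[DUMP_BYTES_PER_LINE])
{
    char *end;

    *addr = strtoull(line, &end, 16);
    if (end == line)
        return 0;               /* no hex address at the start */

    int n = 0;
    while (n < DUMP_BYTES_PER_LINE) {
        const char *p = end;
        unsigned long v = strtoul(p, &end, 16);
        if (end == p || v > 0xFF)
            break;              /* nothing left, or not a single byte */
        out[n++] = (uint8_t)v;
    }
    return n;
}
```

A caller would read the UART capture line by line and append the decoded bytes of each line to the output file.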

there is also a fipinfo.c in that repo, which can unpack a fip.bin, telling you both what it contains, and giving you every .bin within it
and overall, the repo is meant to build a fip.bin entirely from source, using the nix package manager

j1sys · July 17, 2024, 9:54pm

Opus et al -

I’ll be joining you soon in the dive to the bottom of the baremetal. My goals might be even lower level than yours. Like you I want TOTAL control from the boot to my controlled FreeRTOS environment. I’ve already achieved this on D1s/F133 and am deep in prototypes for a gigabit powerful pixel controller for high-end lighting products. Hoping CV1800 will fill a niche for my lowend gear. All development with custom driver library developed on Windows eclipse environment. Duos en-route, our agent in guangzhou will be ordering Qty 50 ICs for initial prototypes ASAP. Watching your successes with interest and will update as my work begins. I’ve been reverse engineering for 50+ years.

j1sys

Opus · July 17, 2024, 10:17pm

Nice. I’m also considering getting rid of the FSBL and writing my own boot code eventually.

Apart from giving complete control (and possibly letting you run in M-mode on the main core, since otherwise you’re started in S-mode), that’s likely to decrease boot time significantly, although I don’t know yet for sure where the bulk of the boot time comes from (DDR initialization? Just reading from the SD card?). I’ve timed it (from power-on to my own code starting on the main core) at approx. 1.8 s.

Opus · July 18, 2024, 6:32am

So, been working on the Mailbox, and it works fine.

The Mailbox contains 8 channels and supports up to 4 CPUs. There are 3 CPUs on the CV1800B (2 C906 cores and an 8051) which can use the Mailbox. Technically, on the SG200x, there are 4 CPUs (2 C906 cores, 1 A53 and 1 8051), but AFAIK, one can’t use both the main C906 core and the A53 at the same time. So, that’s still 3 CPUs running.

The Mailbox also contains 8 spinlocks. Which is handy.

Each channel can be used to communicate from any CPU to any other CPU, and can pass 8 bytes (only). But nothing prevents you from using all channels, so you can actually enable the 8 channels for communicating between two CPUs, and thus pass 64 bytes at once. That’s already better. It’s still probably meant for small commands and not for passing large amounts of data, although you can do that by splitting the data into chunks.
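As an illustration of the "8 channels, 64 bytes" idea, here is a sketch that splits a message into one 64-bit word per channel and puts it back together. It only shows the packing; writing the words to the actual Mailbox registers is left out because their layout is not documented, and the names below are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MBOX_CHANNELS     8  /* from the post: 8 channels */
#define MBOX_BYTES_PER_CH 8  /* from the post: 8 bytes per channel */
#define MBOX_MSG_BYTES    (MBOX_CHANNELS * MBOX_BYTES_PER_CH)

/* Split a message of up to 64 bytes into one 64-bit word per channel.
 * Unused trailing bytes are zeroed. */
static void mbox_pack(const void *msg, size_t len, uint64_t words[MBOX_CHANNELS])
{
    uint8_t buf[MBOX_MSG_BYTES] = {0};

    assert(len <= MBOX_MSG_BYTES);
    memcpy(buf, msg, len);
    memcpy(words, buf, sizeof buf);
}

/* Rebuild the first len bytes of the message from the channel words. */
static void mbox_unpack(const uint64_t words[MBOX_CHANNELS], void *msg, size_t len)
{
    assert(len <= MBOX_MSG_BYTES);
    memcpy(msg, words, len);
}
```

Since both sides use memcpy on the same byte order, the round trip does not depend on endianness as long as both cores agree, which they do here (both are C906).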

For passing more data, my thought at this point is that, while (as I mentioned earlier) it seems impossible to directly share any RAM between the two cores, the DMA controller should work for passing data from one domain to the other. Of course, that means that the data is not shared, but sent and received, but it should be much more efficient for passing large buffers than using the Mailbox. I’ll try that soon.

One oddity I ran into: the PLIC. I configured the PLIC (from the first core) to handle two interrupts (Timer0 and Mailbox). While doing so, I noticed that the PLIC on the CV1800B only seems to allow interrupt priorities above 24 (so 25 to 31). Any priority 24 or lower makes the interrupt disabled. As if the Threshold register was set to 24 - but I can guarantee you that it isn’t. Both STH and MTH are set to zero. Puzzled. Does that mean that this chip only supports 8 levels of priorities instead of the 32 that are defined in the C906 spec? Looks like it. Unless I missed something. Browsing the SDK source code didn’t help so far. I could find some (few) references to the PLIC, but nothing much about how they handle priorities in practice, with real values. If anyone has any idea about this interrupt priority thing with the PLIC, I’m all ears. Otherwise, you’ve been warned. (I lost a bit of time over this one.)

Oh, and, btw, I did check atomic instructions on the second core, and they work just fine. Just adding that point because I read in other topics that some people weren’t sure it supported the A extension. It does.

kinsa · July 18, 2024, 7:36am

Both cores can simultaneously access any part of the physical memory.

See here, this is an implementation of the Remote Processor Messaging (rpmsg) Framework to transfer data between cores. This uses the mailbox and a defined shared memory area.

adailtonjn68 · July 18, 2024, 11:17am

Wow, impressive work there.
I am really interested in your fipinfo.c file. I’ll have a look.

adailtonjn68 · July 18, 2024, 11:18am

Do you have a repository?

Opus · July 18, 2024, 9:28pm

Interesting, but figuring out your work here would require significant reverse-engineering. I had a look and there’s a lot of source code, it’s tied to Linux (which is off-topic here, as this thread is strictly about baremetal), and people would have to know the rpmsg framework to figure it out. So, thanks, but as it is, it’s not helping a ton.

As I stated earlier, from tests in a baremetal environment (kickstarted by the FSBL), both cores do not appear to be able to access any part of the physical memory. But that requires elaborating a bit more.

I had tested sharing an area of RAM between the two cores, that was outside of the area dedicated to the second core (which, as I explained, is some address, that can be set via a SoC register, up to the end of DDR RAM). The second core didn’t appear to be able to access it successfully. At least, write accesses didn’t have any effect.

But as usual with reverse-engineering, tests are never quite exhaustive and it’s hard to know in which direction to go, when we go blind.

So, I tried another approach. Share RAM inside the area dedicated to the second core instead. Since this area can be extended as needed, it’s ok. And, this approach does work. Which would appear to show that the main core is able to access the whole RAM, while the second core seems to be able to only access its own RAM area. Might require further tests to confirm, but that’s what I got so far. Unless there’s some magic to enable the second core to access more than this.
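A small sketch of the check this implies: before handing a buffer to the second core, verify that it lies entirely inside the second core's window. The window bounds are passed in by the caller (for the default setup discussed earlier that would be 0x83F40000 up to the end of DDR, which I assume is 0x84000000 on a 64 MiB board); the function name is made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if [addr, addr + len) lies entirely inside [win_start, win_end).
 * Written so that addr + len is never computed and cannot overflow. */
static bool range_in_window(uint64_t addr, uint64_t len,
                            uint64_t win_start, uint64_t win_end)
{
    if (addr < win_start || addr > win_end)
        return false;
    return len <= win_end - addr;
}
```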

The above doesn’t look too surprising in hindsight, as from what I got, the main core does have an MMU, but the second one doesn’t. That said, the full picture is still unclear, as it’s not (publicly) documented.

But, as described above, I’ve found a way to share RAM, so, that’s all good. Possibly that was your approach as well (kinsa), again it was a bit too much code to go through, but you at least made me re-think it!

As to the interrupt priority thing, I found it odd in hindsight that only values in the 25-31 range would work (it didn’t really make sense), and here too it was a matter of reverse engineering. I hadn’t tested all priority values exhaustively, which made me miss the full picture. Actually, it’s just that the upper bit (bit 4) of the priority is ignored. So, instead of the 32 levels that the C906 PLIC is supposed to support (from the C906 specs), the CV1800B PLIC only seems to support 16 levels, from 0 to 15. Tested, and it works. Any higher value is just wrapped around, and thus 16 would be equivalent to 0, etc.

So, 16 priority levels it is. EDIT: Dang, still hadn’t tested ALL values. Turns out that it’s not 16, but only 8 priority levels (well, 7, 0 making the interrupt disabled). I’m pretty sure this is it now. I didn’t see it documented anywhere, but maybe it is.
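Put as code, the behaviour would look like the sketch below. The assumption here (mine, extending the "wrapped around" observation to the 8-level result) is that only the low 3 bits of the priority field are kept. That would also explain the first observation: 24 maps to 0 (disabled) while 25 to 31 map to 1 to 7.

```c
#include <stdint.h>

/* Assumption: only the low 3 bits of the priority field are implemented,
 * so higher values wrap around. */
#define PLIC_PRIO_MASK 0x7u

/* Priority the PLIC would actually use for a requested value. */
static uint32_t plic_effective_priority(uint32_t requested)
{
    return requested & PLIC_PRIO_MASK;
}

/* Standard PLIC rule: an interrupt is delivered only if its priority is
 * strictly greater than the threshold, so priority 0 never fires. */
static int plic_irq_can_fire(uint32_t requested, uint32_t threshold)
{
    return plic_effective_priority(requested) > threshold;
}
```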

Opus · July 18, 2024, 11:22pm

Regarding sharing RAM between the 2 cores, I’ll still have some more testing/work to do. I think my issues (sometimes reading garbage data in shared RAM) do not depend on where the area is, but on cache coherency issues. Sometimes it works and sometimes it doesn’t.

I tried using fence instructions, but that did not help. There’s probably something I’m missing here.

j1sys · July 19, 2024, 1:19am

Can’t wait to join in the fun. Sounds like memory sharing inside the 2nd CPU’s area is logically consistent: the 1st CPU has an MMU and is more ‘trusted’, while the 2nd CPU is just memory-locked within a safe sandbox that the 1st CPU sets up during boot. BTW: what should we call the cores? HART0, HART1? Foreground, Background? Primus, secondus? Just as we start to define the baremetal environment. With memory sharing solved we can then use the Mailbox to pass tokens with references to buffers in DRAM.
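One way such a token could look, as a sketch only: a single 8-byte Mailbox payload carrying a buffer offset and a length. The layout (low 32 bits for the offset from the start of the shared area, high 32 bits for the length) is purely hypothetical, as are the function names.

```c
#include <stdint.h>

/* Hypothetical token layout for one 8-byte Mailbox payload:
 *   bits  0..31  offset of the buffer from the start of the shared area
 *   bits 32..63  length of the buffer in bytes */
static uint64_t token_encode(uint32_t offset, uint32_t length)
{
    return ((uint64_t)length << 32) | offset;
}

static uint32_t token_offset(uint64_t token)
{
    return (uint32_t)(token & 0xFFFFFFFFu);
}

static uint32_t token_length(uint64_t token)
{
    return (uint32_t)(token >> 32);
}
```

Using an offset rather than a raw pointer keeps the token valid even if the two cores were to see the shared area at different addresses.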

Opus · July 19, 2024, 2:59am

So I “solved” my issue, which indeed was a cache problem. My test was simple: write some data structure on the 2nd core in a shared area, then send a message to the 1st core with the address of said structure, using the Mailbox. Then the 1st core would read data from this structure and log it. I was consistently getting bogus data. Only got good data in one case, which was probably just sheer luck due to the memory access sequence.

My guess at this point was that data written from the 2nd core would stay in its D-cache and so the 1st core would not see it. Which would imply that both cores (on the CV1800B) have separate caches. Which makes sense, although, again, it’s not clearly documented, and the architecture diagram seems to imply that the caches are shared between the 2 cores. They apparently aren’t. Are they identical in size though? Who knows. The diagram would suggest so, but the datasheet is otherwise not clear at all about it. The diagrams for the SG200x show that both cores have different caches, and that the ones for the 2nd core are smaller. For the CV1800B, it’s anyone’s guess.

Long story short, I added a “th.dcache.call” instruction after writing data in the shared area (on the 2nd core). And bam, it did solve the issue.

Now my question is, is there a different/more efficient way of dealing with this cache issue than triggering a write-back of the whole D-cache? Surely it must be a common problem for sharing RAM on multi-core systems with separate caches. Let me know your thoughts.
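One possible refinement, untested on this chip: write back only the cache lines that cover the shared buffer instead of the whole D-cache. The sketch below just walks the lines overlapping a buffer and calls a caller-supplied operation on each. The 64-byte line size is an assumption, and `op` is a placeholder for whatever per-line clean operation the core provides; no real cache instruction is issued here.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u  /* assumption: 64-byte D-cache lines */

/* Placeholder type for a per-line cache operation (hypothetical). */
typedef void (*line_op_fn)(uintptr_t line_addr, void *ctx);

/* Call op once for every cache line overlapping [addr, addr + len).
 * Returns the number of lines visited; op may be NULL to just count. */
static size_t for_each_cache_line(uintptr_t addr, size_t len,
                                  line_op_fn op, void *ctx)
{
    if (len == 0)
        return 0;

    uintptr_t first = addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t last  = (addr + len - 1) & ~(uintptr_t)(CACHE_LINE - 1);
    size_t count = 0;

    for (uintptr_t line = first; ; line += CACHE_LINE) {
        if (op)
            op(line, ctx);
        count++;
        if (line == last)
            break;
    }
    return count;
}
```

Aligning the shared structures to the line size would also avoid writing back lines that hold unrelated data.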

j1sys · July 19, 2024, 9:55am

TANSTAAFL (there ain’t no such thing as a free lunch)

Yes, cache coherency is a sticky wicket. Driver level programming has to take all this into account to present a clean defined API for the application level. We have to understand and take into consideration multiple cores, multiple caches, multiple dma engines, memory controllers, and even cache bypass memory maps (none documented for CV1800B). Great work, keep it up!!

willmore · July 19, 2024, 8:54pm

Can you use the PMP on the second core to mark that shared area as not cacheable? Or writethrough cached?