Boy did I have an interesting day of coding yesterday.
It all started off with me trying to compile the bootloader for the Gumstix. We use u-boot, customized for our hardware, and I hadn’t touched the code at all until a few days ago. Gordon had been compiling it on his laptop (the only machine on which it was known to be compilable), and I thought it’d probably be useful for me to (a) understand it, (b) get it working, and (c) fix some of the bugs it has (for example, it won’t mount your MMC card on the first attempt, only on the second). So I created a big tarball of Gordon’s setup and copied it to my machine. He was, of course, running under Cygwin on Windows on his laptop. Whee. And using a quite old version of u-boot. I finally managed to get it compiling with my cross compiler, with his changes applied against the latest u-boot version. It compiles! OK, so now it was time to load and execute the sucker.
During the course of the next day or two, I ran into the following array of amusing things, which one can only encounter when writing firmware:
- One of the things that u-boot needs to be able to do is to write to flash. Writing to the flash part we use is fairly straightforward (but not 100% straightforward). The easiest way to write (the one I use in this case) is to send an instruction to the flash device saying “I’m about to write a half-word”, followed by the address and the half-word to be written. The way you do this is actually to poke the instruction to the address, then poke the value to be written to the same address one bus clock cycle later, i.e. in C, it’s effectively:
volatile u16 *addr;
u16 datum;
...
*addr = 0x0040;       // This is the "I'm writing data" instruction to the flash device
sync_to_bus_cycle();  // left as an exercise for the reader
*addr = datum;
...
That’s pretty much it, modulo some looping and error checking. Seems pretty simple. But it just wasn’t working. So I popped into the hardware-level JTAG debugger, and lo! gcc had spat out code which writes each of those u16s a byte at a time! That is, it was using 4 STRB instructions instead of 2 STRH instructions. And of course the byte writes were scattered across bus cycles willy-nilly, so what ended up getting written to flash was more or less random.
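For reference, here is a minimal sketch of that write sequence with the looping filled in. The names `flash_write_halfword`, `flash_write` and `FLASH_CMD_PROGRAM` are made up for illustration, `sync_to_bus_cycle()` is stubbed out, and the read-back check is only a stand-in for whatever status polling your flash part really needs, so treat it as a sketch under those assumptions and not as the actual u-boot driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The "I'm writing data" command from the snippet above. */
#define FLASH_CMD_PROGRAM 0x0040u

/* Hypothetical stub: real hardware would wait for the next bus cycle here. */
static void sync_to_bus_cycle(void) {}

/* Program one half-word. Both stores must be single 16-bit accesses,
 * which is why the pointer is volatile uint16_t and must be 2-byte aligned. */
static void flash_write_halfword(volatile uint16_t *addr, uint16_t datum)
{
    assert(((uintptr_t)addr & 1u) == 0);
    *addr = FLASH_CMD_PROGRAM;
    sync_to_bus_cycle();
    *addr = datum;
}

/* Program `count` half-words and return how many were verified.
 * Assumption: a plain read-back stands in for device-specific status checks. */
static size_t flash_write(volatile uint16_t *dst, const uint16_t *src, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        flash_write_halfword(&dst[i], src[i]);
        if (dst[i] != src[i])
            break; /* stop at the first half-word that did not stick */
    }
    return i;
}
```

Pointing `dst` at an ordinary RAM array is enough to exercise the loop on a host machine. It will not catch the byte-wise store problem, though; for that you still have to look at the generated assembly.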
Solution? Well, it turns out that GCC was spitting out lowest-common-denominator ARM code, and STRH (store half-word) is an instruction which only became available later in the life of the ARMs. So specifying -mcpu=xscale solved the problem, and made my code about 10% smaller to boot!
- This one took the better part of a day to track down. On power-up, the PXA255 starts executing instructions from address 0. On the gumstix, 0-4MB is in flash, and 0 is the start of the bootloader code. RAM starts at 0xA0000000. The bootloader code in flash looks like:
branch to reset
[vector table and some other data]
reset:
do some basic initialization
relocate the bootloader to RAM
jump to RAM and continue execution there
Pretty straightforward. Except that the code just absolutely refused to work. When you ran it under the JTAG debugger, you could watch it execute the first branch, then the very next instruction would cause a bus error exception to be thrown. It didn’t matter what the next instruction was. It could be loading some immediate value into a register, and it’d still throw a bus exception.
Perplexed, I spent a day tinkering around. Since the first instruction executed OK, I tinkered with doing different stuff there. Eventually, I came upon having it jump further into flash and continue execution higher in memory. This worked! If I just added padding to the [some other data] chunk there, and forced a longer branch, then the whole thing just worked. Then I discovered that without the padding, running without the debugger, it just worked too. It was only when doing a short jump under the debugger that it would fail. And the minimum jump size that worked was about 2k.
So in the end, my theory is that the debugger displays data and instructions from the actual flash if you ask it to disassemble the code, but that the CPU is actually executing something else: probably something stored in the 2k instruction cache, and probably something the debugger is writing in there, like its own interrupt handlers and such. And there is absolutely no documentation on this anywhere! Surely someone out there has used OCD Commander to debug an ARM startup using a JTAG device before, and you’d expect some comment in the code saying “if you’re using a JTAG debugger like OCD Commander, then make sure to pad the start of your code above 2k!”, but no.
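Since nothing warns you about this, one option is a small host-side check in the build that decodes the first word of the image and complains if the reset branch lands less than 2k away. Below is a rough sketch, assuming the word at address 0 is a plain ARM `B` instruction. The 2k figure comes from the experiments above, not from any documentation, and the function and macro names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Empirical threshold from the text (about 2k); not a documented value. */
#define MIN_RESET_BRANCH_BYTES 0x800u

/* Decode an ARM-mode B/BL instruction located at address `pc`.
 * Returns false if `insn` is not a conditional/unconditional B or BL. */
static bool arm_branch_target(uint32_t insn, uint32_t pc, uint32_t *target)
{
    assert(target != NULL);
    if ((insn & 0x0E000000u) != 0x0A000000u)
        return false;               /* bits 27..25 must be 101 */
    if ((insn >> 28) == 0xFu)
        return false;               /* cond 1111 is BLX, not handled here */

    /* imm24 is a signed word offset relative to pc + 8. */
    uint32_t imm24 = insn & 0x00FFFFFFu;
    int32_t offset = (int32_t)(imm24 << 2);
    if (imm24 & 0x00800000u)
        offset -= (int32_t)0x04000000; /* sign-extend the 26-bit byte offset */

    *target = pc + 8u + (uint32_t)offset;
    return true;
}

/* True if the instruction at address 0 branches at least ~2k into the image. */
static bool reset_branch_is_far_enough(uint32_t first_word)
{
    uint32_t target;
    if (!arm_branch_target(first_word, 0u, &target))
        return false;
    return target >= MIN_RESET_BRANCH_BYTES;
}
```

For example, `0xEA0001FE` is a branch from address 0 to 0x800 and passes, while `0xEA000006` branches to 0x20 and fails.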
