Unaligned memory access fault on Cortex-M3

AKA A surprising thing that happened to me while porting Contiki to the STM32F1.
AKA Some steps to take when diagnosing an unexpected hard fault on ARM Cortex M3

I already have an STM32L1 port working (for the basic uses of Contiki) and the major difference with this port is that it should support pretty much any target that libopencm3 supports. So I made a new platform, tweaked the GPIO settings for the STM32F1, flashed it to my STM32VL Discovery board, and… it started, but then it crashed.

Program received signal SIGINT, Interrupt.
blocking_handler () at ../../cm3/vector.c:86
86	{
(gdb) bt
#0  blocking_handler () at ../../cm3/vector.c:86
#1  <signal handler called>
#2  update_time () at contiki/core/sys/etimer.c:72

Now, I don’t see unhandled exceptions much these days. I consulted the Configurable Fault Status Register (CFSR) at 0xE000ED28 and compared that to the definitions in ARM’s “Cortex-M3 Devices Generic User Guide”.

(gdb) x /wx 0xE000ED28
0xe000ed28:	0x01000000

Ok, some bit in the top 16 bits. That’s the Usage Fault Status Register (UFSR). Let’s look at it a little closer because I can’t count hex digits in my head as well as some people.

(gdb) x /hx 0xE000ED2a
0xe000ed2a:	0x0100

Ok. That bit (bit 8 of the UFSR, UNALIGNED) means an unaligned access UsageFault. Awesome. One of the big selling points of the ARM Cortex-M3 is that it doesn’t care about alignment. It all “just works”. Well, except for this footnote: "Unaligned LDM, STM, LDRD, and STRD instructions always fault irrespective of the setting of UNALIGN_TRP". Ok, so let’s see what caused that. GDB “up” two times to get to the stack frame before the signal handler. x /i $pc is some magic to disassemble the instruction at the address pointed to by $pc.

(gdb) up
(gdb) up
#2  update_time () at contiki/core/sys/etimer.c:72
72	      if(t->timer.start + t->timer.interval - now < tdist) {
(gdb) x /i $pc
=> 0x80005c6 :	ldmia.w	r3, {r1, r4}
(gdb) info reg
r0             0x7d2	2002
r1             0x393821d9	959979993
r2             0x39381a07	959977991
r3             0x29d0fb29	701561641
r4             0x20000dc4	536874436
r5             0x2000004c	536870988
r6             0x0	0
r7             0x14	20
r8             0x20001f74	536878964
r9             0x20000270	536871536
r10            0x800c004	134266884
r11            0xced318f5	-825026315
r12            0x0	0
sp             0x20001fb8	0x20001fb8
lr             0x80005b9	134219193
pc             0x80005c6	0x80005c6 
xpsr           0x21000000	553648128

Check it out. There’s an ldm instruction. And r3 is clearly not aligned. (It doesn’t even look like a valid pointer to SRAM, but we’ll ignore that for now) Ok, so we got an unaligned access, and we know where. But what the hell?! Let’s look at the C code again. That t->timer is all struct stuff. Perhaps there’s some packed uint8_ts or something, maybe some “optimizations” for 8bit micros. Following the chain, struct etimer contains a struct process, which contains a struct pt which contains a lc_t. And only the lc_t. Which is an unsigned short. I guess there’s some delicious C rules here about promotion and types and packing. There’s always a rule.
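The two checks I did by hand above (picking the fault bit out of the CFSR, and eyeballing whether r3 is aligned) are easy to script. This is only a rough sketch: the bit positions are the UFSR ones from ARM's generic user guide, but the function names `decode_ufsr` and `is_word_aligned` are made up for illustration and are not part of any toolchain.

```python
# Sketch: decode the UsageFault half of a Cortex-M3 CFSR value and check
# whether an address is usable as the base of an LDM/STM.
# Function names are hypothetical; bit numbers follow the UFSR layout.

CFSR_ADDRESS = 0xE000ED28  # the UFSR is the upper halfword, at 0xE000ED2A

# UFSR bit number -> flag name
UFSR_BITS = {
    0: "UNDEFINSTR",
    1: "INVSTATE",
    2: "INVPC",
    3: "NOCP",
    8: "UNALIGNED",
    9: "DIVBYZERO",
}


def decode_ufsr(cfsr):
    """Return the names of the UsageFault flags set in a 32-bit CFSR value."""
    ufsr = (cfsr >> 16) & 0xFFFF
    return [name for bit, name in sorted(UFSR_BITS.items()) if ufsr & (1 << bit)]


def is_word_aligned(address):
    """LDM, STM, LDRD and STRD need an address that is a multiple of 4."""
    return address % 4 == 0


if __name__ == "__main__":
    print(decode_ufsr(0x01000000))      # the value read from 0xE000ED28
    print(is_word_aligned(0x29D0FB29))  # r3 at the faulting ldmia.w
```

Fed the values from the session above, this reports the UNALIGNED flag and says r3 is not word aligned, which matches what the manual lookup gave.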

Changing the type of lc_t to an unsigned int instead of an unsigned short and rebuilding stops it from crashing. Excellent. Not. It does make the image a little bigger though: text is unchanged, but data grows by 80 bytes.

karlp@tera:~/src/kcontiki (master *+)$ cat karl-size-short 
   text	   data	    bss	    dec	    hex	filename
  51196	   2836	   3952	  57984	   e280	foo.stm32vldiscovery
karlp@tera:~/src/kcontiki (master *+)$ cat karl-size-uint 
   text	   data	    bss	    dec	    hex	filename
  51196	   2916	   3952	  58064	   e2d0	foo.stm32vldiscovery
karlp@tera:~/src/kcontiki (master *+)$
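If you want the difference spelled out rather than squinting at two tables, something like the following would do. It is a minimal sketch that assumes the Berkeley-style `size` output shown above; `parse_size_output` and `size_delta` are hypothetical helper names, and the two captures are pasted in as strings.

```python
# Sketch: diff two captures of `size` output (Berkeley format).
# Helper names are made up for illustration.

SIZE_SHORT = """\
   text    data     bss     dec     hex filename
  51196    2836    3952   57984    e280 foo.stm32vldiscovery
"""

SIZE_UINT = """\
   text    data     bss     dec     hex filename
  51196    2916    3952   58064    e2d0 foo.stm32vldiscovery
"""


def parse_size_output(text):
    """Return {'text': ..., 'data': ..., 'bss': ..., 'dec': ...} as ints."""
    rows = [line.split() for line in text.strip().splitlines()]
    header, values = rows[0], rows[1]
    # The first four columns are decimal; hex and filename are ignored.
    return {name: int(value) for name, value in zip(header[:4], values[:4])}


def size_delta(before, after):
    """Per-section growth in bytes going from `before` to `after`."""
    return {name: after[name] - before[name] for name in before}


if __name__ == "__main__":
    delta = size_delta(parse_size_output(SIZE_SHORT), parse_size_output(SIZE_UINT))
    print(delta)
```

For the two builds above, only the data section (and therefore the total) changes.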

I’m not the first to hit this, but it certainly doesn’t seem to be very common. Apparently you should be able to use -mno-unaligned-access with gcc to force it to do everything bytewise, but that’s a pretty crap option, and it doesn’t seem to work for me anyway. Some people feel this is a gcc bug, some people feel it’s “undefined behaviour”. I say it’s “unexpected behaviour” :) In this particular case, there’s no casting of pointers, and no use of any sort of “packed” attributes on any of the structs, so I’d lean towards saying this is a compiler problem, but, as they say, it’s almost never a compiler problem :)

Here are some links to other discussion about this. (complete with “MORON! COMPILERS ARE NEVER WRONG” type of helpful commentary :)

I’m still not entirely sure of the best way of proceeding from here. I’m currently using GCC version arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors) 4.7.3 20121207 (release) [ARM/embedded-4_7-branch revision 194305], and I should probably try the 4.7-2013-q1-update release. If this is deemed to be “user error”, then it’s a matter of working out other ways of modifying the code so that it stays small where possible, but still works for everyone.

Not entirely what I’d planned on doing this evening, but somewhat enlightening at least.

libopencm3 with Contiki on STM32L part 2 – it’s alive, and a stupid error

In a previous post I got to a compiling and linking build for the STM32L Discovery board, but it didn’t actually print anything. Turns out I’d made one of the classically common mistakes with STM32 development, and hit one of the weak points in libopencm3’s API for the RCC module.

rcc_peripheral_enable_clock(&RCC_APB1ENR, RCC_APB2ENR_USART1EN);

The API unfortunately needs both the register and the bit, and if you can’t see the problem, don’t worry, you’re not alone. The problem is that I was turning on a feature in APB1 (ONE) with a bit definition designed for APB2 (TWO).

When I actually turned on the USART peripheral, everything started working as I expected. I can’t believe how often I’ve done something like this. Or how often I’ve simply not turned on what I needed. Later, I added code to support USART2 and USART3, but…. didn’t turn on those peripherals. Silly me.
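Since the register name is baked into the bit name, this particular slip can be caught mechanically. Here is a rough sketch of a checker that scans C source text for calls where the two don't agree. It assumes the libopencm3 naming convention seen above (bit names start with the register name), and `find_mismatches` is a name I made up, not anything libopencm3 ships.

```python
# Sketch: flag rcc_peripheral_enable_clock()/rcc_peripheral_disable_clock()
# calls whose bit definition does not belong to the register passed in.
# Relies purely on the RCC_<REG>_<BIT> naming convention.
import re

CALL_RE = re.compile(
    r"rcc_peripheral_(?:enable|disable)_clock\(\s*&\s*(RCC_[A-Z0-9]+ENR)\s*,"
    r"\s*([A-Z0-9_|\s]+?)\s*\)"
)


def find_mismatches(source):
    """Return (register, bit) pairs where the bit name lacks the register prefix."""
    mismatches = []
    for register, bits in CALL_RE.findall(source):
        # The second argument may OR several bit definitions together.
        for bit in (part.strip() for part in bits.split("|")):
            if bit and not bit.startswith(register + "_"):
                mismatches.append((register, bit))
    return mismatches


if __name__ == "__main__":
    broken = "rcc_peripheral_enable_clock(&RCC_APB1ENR, RCC_APB2ENR_USART1EN);"
    print(find_mismatches(broken))
```

It only understands literal register and bit names, so anything hidden behind a macro or a variable will slip through. It would have caught my mistake though.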

But, that does mean that my Contiki port to the STM32L Discovery board is alive!

Much more to come! Radio drivers and more examples and onwards!

make vim always start git commits on the first line

Vim has a (oftentimes) nice feature of remembering where in a file you were when you open it again. Which is all well and good, but it’s based on filenames, and git commit messages are always the same. So it remembers the line number where I finished my last commit message. I always want git commit messages to open up on the first line. Here’s some vimrc magic to make it Do The Right Thing™

" don't remember the cursor position in git commits
au FileType gitcommit au! BufEnter COMMIT_EDITMSG call setpos('.', [0, 1, 1, 0])

arm-none-eabi-gdb with python support on linux (Fedora 17)

I’ve been using the GCC ARM Embedded toolchain for STM32 development on linux for a while now. It’s maintained by ARM, it’s available for linux, windows and osx, and it’s just a zip of binaries. Untar, add to your path, and you’re golden. With the new 4.7 release (2012q4) it includes some decent code size improvements, and is generally just a one stop shop for getting a toolchain.

However, the linux builds don’t have python support in GDB. This isn’t necessarily a bad thing, but people are starting to put together some nice tools that plug in to GDB and rely on python support. There’s some experimental SWV/SWO support, there’s some neat support for printf without printf and, what I’m trying to do, dump a lot of data buffers straight out of RAM on the target device for analysis in python.

So, I tried building it myself. I first tried just downloading recent gdb sources and adding python support, after setting the target to arm-none-eabi, but… that didn’t play well with openocd, so I went back to trying to build the G-A-E provided sources instead.

Here’s what I needed on Fedora 17 x64

  • libmpc-devel
  • expat-devel
  • ncurses-devel
  • python-devel
  • bison

This then replaces step 5 from the provided instructions…

$ cd [extracteddir]/src
[dir]/src$ tar -xf gdb.tar.gz
[dir]/src$ cd gdb
[dir]/src/gdb$ ./configure --with-expat --with-python --target=arm-none-eabi
[dir]/src/gdb$ make -j5

That got me a working gdb with python support…

CPU Profiling applications on OpenWrt with perf or oprofile

I have a problem where two applications running on an OpenWrt router are using too much CPU time. I can reproduce the problem fairly well, but I don’t have any easy obvious clues as to the cause yet. Worse, it behaves more like a memory leak. The task stays the same, but CPU usage slowly creeps up over an hour or so. I’ve tried setting up the same applications on a linux desktop, and I can make it happen, but it’s so very very slow. So while that was running, I looked into CPU profiling on linux, and particularly, which of the tools would run on OpenWrt. So immediately things like valgrind/cachegrind and dtrace and friends are ruled out. I scratched google cpu profiler as being too invasive, and gprof as having too many criticisms. Also, I’d never used any of these tools before, and I didn’t want to spend all day testing them individually. Searching for CPU profiling linux kept turning up two applications, oprofile and perf. First step then was, “which of these, if any, is already packaged for OpenWrt?”

Background requirements / assumptions

  • You’re building your own OpenWrt images here…
  • I started testing this on Backfire, but then moved to Attitude Adjustment when I started working with perf.
  • You are free to recompile the code under test, and aren’t trying to test a live system

Turns out they both are, but you have to jump through some hoops. In “make menuconfig”, you need to go to “Global build settings” and enable “Compile the kernel with profiling enabled”. That’s all you need for oprofile, but for perf, go also to “Advanced configuration options (for developers)” and enable “Show broken platforms / packages”. I’m working on the atheros platform (ar2317), and apparently (I’m told on IRC) it’s a packaging problem more than a broken tool. It works on atheros at least, YMMV.

Continuing, you need to turn on the libelf1 library under “Libraries”, because the perf package doesn’t select it properly for you. Finally, under “Development” enable oprofile, perf or both. (When I started this, I didn’t have BROKEN packages enabled, so I just used oprofile, and later used perf. If perf works, I’d only use perf, and skip oprofile altogether. I won’t talk about oprofile any further, but this was all you needed to get it to work.)

So, that will make an image that can run the tools, but all binaries are (rightfully) stripped on OpenWrt by default in the build process, so you won’t get particularly useful results. In theory, you can record performance data on the router, and then generate the reports on your linux desktop with all the symbols, but I never worked out how to do this.

So, we want to run non-stripped binaries for anything we want to actually investigate. One way of doing this is the following magic incantation in your OpenWrt buildroot (Provided by jow_laptop on IRC)

make package/foo/{clean,compile} V=99 CONFIG_DEBUG=y

You can then copy the binary from [buildroot]/staging_dir/target-arch_uClibc-0.x.xx.x/root-platform/usr/bin/foo onto your device and run it. Actually, if you only want to analyse a single program, that may be enough for you. You can copy that non-stripped binary to the ramfs on /tmp, and just start profiling…

$ start_your_program
# capture 120 seconds worth of profiling data
$ perf record -p $(pidof your_program) sleep 120
# wait…
$ perf report

If your program actually runs to completion, rather than being a daemon of some sort, you can just use “perf record your_program”. This may actually give you enough information. Lucky you!

Here’s an example of recording a stripped binary, and recording a non-stripped binary. (This is a real program, but it’s not the point here.)

# Overhead  Command           Shared Object  Symbol
# ........  .......  ......................  ......
    59.34%  mlifter  [kernel]                [k] 0x800cd150
    25.04%  mlifter   [.] 0x37990 
     5.17%  mlifter           [.] 0xaaa4  
     2.96%  mlifter  [.] 0xfe54  
     1.68%  mlifter        [.] 0x4e00  
     1.25%  mlifter  [nf_conntrack]          [k] 0x22e4  
     0.87%  mlifter       [.] 0x17d0  
     0.87%  mlifter   [.] 0x1478  
     0.64%  mlifter  mlifter                 [.] 0x6bac  
     0.61%  mlifter  [ip_tables]             [k] 0x5c8   
     0.26%  mlifter       [.] 0x96c   
     0.26%  mlifter      [.] 0x2e60  
     0.23%  mlifter  [xt_conntrack]          [k] 0x2d0   
     0.20%  mlifter  [nf_conntrack_ipv4]     [k] 0x320   
     0.15%  mlifter      [.] 0x84e58 
     0.12%  mlifter  [iptable_raw]           [k] 0       
     0.12%  mlifter  [iptable_nat]           [k] 0x170   
     0.09%  mlifter               [.] 0x2cec  
     0.06%  mlifter  [nf_nat]                [k] 0x5e4   
     0.03%  mlifter  [nf_defrag_ipv4]        [k] 0x34    
     0.03%  mlifter  [iptable_mangle]        [k] 0xe0    
     0.03%  mlifter  [iptable_filter]        [k] 0x40    

And here’s the same thing, non-stripped

# Overhead  Command           Shared Object                         Symbol
# ........  .......  ......................  .............................
    59.29%  mlifter  [kernel]                [k] 0x800780e0
    26.30%  mlifter   [.] 0x39ff0 
     5.66%  mlifter           [.] 0xa154  
     2.83%  mlifter  [.] 0xaa20  
     1.70%  mlifter        [.] 0x4b54  
     0.85%  mlifter  [nf_conntrack]          [k] 0x6020  
     0.60%  mlifter  [ip_tables]             [k] 0x630   
     0.58%  mlifter       [.] 0x39cc  
     0.47%  mlifter   [.] 0x29f0  
     0.23%  mlifter       [.] 0x7d0   
     0.19%  mlifter      [.] 0x55cc  
     0.19%  mlifter  [nf_conntrack_ipv4]     [k] 0x2fc   
     0.17%  mlifter  mlifter                 [.] 0x1880  
     0.17%  mlifter      [.] 0x35b7c 
     0.15%  mlifter  mlifter                 [.] msg_add_branch_readings_full
     0.15%  mlifter  [xt_conntrack]          [k] 0xb0    
     0.15%  mlifter  [iptable_nat]           [k] 0x360   
     0.06%  mlifter  [iptable_filter]        [k] 0x40    
     0.04%  mlifter  mlifter                 [.] run_tasks
     0.04%  mlifter  mlifter                 [.] sfw_read_smart_bar
     0.04%  mlifter  [nf_defrag_ipv4]        [k] 0x54    
     0.02%  mlifter  mlifter                 [.] sfw_provides_circuit_readings
     0.02%  mlifter  mlifter                 [.] publish_backoff
     0.02%  mlifter  mlifter                 [.] sfw_verify_modbus_connection
     0.02%  mlifter  mlifter                 [.] task_modbus_poll
     0.02%  mlifter  mlifter                 [.] task_check_state_complete
     0.02%  mlifter  [nf_nat]                [k] 0x610   
     0.02%  mlifter  [iptable_mangle]        [k] 0x50    
     0.02%  mlifter  [fuse]                  [k] 0x70c   

Not much of a difference. We can see calls into our own code, but well, we don’t spend much time there :) Really, to get useful output, you need non-stripped libraries, and the kernel symbols. You’re never going to fit all of that on a ramdisk on your OpenWrt router. It’s time for some remote filesystems.

I used sshfs for this, which was very easy to set up, but it’s pretty CPU heavy on a small router. NFS probably would have been a better choice. However you do it, mount your OpenWrt buildroot somewhere on your router.

Then, even without any changes in how you call your application, you can add kernel symbol decoding by adding the -k parameter: “perf report -k /tmp/remote/build_dir/linux-atheros/linux-3.3.8/vmlinux”. Now we get the kernel chunks broken out, like in this snippet.

# Overhead  Command           Shared Object                             Symbol
# ........  .......  ......................  .................................
    26.30%  mlifter   [.] 0x39ff0 
     5.85%  mlifter  [kernel]                [k] finish_task_switch.constprop.58
     5.68%  mlifter  [kernel]                [k] __queue_work
     5.66%  mlifter           [.] 0xa154  
     4.07%  mlifter  [kernel]                [k] n_tty_write
     2.83%  mlifter  [.] 0xaa20  
     2.83%  mlifter  [kernel]                [k] __wake_up
     2.41%  mlifter  [kernel]                [k] tty_write
     1.85%  mlifter  [kernel]                [k] handle_sys
     1.70%  mlifter        [.] 0x4b54  
     1.58%  mlifter  [kernel]                [k] fsnotify
     1.57%  mlifter  [kernel]                [k] put_ldisc
     1.30%  mlifter  [kernel]                [k] vfs_write
     0.94%  mlifter  [kernel]                [k] tty_insert_flip_string_fixed_flag
     0.91%  mlifter  [kernel]                [k] __do_softirq
     0.91%  mlifter  [kernel]                [k] memcpy

We’re still missing symbols from uClibc, and any other user libraries we may call. (like libjson there) Fortunately, when we rebuilt our package with CONFIG_DEBUG=y earlier, it rebuilt the libraries it depends on in debug too. But now, we need to make sure we call our binary using the correct debug libraries. (This is the bit I think should be doable offline, but this is the way I got it to work at least!)

This is setting up the library path to use the non-stripped libraries. We then start perf using this process ID. (/tmp/remote is where I mounted my OpenWrt buildroot)

$ LD_LIBRARY_PATH=/tmp/remote/staging_dir/target-mips_uClibc- /tmp/remote/staging_dir/target-mips_uClibc-

And now we can see all the details….

# Overhead  Command           Shared Object                             Symbol
# ........  .......  ......................  .................................
     6.39%  mlifter  [kernel]                [k] finish_task_switch.constprop.58
     5.83%  mlifter  [kernel]                [k] __queue_work
     5.01%  mlifter   [.] _ppfs_parsespec
     3.98%  mlifter  [kernel]                [k] n_tty_write
     3.27%  mlifter   [.] _vfprintf_internal
     3.24%  mlifter  [kernel]                [k] __wake_up
     2.77%  mlifter   [.] __libc_write
     2.48%  mlifter  [kernel]                [k] tty_write
     1.60%  mlifter  [kernel]                [k] vfs_write
     1.60%  mlifter  [kernel]                [k] put_ldisc
     1.57%  mlifter  [kernel]                [k] fsnotify
     1.53%  mlifter   [.] _fpmaxtostr
     1.45%  mlifter  [kernel]                [k] handle_sys
     1.31%  mlifter           [.] __muldf3
     1.11%  mlifter   [.] __stdio_WRITE
     1.08%  mlifter  [kernel]                [k] tty_insert_flip_string_fixed_flag
     1.06%  mlifter           [.] __unpack_d
     1.04%  mlifter   [.] __stdio_fwrite
--- snip ---
     0.19%  mlifter        [.] json_object_put
--- snip --- 
     0.11%  mlifter        [.] json_object_double_to_json_string

We can now see into uClibc, and into user libraries like libjson. (Trust me, I just didn’t include _alllll_ the output.) This is not actually the same point in the program, so don’t read too much into the actual numbers here. If you want, you can comment that this program is spending far too much time printing something to the console. (which is true!)
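If you'd rather not eyeball these tables, a few lines of Python can total up how much of a report is kernel versus userspace, and how much of each is still just raw addresses. This is a rough sketch that assumes the plain text layout of “perf report” shown above (overhead, command, shared object, then [k] or [.] and the symbol). `summarize` is a made-up name, and treating any symbol that looks like a bare address as unresolved is my own heuristic.

```python
# Sketch: total the overhead in a text "perf report" by kernel/user and by
# whether the symbol was resolved. Function name and heuristic are my own.
import re
from collections import defaultdict

# overhead %, command, optional shared object, [k] or [.], symbol
ROW_RE = re.compile(r"^\s*(\d+\.\d+)%\s+(\S+)\s+(.*?)\s*\[([k.])\]\s+(\S+)\s*$")


def summarize(report_text):
    """Return {(space, state): total overhead %} for the rows in a report."""
    totals = defaultdict(float)
    for line in report_text.splitlines():
        match = ROW_RE.match(line)
        if not match:
            continue  # '#' header lines, '--- snip ---' markers, blank lines
        overhead, _command, _shared_object, kind, symbol = match.groups()
        space = "kernel" if kind == "k" else "user"
        is_address = symbol == "0" or symbol.startswith("0x")
        state = "unresolved" if is_address else "resolved"
        totals[(space, state)] += float(overhead)
    return dict(totals)


if __name__ == "__main__":
    sample = "\n".join([
        "# Overhead  Command           Shared Object  Symbol",
        "    59.34%  mlifter  [kernel]                [k] 0x800cd150",
        "     0.15%  mlifter  mlifter                 [.] run_tasks",
    ])
    for key, total in sorted(summarize(sample).items()):
        print(key, round(total, 2))
```

A shrinking “unresolved” total is a quick way to see whether each of the steps above actually bought you any extra symbols.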

That’s it. You can now get symbol level information on what’s happening and how much. Whether this is useful is a different story, but this was fairly complicated to set up and test out, so I thought this might be helpful to others at some point.

Installing Eagle (5.12) on Fedora 17 (64bit)

Eagle is only provided as a 32bit package for linux, even as of version 6.3. I’m still using 5.x, for compatibility reasons, so I was trying to get it installed on my newish Fedora 17 64 bit install. Eagle’s #1 FAQ item is how to do this, but it’s for fedora 10, and some of the packages have changed. You also don’t need to install as many, as some of them will be pulled in as dependencies.

  • glibc.i686
  • libXrender.i686
  • libXrandr.i686
  • libXcursor.i686
  • freetype.i686
  • fontconfig.i686
  • libXi.i686
  • libpng-compat.i686
  • libjpeg-turbo.i686
  • libstdc++.i686
  • openssl-devel.i686

There, much better.
Updated with corrections 2013-Nov-18

Code Size changes with “int” on 8bit and 32bit platforms

I was looking for a few bytes extra flash today, and realized that some old AVR code I had, which used uint8_t extensively for loop counters and indexes (dealing with small arrays) might not be all that efficient on the STM32 Cortex-M3.

So, I went over the code and replaced the counter type in all the places where the size of the counter wasn’t actually important, and made some comparisons. I was compiling the exact same C file in both cases, with only a typedef changing between runs.

Compiler versions and flags

platform: AVR
gcc version: avr-gcc (GCC) 4.3.5
cflags: -DNDEBUG -Wall -Os -g -ffunction-sections -fdata-sections -Wstrict-prototypes -mmcu=atmega168 -funsigned-char -funsigned-bitfields -fpack-struct -fshort-enums

platform: STM32
gcc version: arm-none-eabi-gcc (GNU Tools for ARM Embedded Processors) 4.6.2 20120316 (release) [ARM/embedded-4_6-branch revision 185452]
cflags: -DNDEBUG -Wall -Os -g -ffunction-sections -fdata-sections -Wstrict-prototypes -fno-common -mcpu=cortex-m3 -mthumb

And… here’s the results

counter type          avr-size   arm-none-eabi-size
unsigned int              1318                  844
uint8_t (original)        1160                  856
uint_least8_t             1160                  856
uint_fast8_t              1160                  844
int                       1330                  820
int8_t                    1212                  872
int_least8_t              1212                  872
int_fast8_t               1212                  820

I would personally say that it looks like gcc for ARM still has some work to do on optimizations. If _least8 and _fast8 take up more space than int it’s not really as polished as the avr-gcc code yet. For me personally, as this code no longer has to run on both AVR and STM32, I’ll just use int.

So, after extending this a bit, my original conclusion about the fast_ types not being fully optimized with arm-gcc was wrong. It’s more that, on AVR, your “don’t care” counters should be unsigned for smaller size, while on STM32, they should be signed. (Though I still think it’s dodgy that int_least8_t resulted in bigger code than int_fast8_t.) Also, even if signed is better in the best case, the wrong signed is also the worst case. Awesome.
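To make the signed versus unsigned comparison explicit, here is the arithmetic as a small sketch. The numbers are copied straight from the results above. The dictionary layout and the `signed_minus_unsigned` helper are just how I chose to write it down.

```python
# Sketch: per-platform size difference (signed minus unsigned) for each
# counter type, using the measured sizes from the table above.

# counter type -> (avr-size, arm-none-eabi-size), in bytes
SIZES = {
    "unsigned int": (1318, 844),
    "uint8_t": (1160, 856),
    "uint_least8_t": (1160, 856),
    "uint_fast8_t": (1160, 844),
    "int": (1330, 820),
    "int8_t": (1212, 872),
    "int_least8_t": (1212, 872),
    "int_fast8_t": (1212, 820),
}

# (signed type, matching unsigned type)
PAIRS = [
    ("int", "unsigned int"),
    ("int8_t", "uint8_t"),
    ("int_least8_t", "uint_least8_t"),
    ("int_fast8_t", "uint_fast8_t"),
]

AVR, ARM = 0, 1  # column indexes into the SIZES tuples


def signed_minus_unsigned(platform):
    """Bytes added (positive) or saved (negative) by using the signed type."""
    return {
        signed: SIZES[signed][platform] - SIZES[unsigned][platform]
        for signed, unsigned in PAIRS
    }


if __name__ == "__main__":
    print("AVR:", signed_minus_unsigned(AVR))
    print("ARM:", signed_minus_unsigned(ARM))
```

On AVR every signed variant comes out bigger than its unsigned twin. On ARM, int and int_fast8_t come out 24 bytes smaller, while int8_t and int_least8_t come out 16 bytes bigger.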

Eclipse 4.2 spell check and user dictionaries.

So, there’s this bug:

Which is fine and all, I can switch to the c/c++ spelling check section. But this fails to address the massive fucking pile of fail this is. WHY ON EARTH IS THERE A SEPARATE SPELL CHECK SETTING FOR C/C++?!

I mean, why?! This is not like formatting and tabs where you might want different per language settings. But spell check settings based on whether I’m in C, python, java or lua?! HELL NO!