I have a problem where two applications running on an OpenWrt router are using too much CPU time. I can reproduce the problem fairly well, but I don’t have any easy, obvious clues as to the cause yet. Worse, it behaves more like a memory leak: the task stays the same, but CPU usage slowly creeps up over an hour or so. I’ve tried setting up the same applications on a Linux desktop, and I can make it happen there too, but it’s very, very slow.
So while that was running, I looked into CPU profiling on Linux, and particularly, which tools would run on OpenWrt. That immediately rules out things like valgrind/cachegrind and dtrace and friends. I scratched Google’s CPU profiler as being too invasive, and gprof as having too many criticisms. Also, I’d never used any of these tools before, and I didn’t want to spend all day testing them individually. Searching for “CPU profiling linux” kept turning up two tools: oprofile and perf. First step then was, “which of these, if any, is already packaged for OpenWrt?”
Background requirements / assumptions
- You’re building your own OpenWrt images here…
- I started testing this on Backfire, but then moved to Attitude Adjustment when I started working with perf.
- You are free to recompile the code under test, and aren’t trying to test a live system.
Turns out they both are, but you have to jump through some hoops. In “make menuconfig”, go to “Global build settings” and enable “Compile the kernel with profiling enabled”. That’s all you need for oprofile, but for perf, also go to “Advanced configuration options (for developers)” and enable “Show broken platforms / packages”. I’m working on the atheros platform (ar2317), and apparently (I’m told on IRC) it’s a packaging problem more than a broken tool. It works on atheros at least, YMMV.
Continuing, you need to turn on the libelf1 library under “Libraries”, because the perf package doesn’t select it properly for you. Finally, under “Development”, enable oprofile, perf, or both. (When I started this, I didn’t have BROKEN packages enabled, so I just used oprofile, and later used perf. If perf works, I’d only use perf and skip oprofile altogether. I won’t talk about oprofile any further, but this was all you needed to get it to work.)
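If you drive your builds from a saved .config rather than menuconfig, these are roughly the symbols involved (I’m assuming the usual CONFIG_PACKAGE_<name> naming here; double-check against what menuconfig writes out for you):
# kernel profiling support, BROKEN packages, and the tools themselves
CONFIG_KERNEL_PROFILING=y
CONFIG_BROKEN=y
CONFIG_PACKAGE_libelf1=y
CONFIG_PACKAGE_perf=y
CONFIG_PACKAGE_oprofile=y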
So, that will make an image that can run the tools, but all binaries are (rightfully) stripped by default in the OpenWrt build process, so you won’t get particularly useful results. In theory, you can record performance data on the router and then generate the reports on your Linux desktop with all the symbols, but I never worked out how to do this.
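(If I were to guess, perf’s --symfs option is probably the missing piece; something like the line below might let perf report resolve symbols against the buildroot from the desktop side, but I haven’t verified it.)
# untested: point symbol lookup at the buildroot's root filesystem
$ perf report --symfs [buildroot]/staging_dir/target-mips_uClibc-0.9.33.2/root-atheros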
So, we want to run non-stripped binaries for anything we actually want to investigate. One way of doing this is the following magic incantation in your OpenWrt buildroot (provided by jow_laptop on IRC):
make package/foo/{clean,compile} V=99 CONFIG_DEBUG=y
You can then copy the binary from [buildroot]/staging_dir/target-arch_uClibc-0.x.xx.x/root-platform/usr/bin/foo onto your device and run it. Actually, if you only want to analyse a single program, that may be enough for you: copy that non-stripped binary to the ramfs on /tmp, and just start profiling.
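The copy step might look something like this (the staging path matches my mips/atheros build; the router address is whatever yours is, 192.168.1.1 being the OpenWrt default):
$ scp [buildroot]/staging_dir/target-mips_uClibc-0.9.33.2/root-atheros/usr/bin/foo root@192.168.1.1:/tmp/
Then the profiling session itself: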
$ start_your_program
# capture 120 seconds worth of profiling data
$ perf record -p $(pidof your_program) sleep 120
# wait…
$ perf report
If your program actually runs to completion, rather than being a daemon of some sort, you can just use “perf record your_program”. This may give you enough information on its own. Lucky you!
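In that case, the whole session can be as short as:
$ perf record your_program
# … runs to completion …
$ perf report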
Here’s an example of recording a stripped binary versus a non-stripped one. First, the stripped binary. (This is a real program, but it’s not the point here.)
# Overhead Command Shared Object Symbol
# ........ ....... ...................... ......
#
59.34% mlifter [kernel] [k] 0x800cd150
25.04% mlifter libuClibc-0.9.33.2.so [.] 0x37990
5.17% mlifter libgcc_s.so.1 [.] 0xaaa4
2.96% mlifter libpthread-0.9.33.2.so [.] 0xfe54
1.68% mlifter libjson.so.0.0.1 [.] 0x4e00
1.25% mlifter [nf_conntrack] [k] 0x22e4
0.87% mlifter libmosquitto.so.1 [.] 0x17d0
0.87% mlifter ld-uClibc-0.9.33.2.so [.] 0x1478
0.64% mlifter mlifter [.] 0x6bac
0.61% mlifter [ip_tables] [k] 0x5c8
0.26% mlifter libuglylogging.so [.] 0x96c
0.26% mlifter libmodbus.so.5.0.0 [.] 0x2e60
0.23% mlifter [xt_conntrack] [k] 0x2d0
0.20% mlifter [nf_conntrack_ipv4] [k] 0x320
0.15% mlifter libcrypto.so.1.0.0 [.] 0x84e58
0.12% mlifter [iptable_raw] [k] 0
0.12% mlifter [iptable_nat] [k] 0x170
0.09% mlifter libuci.so [.] 0x2cec
0.06% mlifter [nf_nat] [k] 0x5e4
0.03% mlifter [nf_defrag_ipv4] [k] 0x34
0.03% mlifter [iptable_mangle] [k] 0xe0
0.03% mlifter [iptable_filter] [k] 0x40
And here’s the same thing, non-stripped
# Overhead Command Shared Object Symbol
# ........ ....... ...................... .............................
#
59.29% mlifter [kernel] [k] 0x800780e0
26.30% mlifter libuClibc-0.9.33.2.so [.] 0x39ff0
5.66% mlifter libgcc_s.so.1 [.] 0xa154
2.83% mlifter libpthread-0.9.33.2.so [.] 0xaa20
1.70% mlifter libjson.so.0.0.1 [.] 0x4b54
0.85% mlifter [nf_conntrack] [k] 0x6020
0.60% mlifter [ip_tables] [k] 0x630
0.58% mlifter libmosquitto.so.1 [.] 0x39cc
0.47% mlifter ld-uClibc-0.9.33.2.so [.] 0x29f0
0.23% mlifter libuglylogging.so [.] 0x7d0
0.19% mlifter libmodbus.so.5.0.0 [.] 0x55cc
0.19% mlifter [nf_conntrack_ipv4] [k] 0x2fc
0.17% mlifter mlifter [.] 0x1880
0.17% mlifter libcrypto.so.1.0.0 [.] 0x35b7c
0.15% mlifter mlifter [.] msg_add_branch_readings_full
0.15% mlifter [xt_conntrack] [k] 0xb0
0.15% mlifter [iptable_nat] [k] 0x360
0.06% mlifter [iptable_filter] [k] 0x40
0.04% mlifter mlifter [.] run_tasks
0.04% mlifter mlifter [.] sfw_read_smart_bar
0.04% mlifter [nf_defrag_ipv4] [k] 0x54
0.02% mlifter mlifter [.] sfw_provides_circuit_readings
0.02% mlifter mlifter [.] publish_backoff
0.02% mlifter mlifter [.] sfw_verify_modbus_connection
0.02% mlifter mlifter [.] task_modbus_poll
0.02% mlifter mlifter [.] task_check_state_complete
0.02% mlifter [nf_nat] [k] 0x610
0.02% mlifter [iptable_mangle] [k] 0x50
0.02% mlifter [fuse] [k] 0x70c
Not much of a difference. We can see calls into our own code, but, well, we don’t spend much time there :) Really, to get useful output, you need non-stripped libraries and the kernel symbols too. You’re never going to fit all of that on a ramdisk on your OpenWrt router, so it’s time for some remote filesystems.
I used sshfs for this, which was very easy to set up, but it’s pretty CPU-heavy on a small router; NFS would probably have been a better choice. However you do it, mount your OpenWrt buildroot somewhere on your router.
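For reference, the sshfs route is only a couple of commands on the router (the desktop hostname and buildroot path here are placeholders for your own):
$ mkdir -p /tmp/remote
$ sshfs you@your-desktop:/path/to/openwrt-buildroot /tmp/remote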
Then, even without any changes in how you call your application, you can add kernel symbol decoding: just add the -k parameter, e.g. “perf report -k /tmp/remote/build_dir/linux-atheros/linux-3.3.8/vmlinux”. Now we get the kernel chunks broken out, like in this snippet.
# Overhead Command Shared Object Symbol
# ........ ....... ...................... .................................
#
26.30% mlifter libuClibc-0.9.33.2.so [.] 0x39ff0
5.85% mlifter [kernel] [k] finish_task_switch.constprop.58
5.68% mlifter [kernel] [k] __queue_work
5.66% mlifter libgcc_s.so.1 [.] 0xa154
4.07% mlifter [kernel] [k] n_tty_write
2.83% mlifter libpthread-0.9.33.2.so [.] 0xaa20
2.83% mlifter [kernel] [k] __wake_up
2.41% mlifter [kernel] [k] tty_write
1.85% mlifter [kernel] [k] handle_sys
1.70% mlifter libjson.so.0.0.1 [.] 0x4b54
1.58% mlifter [kernel] [k] fsnotify
1.57% mlifter [kernel] [k] put_ldisc
1.30% mlifter [kernel] [k] vfs_write
0.94% mlifter [kernel] [k] tty_insert_flip_string_fixed_flag
0.91% mlifter [kernel] [k] __do_softirq
0.91% mlifter [kernel] [k] memcpy
We’re still missing symbols from uClibc and any other user libraries we may call (like libjson there). Fortunately, when we rebuilt our package with CONFIG_DEBUG=y earlier, it rebuilt the libraries it depends on in debug too. But now we need to make sure we call our binary using the correct debug libraries. (This is the bit I think should be doable offline, but this is the way I got it to work, at least!)
The following sets up the library path to use the non-stripped libraries, then we start perf against the resulting process ID. (/tmp/remote is where I mounted my OpenWrt buildroot.)
$ LD_LIBRARY_PATH=/tmp/remote/staging_dir/target-mips_uClibc-0.9.33.2/root-atheros/lib:/tmp/remote/staging_dir/target-mips_uClibc-0.9.33.2/root-atheros/usr/lib /tmp/remote/staging_dir/target-mips_uClibc-0.9.33.2/root-atheros/usr/bin/mlifter
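Then, from another shell on the router, it’s the same record/report dance as before, with -k pointed at the buildroot’s vmlinux:
$ perf record -p $(pidof mlifter) sleep 120
$ perf report -k /tmp/remote/build_dir/linux-atheros/linux-3.3.8/vmlinux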
And now we can see all the details….
# Overhead Command Shared Object Symbol
# ........ ....... ...................... .................................
#
6.39% mlifter [kernel] [k] finish_task_switch.constprop.58
5.83% mlifter [kernel] [k] __queue_work
5.01% mlifter libuClibc-0.9.33.2.so [.] _ppfs_parsespec
3.98% mlifter [kernel] [k] n_tty_write
3.27% mlifter libuClibc-0.9.33.2.so [.] _vfprintf_internal
3.24% mlifter [kernel] [k] __wake_up
2.77% mlifter libuClibc-0.9.33.2.so [.] __libc_write
2.48% mlifter [kernel] [k] tty_write
1.60% mlifter [kernel] [k] vfs_write
1.60% mlifter [kernel] [k] put_ldisc
1.57% mlifter [kernel] [k] fsnotify
1.53% mlifter libuClibc-0.9.33.2.so [.] _fpmaxtostr
1.45% mlifter [kernel] [k] handle_sys
1.31% mlifter libgcc_s.so.1 [.] __muldf3
1.11% mlifter libuClibc-0.9.33.2.so [.] __stdio_WRITE
1.08% mlifter [kernel] [k] tty_insert_flip_string_fixed_flag
1.06% mlifter libgcc_s.so.1 [.] __unpack_d
1.04% mlifter libuClibc-0.9.33.2.so [.] __stdio_fwrite
--- snip ---
0.19% mlifter libjson.so.0.0.1 [.] json_object_put
--- snip ---
0.11% mlifter libjson.so.0.0.1 [.] json_object_double_to_json_string
We can now see into uClibc, and into user libraries like libjson. (Trust me, I just didn’t include _alllll_ the output.) This is not actually the same point in the program, so don’t read too much into the actual numbers here. You might, though, observe that this program is spending far too much time printing to the console. (Which is true!)
That’s it. You can now get symbol-level information on what’s happening and how much. Whether this is useful is a different story, but this was fairly complicated to set up and test out, so I thought it might be helpful to others at some point.