Category Archives: python

Xiaomi Miijia LYWSD03MMC with pure bluetoothctl

Because software is software, it all gets thrown out every now and again for….. reasons. BlueZ, on linux, has moved away from hciconfig and hcitool and gattttool and so on, and it’s all “bluetoothctl” and “btmgmt” See for some more on that.

Anyway, I got some of these really cheap Xiaomi BTLE temperature+humidity meters. Turns out there’s a custom firmware you can flash to them and all sorts, but really, I was just interested in some basic usage, and also, as I’m getting my toes wet in bluetooth development, working out how to just plain use them as intended! (well, mostly, I’ve no intention of using any Xiaomi cloud apps)

This application claims to use the standard interface, and it’s “documentation” of this interface is and the little bit of code at

def connect():
	#print("Interface: " + str(args.interface))
	p = btle.Peripheral(adress,iface=args.interface)	
	p.writeCharacteristic(0x0038,val,True) #enable notifications of Temperature, Humidity and Battery voltage
	return p

Well, that wasn’t very helpful. Using tools like NrfConnect, I’m expecting things like a service UUID and a characteristic UUID and things. Also, I don’t have gatttool to even try it like this :)

So, how to do it in “modern” bluetoothctl? Here’s the cheatsheet. First, find our meters…

$ bluetoothctl
# menu scan
# clear
# transport le
# back
# scan on
SetDiscoveryFilter success
Discovery started
[CHG] Controller <your hci mac> Discovering: yes
[NEW] Device <a meter mac> LYWSD03MMC
[NEW] Device <other devices> <other ids>
# scan off

Now, we need to connect to it. According to it actually does broadcast the details about once every 10 minutes, which is neat, but you need a key for it, and aintnobodygottimeforthat.gif.

# connect <mac address>
lots and lots of spam like
[NEW] Descriptor (Handle 0x9f94)
	Characteristic User Description
[NEW] Descriptor (Handle 0xa434)
	Client Characteristic Configuration
[NEW] Characteristic (Handle 0xa904)
	Vendor specific
[NEW] Descriptor (Handle 0xb1d4)
	Characteristic User Description
....more lines like this.
[CHG] Device MAC_ADDRESS UUIDs: 00000100-0065-6c62-2e74-6f696d2e696d
[CHG] Device MAC_ADDRESS UUIDs: 00001800-0000-1000-8000-00805f9b34fb
[CHG] Device MAC_ADDRESS UUIDs: 00001801-0000-1000-8000-00805f9b34fb
[CHG] Device MAC_ADDRESS UUIDs: 0000180a-0000-1000-8000-00805f9b34fb
[CHG] Device MAC_ADDRESS UUIDs: 0000180f-0000-1000-8000-00805f9b34fb
[CHG] Device MAC_ADDRESS UUIDs: 0000fe95-0000-1000-8000-00805f9b34fb
[CHG] Device MAC_ADDRESS UUIDs: 00010203-0405-0607-0809-0a0b0c0d1912
[CHG] Device MAC_ADDRESS UUIDs: ebe0ccb0-7a0a-4b0c-8a1a-6ff2997da3a6
[CHG] Device MAC_ADDRESS ServicesResolved: yes

The characteristic you need is ebe0ccc1-7a0a-4b0c-8a1a-6ff2997da3a6, under the similar service UUID. So now, we just “subscribe” to notifications from it.

# menu gatt
[LYWSD03MMC]# select-attribute ebe0ccc1-7a0a-4b0c-8a1a-6ff2997da3a6
[LYWSD03MMC:/service0021/char0035]# notify on
[CHG] Attribute /org/bluez/hci0/dev_MAC_ADDRESS/service0021/char0035 Notifying: yes
Notify started
[CHG] Attribute /org/bluez/hci0/dev_MAC_ADDRESS/service0021/char0035 Value:
  36 09 2a f6 0b                                   6.*..           
[CHG] Attribute /org/bluez/hci0/dev_MAC_ADDRRESS/service0021/char0035 Value:
  35 09 2a f6 0b                                   5.*..           
[CHG] Attribute /org/bluez/hci0/dev_MAC_ADDRESS/service0021/char0035 Value:
  33 09 2a f6 0b                                   3.*..           
[LYWSD03MMC:/service0021/char0035]# notify off

And…. now we can go back to and decode these readings.

>>> x = "2a 09 2a 4e 0c"
>>> struct.unpack("<HbH",binascii.unhexlify((x).replace(" ","")))
(2346, 42, 3150)

With Python, that gives us a temperature of 23.46°C, humidity of 42%, and a battery level of 3.150V.


Originally, I was blind to the subtle differences in the characteristic UUIDs, and thought I had to hand iterate them. The below text at least shows how you can do that, but it’s unnecessary.

Ok, now the nasty bit. We want to get notifications from one of the characteristics on service UUID ebe0ccb0-7a0a-4b0c-8a1a-6ff2997da3a6. But you can’t just “select” this, because, at least as far as I can tell, there’s no way of saying _which_ descriptor you want.

So, first you need to use scroll that big dump of descriptors (or view it again with list-attributes in the gatt menu) to work out the “locally assigned service handle” In my case it’s 0021, but I’ve got zero faith that’s a stable number. You figure it from a line like this…

[NEW] Characteristic (Handle 0xd944)
	Vendor specific

Now, which characteristic is which? Well, thankfully, Xiaomi uses the lovely Characteristic User Description in their descriptors, so you just go through them one by one, “read”ing them, until you get to the right one…

[LYWSD03MMC:]# select-attribute /org/bluez/hci0/dev_MAC_ADDRESS/service0021/char0035/desc0037 
[LYWSD03MMC:/service0021/char0035/desc0037]# read
Attempting to read /org/bluez/hci0/dev_MAC_ADDRESS/service0021/char0035/desc0037
[CHG] Attribute /org/bluez/hci0/dev_MAC_ADDRESS/service0021/char0035/desc0037 Value:
  54 65 6d 70 65 72 61 74 75 72 65 20 61 6e 64 20  Temperature and 
  48 75 6d 69 64 69 74 79 00                       Humidity.       
  54 65 6d 70 65 72 61 74 75 72 65 20 61 6e 64 20  Temperature and 
  48 75 6d 69 64 69 74 79 00                       Humidity.       

The sucky bit, is you need to manually select _each_ of the “descNNNN” underneath _each_ of the “charNNNN” under this service. (On my devices, it’s the second characteristic with dual descriptors…)

now, select the characteristic itself. (you can’t get notifications on the descriptor…)

[LYWSD03MMC:/service0021/char0035/desc0037]# select-attribute /org/bluez/hci0/dev_MAC_ADDRESS/service0021/char0035
[LYWSD03MMC:/service0021/char0035]# notify on
[CHG] Attribute /org/bluez/hci0/dev_MAC_ADDRESS/service0021/char0035 Notifying: yes
Notify started
[CHG] Attribute /org/bluez/hci0/dev_MAC_ADDRESS/service0021/char0035 Value:
  2b 09 2a 4e 0c                                   +.*N.           
[CHG] Attribute /org/bluez/hci0/dev_MAC_ADDRESS/service0021/char0035 Value:
  2a 09 2a 4e 0c                                   *.*N.           
[CHG] Attribute /org/bluez/hci0/dev_MAC_ADDRESS/service0021/char0035 Value:
  2b 09 2a 4e 0c                                   +.*N.           
[CHG] Attribute /org/bluez/hci0/dev_MAC_ADDRESS/service0021/char0035 Value:
  2b 09 2a 4e 0c                                   +.*N.           
[CHG] Attribute /org/bluez/hci0/dev_MAC_ADDRESS/service0021/char0035 Value:
  29 09 2a 4e 0c                                   ).*N.           
[LYWSD03MMC:/service0021/char0035]# notify off

Netbeans 8.x “Could not get children iterator” or “NULL VALUE RETURNED FOR CHILDREN”

If you’re getting messages like this when debugging and trying to look at variables, chances are you might be debugging some glib2 code. If you are, welcome to

It’s easy enough to manually patch the file, and presto, working debug again. It’s been fixed in GDB, I was seeing this in Fedora 22, which is not _entirely_ up to date these days, but you tend to expect that plain old host debugging should be pretty stable. blog post because the netbeans error string doesn’t help you find that gdb bug. You need to run gdb on the command line to find the real error: Python Exception iter() returned non-iterator of type '_iterator':

Using SWO/SWV streaming data with STLink under linux – Part 2

In Part 1, we set up some (very) basic code that writes out data via Stimulus Channel 0 of the ITM to be streamed otu over SWO/SWV, but we used the existing ST provided windows tool, “STLink” to be view the stream. Now let’s do it in linux.

OpenOCD has some very draft support for collecting this data, but it’s very rough around the edges. [1]

I wrote a tool based on my own decoding of USB traffic to be a little more flexible. You connect to the STLink hardware, and can start/stop logging, change trace files, and change which stimulus ports are enabled. It is quite rough, but functional. It should not be underestimated how important being able to start/stop tracing is. In the ARM debug docs, turning on or reconfiguring trace is undefined as far as having the output bitstream be properly synced. (Section D4.4 of “Flush of trace data at the end of operation” in the Coresight Architecture spec, and most importantly, “C1.10.4 Asynchronous Clock Prescaler Register, TPIU_ACPR” in the ARMv7M architecture reference manual)

Don’t get me wrong, although my tool works substantially better than OpenOCD does, it’s still very rough around the edges. Just for starters, you don’t have debug or flash at the same time! Having it integrated well into OpenOCD (or pyOCD?) is definitely the desired end goal here.

Oh yeah, and if your cpu clock isn’t 24MHz, like the example code from Part 1, then you must edit DEFAULT_CPU_HZ in the top of!

So, how do you use it?

First, get the source from github: You need pyusb 1.x, then run it, and type connect

karlp@tera:~/src/swopy (master)$ python 
:( lame py required :(
(Cmd) connect
STLINK v2 JTAG v14 API v2 SWIM v0, VID 0x483 PID 0x3748
DEBUG:root:Get mode returned: 1
DEBUG:root:CUrrent saved mode is 1
DEBUG:root:Ignoring mode we don't know how to leave/or need to leave
(1682, 2053)
('Voltage: ', 2.9293697978596906)
DEBUG:root:enter debug state returned: array('B', [128, 0])
('status returned', array('B', [128, 0]))
('status is: ', 'RUNNING')

Yes, there’s lots of debug. This is not for small children. You have been warned, but there is some help!

(Cmd) help

Documented commands (type help ):
connect     raw_read_mem32   run       swo_read_raw
magic_sync  raw_write_mem32  swo_file  swo_start   

Undocumented commands:
EOF          exit  leave_state  raw_read_debug_reg   swo_stop
enter_debug  help  mode         raw_write_debug_reg  version 


The commands of interest are swo_file, swo_start and swo_stop. So, enter a file name, and start it up…

(Cmd) swo_file blog.bin
(Cmd) swo_start 0xff
INFO:root:Enabling trace for stimbits 0xff (0b11111111)
DEBUG:root:READ DEBUG: 0xe000edf0 ==> 16842752 (0x1010000) status=0x80, unknown=0x0
DEBUG:root:WRITE DEBUG 0xe000edfc ==> 16777216 (0x1000000) (res=array('B', [128, 0]))
DEBUG:root:READMEM32 0xe0042004/4 returned: ['0x0']
DEBUG:root:WRITEMEM32 0xe0042004/4 ==> ['0x27']
DEBUG:root:WRITEMEM32 0xe0040004/4 ==> ['0x1']
DEBUG:root:WRITEMEM32 0xe0040010/4 ==> ['0xb']
DEBUG:root:START TRACE (buffer= 4096, hz= 2000000)
DEBUG:root:WRITEMEM32 0xe00400f0/4 ==> ['0x2']
DEBUG:root:WRITEMEM32 0xe0040304/4 ==> ['0x0']
DEBUG:root:WRITEMEM32 0xe0000fb0/4 ==> ['0xc5acce55']
DEBUG:root:WRITEMEM32 0xe0000e80/4 ==> ['0x10005']
DEBUG:root:WRITEMEM32 0xe0000e00/4 ==> ['0xff']
DEBUG:root:WRITEMEM32 0xe0000e40/4 ==> ['0xff']
DEBUG:root:READMEM32 0xe0001000/4 returned: ['0x40000000']
DEBUG:root:WRITEMEM32 0xe0001000/4 ==> ['0x40000400']
DEBUG:root:READMEM32 0xe000edf0/4 returned: ['0x1010000']
DCB_DHCSR == 0x1010000
(Cmd) rDEBUG:root:reading 16 bytes of trace buffer
DEBUG:root:Wrote 16 trace bytes to file: blog.bin
unDEBUG:root:reading 16 bytes of trace buffer
DEBUG:root:Wrote 16 trace bytes to file: blog.bin
DEBUG:root:reading 16 bytes of trace buffer
DEBUG:root:Wrote 16 trace bytes to file: blog.bin
DEBUG:root:reading 16 bytes of trace buffer
DEBUG:root:Wrote 16 trace bytes to file: blog.bin
DEBUG:root:reading 16 bytes of trace buffer
DEBUG:root:Wrote 16 trace bytes to file: blog.bin
DEBUG:root:reading 16 bytes of trace buffer
DEBUG:root:Wrote 16 trace bytes to file: blog.bin
DEBUG:root:reading 16 bytes of trace buffer
DEBUG:root:Wrote 16 trace bytes to file: blog.bin
DEBUG:root:reading 16 bytes of trace buffer
DEBUG:root:Wrote 16 trace bytes to file: blog.bin
DEBUG:root:reading 16 bytes of trace buffer
DEBUG:root:Wrote 16 trace bytes to file: blog.bin
DEBUG:root:reading 18 bytes of trace buffer
DEBUG:root:Wrote 18 trace bytes to file: blog.bin

Ok, great. but… where’d it go? Well. It’s in the native binary ARM CoreSight trace format, like so…

karlp@tera:~/src/swopy (master *)$ tail -f blog.bin | hexdump -C 
00000000  01 54 01 49 01 43 01 4b  01 20 01 37 01 31 01 38  |.T.I.C.K. .7.1.8|
00000010  01 0d 01 0a 01 54 01 49  01 43 01 4b 01 20 01 37  |.....T.I.C.K. .7|
00000020  01 31 01 39 01 0d 01 0a  01 54 01 49 01 43 01 4b  |.1.9.....T.I.C.K|
00000030  01 20 01 37 01 32 01 30  01 0d 01 0a 01 54 01 49  |. .7.2.0.....T.I|
00000040  01 43 01 4b 01 20 01 37  01 32 01 31 01 0d 01 0a  |.C.K. .7.2.1....|
00000050  01 54 01 49 01 43 01 4b  01 20 01 37 01 32 01 32  |.T.I.C.K. .7.2.2|
00000060  01 0d 01 0a 01 54 01 49  01 43 01 4b 01 20 01 37  |.....T.I.C.K. .7|
00000070  01 32 01 33 01 0d 01 0a  01 54 01 49 01 43 01 4b  |.2.3.....T.I.C.K|
00000080  01 20 01 37 01 32 01 34  01 0d 01 0a 01 54 01 49  |. .7.2.4.....T.I|
00000090  01 43 01 4b 01 20 01 37  01 32 01 35 01 0d 01 0a  |.C.K. .7.2.5....|
000000a0  01 54 01 49 01 43 01 4b  01 20 01 37 01 32 01 36  |.T.I.C.K. .7.2.6|
000000b0  01 0d 01 0a 01 54 01 49  01 43 01 4b 01 20 01 37  |.....T.I.C.K. .7|
000000c0  01 32 01 37 01 0d 01 0a  01 54 01 49 01 43 01 4b  |.2.7.....T.I.C.K|
000000d0  01 20 01 37 01 32 01 38  01 0d 01 0a 01 50 01 75  |. .7.2.8.....P.u|
000000e0  01 73 01 68 01 65 01 64  01 20 01 64 01 6f 01 77  |.s.h.e.d. .d.o.w|
000000f0  01 6e 01 21 01 0d 01 0a  01 54 01 49 01 43 01 4b  |.n.!.....T.I.C.K|
00000100  01 20 01 37 01 32 01 39  01 0d 01 0a 01 54 01 49  |. .7.2.9.....T.I|
00000110  01 43 01 4b 01 20 01 37  01 33 01 30 01 0d 01 0a  |.C.K. .7.3.0....|
00000120  01 68 01 65 01 6c 01 64  01 3a 01 20 01 32 01 34  |.h.e.l.d.:. .2.4|
00000130  01 35 01 32 01 20 01 6d  01 73 01 0d 01 0a 01 54  |.5.2. .m.s.....T|
00000140  01 49 01 43 01 4b 01 20  01 37 01 33 01 31 01 0d  |.I.C.K. .7.3.1..|
00000150  01 0a 01 54 01 49 01 43  01 4b 01 20 01 37 01 33  |...T.I.C.K. .7.3|

Which is ugly, but you get the idea.

This is where comes in. The author of the original SWO support in OpenOCD has some code to do this too, it’s more forgiving of decoding, but more likely to make mistakes. Mine is somewhat strict on the decoding, but probably still has some bugs.

Usage is pretty simple

$ python blog.bin -f
Jumping to the near the end
Not in sync: invalid byte for sync frame: 1
Not in sync: invalid byte for sync frame: 73
Not in sync: invalid byte for sync frame: 1
Not in sync: invalid byte for sync frame: 67
Not in sync: invalid byte for sync frame: 1
Not in sync: invalid byte for sync frame: 75
Not in sync: invalid byte for sync frame: 1
Not in sync: invalid byte for sync frame: 32
Not in sync: invalid byte for sync frame: 1
Not in sync: invalid byte for sync frame: 50
Not in sync: invalid byte for sync frame: 1
Not in sync: invalid byte for sync frame: 49
Not in sync: invalid byte for sync frame: 1
Not in sync: invalid byte for sync frame: 53
Not in sync: invalid byte for sync frame: 1
Not in sync: invalid byte for sync frame: 51
Not in sync: invalid byte for sync frame: 1
Not in sync: invalid byte for sync frame: 13
Not in sync: invalid byte for sync frame: 1
Not in sync: invalid byte for sync frame: 10
TICK 2154
TICK 2155
TICK 2156
TICK 2157
TICK 2194
TICK 2195
TICK 2196
Pushed down!
held: 301 ms
Pushed down!
TICK 2197
held: 340 ms
TICK 2198
TICK 2199
Pushed down!
TICK 2200
TICK 2201

You (hopefully) get the idea. When the writes to the stimulus ports are 8bit, simply prints it to the screen. So here we have a linux implementation of the SWV viewer from the windows STLink tool. It’s got a lot of debug, and a few steps, but the pieces are all here for you to go further.

In part three, we’ll go a bit further with this, and demonstrate how SWO lets you interleave multiple streams of data, and demux it on the host side. That’s where it starts getting fun. (Hint, look at the other arguments of and make 16/32bit writes to the stimulus registers)

To stop SWO capture, type “swo_stop” and press enter, or just ctrl-d, to stop trace and exit the tool.

[1] Most importantly, you can not stop/start the collection, you can only set a single file at config time, which isn’t very helpful for running a long demon. Perhaps even worse, OpenOCD is hardcoded to only enable stimulus port 0, which is a bit restrictive when you can have 32 of them, and being able to turn them on and off is one of the nice things.

Esoteric python of the day: * and **

I was hacking up a grotty (but neat!) python plugin for gdb last night, and ran into a quirk I didn’t really understand. I had a list of numbers, and was using struct to make a suitable memory buffer for gdb.selected_inferior().write_memory. Except, it wasn’t quite working….

import struct
q = [1,2,3,4]
x = struct.pack("<4H", q)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: pack requires exactly 4 arguments

I wasn’t really expecting this, normally things “just work” and python “does the right thing” (Computer! Do as I want, not as I say!) StackOverflow gave me the gem of using the * operator.

import struct
q = [1,2,3,4]
struct.pack("<4H", *q)

OK, well and good, but WTF?! Where did that come from? Turns out it’s not an operator per se, but part of the call syntax. So * turns a list into a string of positional arguments, and ** turns a dict into a set of keyword arguments, like so.

def gofish(where, when):
     print(where, when)
gofish("lake", "tomorrow")
('lake', 'tomorrow')
gofish({"where" : "a river", "when": "next weekend"})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: gofish() takes exactly 2 arguments (1 given)
gofish(**{"where" : "a river", "when": "next weekend"})
('a river', 'next weekend')

So there, a rather esoteric corner of python I’d never seen before.

arm-none-eabi-gdb with python support on linux (Fedora 17)

I’ve been using the GCC ARM Embedded toolchain for STM32 development on linux for a while now. It’s maintained by ARM, it’s available for linux, windows and osx, and it’s just a zip of binaries. Untar, add to your path, and you’re golden. With the new 4.7 release (2012q4) it includes some decent code size improvements, and is generally just a one stop shop for getting a toolchain.

However, the linux builds don’t have python support in GDB. This isn’t necessarily a bad thing, but people are starting to put together some nice tools that plugin to GDB that rely on python support. There’s some experimental SWV/SWO support, there’s some neat support for printf without printf and, what I’m trying to do, dump a lot of data buffers straight out of ram on the target device for analysis in python.

So, I tried building it myself. I first tried just downloading recent gdb sources and adding python support, after setting the target to arm-none-eabi. but….. that didn’t play well with openocd, so I went back to trying to build the G-A-E provided sources instead.

Here’s what I needed on Fedora 17 x64

  • libmpc-devel
  • expat-devel
  • ncurses-devel
  • python-devel
  • bison

This then replaces step 5 from the provided instructions…

$ cd [extracteddir]/src
[dir]/src$ tar -xf gdb.tar.gz
[dir]/src$ cd gdb
./configure --with-expat --with-python --target=arm-none-eabi
make -j5

That got me a working gdb with python support…

Early impressions with PyDev 2.2 and Eclipse 3.7

I’ve been happily using netbeans for C/C++ and python work, which works well enough to not really complain much. Mostly, I want IntelliJ for C code. I find eclipse big and clunky and awkward on the keyboard, and just generally a pain. No Eclipse, I do NOT want to have some sort of “workspace” I want you to just leave things where they are on the disk. Anyway, in Oracle’s infinite wisdom, they are continuing to destroy things that Sun built, and python support has been dropped in netbeans 7. A pity, as netbeans 7 added some nice debugging support for C/C++, and netbeans is much more tightly integrated than eclipse. Still a pale shadow of IntelliJ, but I digress.

So, I had to go and look for some alternative python editors. I’m currently trying out PyDev 2.2 with Eclipse 3.7. It mostly works ok too. It’s capable of running some unit tests, and has the basic highlighting and so on. However, it’s completion is not as good as I would like, nor think it should be. Take this for example.

def something(self):
    self.mylist = []
    # on this line, self.mylist. will give me the full builtin completion for lists
    self.otherlist = "blah blah".split()
    # split returns a list, but self.otherlist. has no completion here

It seems this can be worked around by “pre declaring” the type.

def something(self):
  self.somelist = []
  self.somelist = "blah blah".split()
  # self.somelist. produces full completion for list here....

This is…. odd?

Possibly related is that in python unit tests, I at least, normally use the self.assertEquals(left, right, msg) form, probably because I came from Java. However, self.assertEquals in PyDev doesn’t give me any completion guidance on the parameters at all. It turns out that in the implementation of python’s unittest, assertEquals is simply an alias for another function (assertEqual = assertEquals = failUnlessEqual) For whatever reason, this means that I get full completion and parameter help if I use the _real_ implementation, failUnlessEqual but no advice/help whatsoever if I use the assertEquals form.

Google says this is unhelpful.

  • “self.assertEquals python” returns 74300 results
  • “self.failUnlessEqual python” returns 35800 results

Update: this assertEquals vs failUnlessEqual is apparently only a problem for python < 2.7. Unfortunately debian stable (squeeze) at present still uses python 2.6 :( In more mundane items, I would _really_ like to know how to get IntelliJ's "ctrl-W" shortcut, for expanding a selection. (From the cursor in the middle of "karl" in the following line, 'self.wop = "this is karl in python".split()', pressing ctrl-w once would highlight 'karl', once more would select 'this is karl in python' (without the quotes), once more with the quotes, and then on to the entire rvalue, then the entire line. This stackoverflow post mentions a solution, but it doesn’t seem to work in PyDev windows, even after getting into the keymap and adding a “Select Enclosing Element” for the PyDev views (or the editor scope? the difference being?) it still doesn’t work.

Oh well, life goes on.

Public key encryption of MQTT messages, with C and python

WARNING Don’t do this. It does work, but you are limited to sending messages that are smaller than the public key size less padding bytes. The “correct” way is to use the higher level EVP_SealXXXX and EVP_OpenXXXX methods. (which are substantially uglier, but are “the right thing”™ I’ll try and package up some sample code in the neart future

As a follow on from yesterday, where we simply signed the message body being sent to the MQTT server, today we will use public key encryption to secure the message bodies themselves. For an environment where multiple people are connected to the MQTT server, but only one of them is authorized to be reading all the data, this seems a nice way of doing it. We don’t care if any given client is compromised, they only get our public key.

First off, I was less than impressed with the state of the public key encryption/decryption code in python. There’s nothing in the standard library, and PyCrypto doesn’t support reading in standard openssl key files, so I didn’t even bother looking much further. KeyCzar promises to be useful, doing “the right thing” instead of forcing you to know everything about crypto yourself. (I resent having to read the docs closely enough to make sure I chose PKCS_OEAP padding, rather than just PKCS padding for the RSA encryption. I do NOT want to be a crypto guy, and I know that I will never know enough) However, KeyCzar also insists on doing it’s own key management, so I moved on. M2Crypto seems to be the most fully implemented, but, unfortunately, it keeps the terrible/wonderful OpenSSL API. I’m using OpenSSL in the C code, so at least it’s consistent, even if it is a terribly verbose and awkward API.

Other things that I learnt were important, in the C code.

  • Remember to call XXX_free() on OpenSSL objects that are created. (such as RSA keys from PEM_read_bio_RSA_PUBKEY)
  • Remember that openssl calls to encrypt can only encrypt clear text less than a certain size. (256 bytes in my case)
  • Remember that the limit on clear text size does NOT account for padding sizes! RSA_size(key) returns the total size, but with the recommended RSA_PKCS1_OAEP_PADDING the overhead is 41 bytes!

The heart of the C encryption side:

int encrypt_message(unsigned char **encrypted, unsigned char *clear, uint32_t clear_len) {
    RSA *pubkey = rsa_get_public_key();
    int rsa_size = RSA_size(pubkey);
    assert(clear_len < rsa_size - 41);  // 41 is the padding size for RSA_PKCS1_OAEP_PADDING
    ILOG("rsa size = %d\n", rsa_size);
    *encrypted = malloc(RSA_size(pubkey));
    uint32_t encrypted_len = RSA_public_encrypt(clear_len, clear, *encrypted, pubkey, RSA_PKCS1_OAEP_PADDING);
    if (encrypted_len == -1) {
        fatal("Failed to encrypt data. %s", ERR_error_string(ERR_get_error(), NULL));
    return encrypted_len;

rsa_get_public_key() is some openssl gloop code that you should replace, it provides a hard coded public key I generated yesterday for some tests.

The python decryption code is even simpler:

    # This keyfile has had the passphrase removed....
    private_key = M2Crypto.RSA.load_key("/home/karlp/karl.test.private.nopf.key")
    clear_text = private_key.private_decrypt(payload_bytes, M2Crypto.RSA.pkcs1_oaep_padding)"Decrypted message: <%s>", clear_text)

You can get the code:

Update: The first edition of this post referred to problems with easily getting the binary payload from the MQTT message. This has since been solved, and will be integrated into a future release of mosquitto.

HMAC signing MQTT messages, with C and python

MQTT as a protocol is pretty light on security. There’s a username/password fields, but they are clear text. Mosquitto, the implementation that we’re using right now supports using these fields to limit read/write access to given topics, but it feels a bit hard to manage. We would have to have a file with each topic in use, and multiple lines for each and every user in our system. (We have multiple clients over the internet, and we don’t want to let them see each others messages)

Instead, we’re going to have clients use our public key to encrypt their messages, and then use a shared key per client to sign the messages. This means that we can be sure that the message was not tampered with, that only someone with knowledge of a certain client’s key can pretend to be that client, and that no-one without our private key can actually read the messages. Our listener will first verify the signatures on each message. If a messages was received on topic “clientAAA” but fails to verify with the shared secret key of clientAAA, then we drop the message before even attempting to decrypt. If they match, we can decrypt, and process the contents.

I’ve put together a basic demo of the HMAC signing part, with a message publisher in C, and a message verifier/parser in python. Hopefully it will be useful to someone.
The code is available:

Binary payloads with Mosquitto v0.12 in python

Update 2011-09-28 This post only applies to Mosquitto v0.12. From v0.13, the ctypes extraction is done by mosquitto itself, and the msg.payload is a proper python string. msg.payload_len and msg.payload_str disappear.

Prior to 0.12, mosquitto‘s python bindings converted the message payload to a string. This was very convenient, you defined a simple on_message handler like so:

def on_message(msg):
    print "Received message on topic:%s, body: %s" % (msg.topic, msg.payload)

This was all well and good, but if you were receiving binary data on a topic, you were out of luck. This has now been rectified, but it is a breaking change for people expecting string data. To keep interpreting the payload as a string, just use msg.payload_str instead of msg.payload

But, how about that binary data? If you don’t get this right first time, you might get quite confused about all the ctypes magic types that keep showing up, but it’s not really that difficult when you get it worked out.

def on_message(msg):
    print "Received message on topic %s, length: %d bytes" % (msg.topic, msg.payloadlen)
    print "As binary data (hex):   ",
    for i in range(msg.payloadlen):
        print "%s " % (hex(msg.payload[i])),
    print ""
    print "As binary data (chars): ",
    for i in range(msg.payloadlen):
        print "   %c " % (chr(msg.payload[i])),
    print ""
    print "As string data (msg.payload_str): %s" % (msg.payload_str)

And, the output after sending a message witha \0 character in the middle:

  Received message on topic test, length: 10 bytes
  As binary data (hex):    0x68  0x65  0x6c  0x6c  0x6f  0x0  0x6b  0x61  0x72  0x6c  
  As binary data (chars):     h     e     l     l     o          k     a     r     l  
  As string data (msg.payload_str): hello

There, not so hard now, and your code didn’t need to know anything about c types :)

pyspatialite – spatial queries in python, built on sqlite3

I’ve been doing some work with spatial queries recently, and I really didn’t feel like setting up a full blown PostGIS setup, which is the “traditional” way, nor did I really feel like going and trying out these trendy NoSQL stores, like CouchDb and MongoDb, both of which apparently now support spatial work. Then I heard about pyspatialite.

I like python, and sqlite is lovely all those times you want to stuff a bunch of data somewhere, and not have to think too much about it. But, pyspatialite has virtually zero documentation, other than the hint that’s basically just a sqlite3 “standard” api on pyspatialite. Ok, well, let’s give it a go anyway, using Ubuntu 10.04,

Simple installation via sudo [easy_install pyspatialite|pip install pyspatialite] will fail miserably. The distribution includes the C for spatialite, and builds it’s own extension. I have no particular opinion on that either way but it feels a bit rough around the edges. Anyway, on a clean Ubuntu, you’ll need at least the following three packages, libgeos-dev libproj-dev python-dev Try and reinstall and now you should be good to go!

Ok, but now what? As there’s no real python examples, you need to follow the raw spatialite examples. I found these ok, but I wasn’t thrilled. Too much time spent with WKB and hex dumps of geometry and explaining sql.

So, here’s a couple of examples that actually use python, and some notes I found awkward. (I’m not a geo expert, and these are just some of the ways I’m doing it.)

First, actually creating some data, you can run these commands via pyspatialite, or from the command line. (For the command line tool, spatialite you can install spatialite-bin on Ubuntu 10.04, it’s not exactly the same version as comes built into pyspatialite, but it seems to work ok. It’s virtually identical to running sqlite3 from the command line)

                            name text NOT NULL, TYPE text, geometry BLOB NOT NULL);
SELECT RecoverGeometryColumn('bars', 'geometry', 4326, 'POINT', 2);

This will create a table, and create some magic behind the scenes to tell spatialite that the column named “geometry” in the “bars” table contains geometry data, which are POINTs, in the spatial reference id 4326 (good ol WGS84) POINTs could be something else, have a good read on OGC well known text stuff if you want to get into this. You can also create the geometry column by adding the column to an existing table, but I found that actually harder to understand. However, for completeness, here’s the same table as above, created the other way….

                            name text NOT NULL, TYPE text);
SELECT AddGeometryColumn('bars', 'geometry', 4326, 'POINT', 2);

Your choice. Anyway, moving on. That’s just a table, we want to put some data in…

from pyspatialite import dbapi2 as sqlite3
conn = sqlite3.connect("/home/karl/src/karltemp.sqlite")
c = conn.cursor()
# some data from somewhere....
bar = Bar(name="the pig and whistle", type="pub", lat="64.123", lon="-21.456")
c.execute("""insert into bars (name, type, geometry) 
             values (?, ?, ?, geomFromText('POINT(%f %f)', 4326))""" % (bar.lon,,
             (, bar.type))

Well, that’s not as pretty as we’d like. See how we used both ? substitution and python % substitution? geomFromText() is a function provided by spatialite for making the right sort of background magic happen to put spatial data into that blob column we defined on our table. It seems sqlite doesn’t know how to put params inside the quoted strings. Oh well. sucks to be us. But we move on and insert a bunch of data.

Now, to use it, first, selecting the nearest 10 bars.

from pyspatialite import dbapi2 as sqlite3
conn = sqlite3.connect("/home/karl/src/karltemp.sqlite")
conn.row_factory = sqlite3.Row
c = conn.cursor()
lon = -21.9385482
lat =  64.1475511
c.execute("""select, b.type, 
                              geomfromtext('point(%f %f)', 4326))
                  as distance from bars b
                  order by distance desc limit 10""" % (lon, lat))
for row in c:"bar: %s, type: %s, distance: %f", row['name'], row['type'], row['distance'])

4326 is the EPSG SRID for WGS84. Did that sentence mean nothing to you? You’re not alone. Suffice to say, if you want to work with data all over the world, and share your data with people who know even less about this, just stick with WGS84. (FIXME? Actually, after playing with this a bit more, this seems to have NO affect whatsoever for me)

Firstly, again note that we can’t use regular sql parameter escapes via the ? character. Secondly, this actually has terrible performance. The order by distance, even if limited to 10, means that it has to actually calculate the distance for all the bars in the table, then throw away all but the last 10 results.

Thirdly, the distances returned, what are those in? A little bit of probing with some fake data, and it seems that at least how I’m using it so far, this is NOT geospatial at ALL! This is purely cartesian!

spatialite> select distance(geomfromtext('point(60 60)', 4326), geomfromtext('point(59 60)', 4326));
spatialite> select distance(geomfromtext('point(60 0)', 4326), geomfromtext('point(59 0)', 4326));

I’m clearly doing something wrong. The distance between 60N 60E and 60N 59E is most definitely NOT the same distance as between 0N 60E and 0N 59E. It is however exactly 1 unit on a cartesian plane.

So, this post will have to be continued! Even if it’s only cartesian, if it does the indexes right, this could still be very useful, stay tuned, or, if you know what I’m missing, please feel very free to comment :)