Category Archives: geospatial

pyspatialite – spatial queries in python, built on sqlite3

I’ve been doing some work with spatial queries recently, and I really didn’t feel like setting up a full blown PostGIS setup, which is the “traditional” way, nor did I really feel like going and trying out these trendy NoSQL stores, like CouchDb and MongoDb, both of which apparently now support spatial work. Then I heard about pyspatialite.

I like python, and sqlite is lovely all those times you want to stuff a bunch of data somewhere, and not have to think too much about it. But, pyspatialite has virtually zero documentation, other than the hint that’s basically just a sqlite3 “standard” api on pyspatialite. Ok, well, let’s give it a go anyway, using Ubuntu 10.04,

Simple installation via sudo [easy_install pyspatialite|pip install pyspatialite] will fail miserably. The distribution includes the C for spatialite, and builds it’s own extension. I have no particular opinion on that either way but it feels a bit rough around the edges. Anyway, on a clean Ubuntu, you’ll need at least the following three packages, libgeos-dev libproj-dev python-dev Try and reinstall and now you should be good to go!

Ok, but now what? As there’s no real python examples, you need to follow the raw spatialite examples. I found these ok, but I wasn’t thrilled. Too much time spent with WKB and hex dumps of geometry and explaining sql.

So, here’s a couple of examples that actually use python, and some notes I found awkward. (I’m not a geo expert, and these are just some of the ways I’m doing it.)

First, actually creating some data, you can run these commands via pyspatialite, or from the command line. (For the command line tool, spatialite you can install spatialite-bin on Ubuntu 10.04, it’s not exactly the same version as comes built into pyspatialite, but it seems to work ok. It’s virtually identical to running sqlite3 from the command line)

CREATE TABLE bars(PKUID INTEGER PRIMARY KEY autoincrement,
                            name text NOT NULL, TYPE text, geometry BLOB NOT NULL);
SELECT RecoverGeometryColumn('bars', 'geometry', 4326, 'POINT', 2);

This will create a table, and create some magic behind the scenes to tell spatialite that the column named “geometry” in the “bars” table contains geometry data, which are POINTs, in the spatial reference id 4326 (good ol WGS84) POINTs could be something else, have a good read on OGC well known text stuff if you want to get into this. You can also create the geometry column by adding the column to an existing table, but I found that actually harder to understand. However, for completeness, here’s the same table as above, created the other way….

CREATE TABLE bars(PKUID INTEGER PRIMARY KEY autoincrement,
                            name text NOT NULL, TYPE text);
SELECT AddGeometryColumn('bars', 'geometry', 4326, 'POINT', 2);

Your choice. Anyway, moving on. That’s just a table, we want to put some data in…

from pyspatialite import dbapi2 as sqlite3
 
conn = sqlite3.connect("/home/karl/src/karltemp.sqlite")
c = conn.cursor()
# some data from somewhere....
bar = Bar(name="the pig and whistle", type="pub", lat="64.123", lon="-21.456")
c.execute("""insert into bars (name, type, geometry) 
             values (?, ?, ?, geomFromText('POINT(%f %f)', 4326))""" % (bar.lon, bar.lat),
             (bar.name, bar.type))

Well, that’s not as pretty as we’d like. See how we used both ? substitution and python % substitution? geomFromText() is a function provided by spatialite for making the right sort of background magic happen to put spatial data into that blob column we defined on our table. It seems sqlite doesn’t know how to put params inside the quoted strings. Oh well. sucks to be us. But we move on and insert a bunch of data.

Now, to use it, first, selecting the nearest 10 bars.

from pyspatialite import dbapi2 as sqlite3
 
conn = sqlite3.connect("/home/karl/src/karltemp.sqlite")
conn.row_factory = sqlite3.Row
c = conn.cursor()
lon = -21.9385482
lat =  64.1475511
c.execute("""select b.name, b.type, 
                  distance(b.geometry, 
                              geomfromtext('point(%f %f)', 4326))
                  as distance from bars b
                  order by distance desc limit 10""" % (lon, lat))
for row in c:
    log.info("bar: %s, type: %s, distance: %f", row['name'], row['type'], row['distance'])

4326 is the EPSG SRID for WGS84. Did that sentence mean nothing to you? You’re not alone. Suffice to say, if you want to work with data all over the world, and share your data with people who know even less about this, just stick with WGS84. (FIXME? Actually, after playing with this a bit more, this seems to have NO affect whatsoever for me)

Firstly, again note that we can’t use regular sql parameter escapes via the ? character. Secondly, this actually has terrible performance. The order by distance, even if limited to 10, means that it has to actually calculate the distance for all the bars in the table, then throw away all but the last 10 results.

Thirdly, the distances returned, what are those in? A little bit of probing with some fake data, and it seems that at least how I’m using it so far, this is NOT geospatial at ALL! This is purely cartesian!

spatialite> select distance(geomfromtext('point(60 60)', 4326), geomfromtext('point(59 60)', 4326));
1.0
spatialite> select distance(geomfromtext('point(60 0)', 4326), geomfromtext('point(59 0)', 4326));
1.0
spatialite>

I’m clearly doing something wrong. The distance between 60N 60E and 60N 59E is most definitely NOT the same distance as between 0N 60E and 0N 59E. It is however exactly 1 unit on a cartesian plane.

So, this post will have to be continued! Even if it’s only cartesian, if it does the indexes right, this could still be very useful, stay tuned, or, if you know what I’m missing, please feel very free to comment :)