[gdal-dev] SQLite driver hit and miss with spatial index

Frederik Ramm frederik at remote.org
Thu Apr 6 02:47:15 PDT 2017


Dear gdal folks,

TL;DR: ogrinfo uses an SQLite spatial index but ogr2ogr doesn't?

I'm debugging a performance issue with my "OSM Inspector" web site,
where I use a large number of SQLite files in Mapserver, and Mapserver
takes very long to pick the data it wants to display.

I have managed to boil the issue down to a test case where I have one
single SQLite file with one POINT geometry table. Without an index, the
file has 2.8 GB:

-rw-r--r-- 1 fred fred 2812314624 Apr  6 09:01 no-index.sqlite

The ~ 25 million points are spread across central Europe:

--------------------
% ogrinfo no-index.sqlite -al -so
INFO: Open of `no-index.sqlite'
      using driver `SQLite' successful.

Layer name: osmi_addresses_nodes_with_addresses
Geometry: Point
Feature Count: 25201411
Extent: (5.866213, 45.824860) - (24.162596, 55.115208)
--------------------

Then I created an index using ogr2ogr:

% ogr2ogr -f SQLite -dsco SPATIALITE=YES with-index.sqlite no-index.sqlite

resulting in a 5 GB file:

-rw-r--r-- 1 fred fred 5417050112 Apr  6 09:23 with-index.sqlite

(I also tried the spatialite command line and SELECT
CreateSpatialIndex(...) which yields the same result.)

My problem is that the spatial index seems not to be used by Mapserver,
and indeed not by ogr2ogr either. On the test machine which uses NVMe
disks, it takes about 35 seconds to extract a small subset using a
spatial query with ogr2ogr from the un-indexed file:

% time ogr2ogr -overwrite /tmp/blub.shp -spat 8.1 48.1 8.11 49.11
no-index.sqlite
real    0m38.226s

And it takes slightly *longer* when I use the indexed file but this
might be just measurement:

% time ogr2ogr -overwrite /tmp/blub.shp -spat 8.1 48.1 8.11 49.11
with-index.sqlite
real	0m41.575s

Obvoiusly, the index is not being used, or being used in a way that
yields zero performance gain. However, when I query my files with
ogrinfo, I see a noticeable difference between using and not using the
index:

--------------------
% time ogrinfo no-index.sqlite -spat 8.1 48.1 8.11 49.11 -al -so
INFO: Open of `no-index.sqlite'
      using driver `SQLite' successful.

Layer name: osmi_addresses_nodes_with_addresses
Geometry: Point
Feature Count: 2245
Extent: (5.866213, 45.824860) - (24.162596, 55.115208)
...
real    1m16.497s
--------------------
% time ogrinfo with-index.sqlite -spat 8.1 48.1 8.11 49.11 -al -so
INFO: Open of `with-index.sqlite'
      using driver `SQLite' successful.

Layer name: osmi_addresses_nodes_with_addresses
Geometry: Point
Feature Count: 2245
Extent: (5.866213, 45.824860) - (24.162596, 55.115208)
...
real    0m6.519s
--------------------

Even though the indexed version still needs 6 seconds, the line "Feature
Count: ..." appears practically instantly with the indexed version,
whereas it takes quite a while with the non-indexed.

Does anybody have an idea why this is so, and more importantly, how I
could get ogr2ogr to speed up (and with that, I should hope, also
Mapserver)?

(My Mapserver also uses a shapefile as a TILEINDEX telling it which
sqlite file to use for a specific request, but I think even if that adds
a performance penalty, it's not my main problem at the moment.)

(I'm also happy to hear recommendations of the sort "don't use SQLite at
all then, use <X> instead". Using shapefiles would be difficult due to
the 2 GB limit, and using PostGIS difficult because I'd spend a lot of
time importing newly generated data into PostGIS but both are not
totally ruled out.)

My environment is a run-off-the-mill Ubuntu 16.04 system with the
following versions:

GDAL 1.11.3
Spatialite 4.3.0
Mapserver 7.0.0

My test databases are available as

http://www.remote.org/frederik/tmp/no-index.sqlite.gz
http://www.remote.org/frederik/tmp/with-index.sqlite.gz

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frederik at remote.org  ##  N49°00'09" E008°23'33"


More information about the gdal-dev mailing list