[SoC] OSM cgimap GSOC 2013 report #4
Paul Norman
penorman at mac.com
Sat Jul 13 01:37:50 PDT 2013
What did I get done this week?
- pulled in the changeset cache refactor from upstream
- lots of working with git merge
- analyzed a set of logs from api.osm.org
- 30% of map? calls return no results
- This lead me to short-circuit map calls with no nodes, leading to a
substantial speedup. This will probably bring the most noticeable immediate
benefits because it isn't specific to pgsnapshot and can be deployed on
api.osm.org right away. It increasing speed in my benchmarks by >5%.
- Ran some benchmarks
My earlier results on linestring usage wasn't too impressive, but
short-circuiting no nodes helped drastically.
Results:
test done cold cache hot cache mem
way_nodes*, apidb, extract 33m28.449s 28m55.834s 7G
way_nodes, pgsnapshot, extract 15m13.645s 10m23.546s 6G
linestring, pgsnapshot, extract 14m22.915s 10m23.181s 4G
way_nodes, pgsnapshot, full 37m 2.924s 10m10.498s 11G
linestring, pgsnapshot, full 27m40.750s 10m25.437s 8.5G
* apidb only has way_nodes, and uses different SQL but it's most similar to
pgsnapshot way_nodes
As expected, apidb is not optimal in read-only workloads.
The most surprising result here is how little apidb gains from hot caches.
Perhaps it's doing CPU-heavy sequential scans from bad query plans due to
the small extract size? It's not something I intend to pursue at this time,
but anyone planning to use apidb on a small dataset should be aware of the
possibility of bad plans.
Next week:
I plan to see how performance is with way_nodes eliminated, using some SQL
commands to test on a small DB.
Depending on the results of this, I'll look at what it would involve to
change pgsnapshot. There's reportedly some significant inefficiencies in the
updates process that I would hopefully be able to tackle at the same time.
I also need to raise an issue on the postgres -hackers list, there is a bug
with query cost index usage fudge factors resulting in disastrous query
plans that I keep running into. It has been fixed in 9.3
(http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=31f38f28b00cb
e2b9267205359e3cf7bafa1cb97), but not backported.
If there's time at the next EWG meeting, I'm going to discuss what it would
take to deploy the version of cgimap that handles /node|way|relation/#
calls, /full calls, etc to api.osm.org
Problems in the last week
- I probably spent a day working around the fudge factors. 9.3 can't release
soon enough for me.
- I asked some questions about pgsnapshot changes at
http://lists.osm.org/pipermail/osmosis-dev/2013-July/001582.html but haven't
gotten a response. I'd really like to hear something before I go hacking at
osmosis code.
- I'd like to hear from any users of pgsnapshot who are using it for uses
other than map? queries. In particular, anyone who is doing JOINs against
way_nodes but they are joining something *other* than the nodes table, or
some temporary equivalent.
On the whole, it didn't feel like the most productive week, but I still got
stuff done.
More information about the SoC
mailing list