[SoC] OSM cgimap GSOC 2013 report #4

Paul Norman penorman at mac.com
Sat Jul 13 01:37:50 PDT 2013


What did I get done this week?

- pulled in the changeset cache refactor from upstream

  - lots of working with git merge

- analyzed a set of logs from api.osm.org

  - 30% of map? calls return no results

  - This lead me to short-circuit map calls with no nodes, leading to a
substantial speedup. This will probably bring the most noticeable immediate
benefits because it isn't specific to pgsnapshot and can be deployed on
api.osm.org right away. It increasing speed in my benchmarks by >5%.

- Ran some benchmarks

My earlier results on linestring usage wasn't too impressive, but
short-circuiting no nodes helped drastically.

Results:
test done                        cold cache    hot cache   mem
way_nodes*, apidb, extract       33m28.449s   28m55.834s    7G
way_nodes, pgsnapshot, extract   15m13.645s   10m23.546s    6G
linestring, pgsnapshot, extract  14m22.915s   10m23.181s    4G
way_nodes, pgsnapshot, full      37m 2.924s   10m10.498s   11G
linestring, pgsnapshot, full     27m40.750s   10m25.437s  8.5G
* apidb only has way_nodes, and uses different SQL but it's most similar to
pgsnapshot way_nodes

As expected, apidb is not optimal in read-only workloads.

The most surprising result here is how little apidb gains from hot caches.
Perhaps it's doing CPU-heavy sequential scans from bad query plans due to
the small extract size? It's not something I intend to pursue at this time,
but anyone planning to use apidb on a small dataset should be aware of the
possibility of bad plans.

Next week:

I plan to see how performance is with way_nodes eliminated, using some SQL
commands to test on a small DB.

Depending on the results of this, I'll look at what it would involve to
change pgsnapshot. There's reportedly some significant inefficiencies in the
updates process that I would hopefully be able to tackle at the same time.

I also need to raise an issue on the postgres -hackers list, there is a bug
with query cost index usage fudge factors resulting in disastrous query
plans that I keep running into. It has been fixed in 9.3
(http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=31f38f28b00cb
e2b9267205359e3cf7bafa1cb97), but not backported.

If there's time at the next EWG meeting, I'm going to discuss what it would
take to deploy the version of cgimap that handles /node|way|relation/#
calls, /full calls, etc to api.osm.org

Problems in the last week

- I probably spent a day working around the fudge factors. 9.3 can't release
soon enough for me.

- I asked some questions about pgsnapshot changes at
http://lists.osm.org/pipermail/osmosis-dev/2013-July/001582.html but haven't
gotten a response. I'd really like to hear something before I go hacking at
osmosis code.

- I'd like to hear from any users of pgsnapshot who are using it for uses
other than map? queries. In particular, anyone who is doing JOINs against
way_nodes but they are joining something *other* than the nodes table, or
some temporary equivalent.

On the whole, it didn't feel like the most productive week, but I still got
stuff done.



More information about the SoC mailing list