[Benchmarking] Vector (OSM) data prep/plan

Dane Springmeyer dane at dbsgeo.com
Wed Feb 23 10:09:21 EST 2011


On Feb 23, 2011, at 3:04 AM, Pirmin Kalberer wrote:

> Hi Dane,
> Since I can't attend the meeting today, some thoughts by mail. I have some 
> concerns that in your "best effort" proposal we will test the data preparation 
> and not the WMS performance.

Exactly my point. I was responding to my understanding that in previous meetings only a best effort approach was discussed.

So, I think you and I are on the same page - that a baseline test is critical, and likely good enough. But, if others are keen to experiment with different storage I'm supportive.

> For me the "baseline" test is good enough and I 
> support that. We already have a styling for Geofabrik shapefiles imported with 
> shp2pgsql, which look similar to the Cloudmade shapefiles. As I mentioned in the 
> IRC session, my favorite OSM to PostGIS mapping is OSM-in-a-box 
> (http://dev.ifs.hsr.ch/redmine/projects/osminabox/wiki). Advantages are 
> customizable mappings and incremental imports. It would also allow us to 
> produce an import for a defined reference date, which is important for a  
> reproducible benchmark.
> Pirmin
> 
> 
> On Wednesday, 23 February 2011, at 07:27:34, Dane Springmeyer wrote:
>> For the meeting today
>> (http://wiki.osgeo.org/wiki/Benchmarking_2011#Next_IRC_Meeting) I'd like
>> to discuss the Vector data plan.
>> 
>> Short version:
>> 
>> My rough proposal is to plan for *both* a 1) "baseline" test using OSM data
>> in a postgis database imported from cloudmade shapefiles using shp2pgsql
>> (or some other commonly shared approach) and a 2) "best effort" in which
>> teams are encouraged to come up with better import and storage mechanisms
>> and the only limitation is that the styles result in visually identical
>> rendered tiles to the baseline tiles. Both the baseline and best effort
>> would be presented in Denver, but teams would only be mandated to provide
>> results for the baseline (the reason being that smaller teams may not have
>> the time or resources to complete a best effort approach).
>> 
>> I volunteer to help process OSM data for the baseline test, depending on
>> what people want to see.
>> 
>> Also, below I provide additional thoughts (the long version) on why I think
>> including a baseline test is important.
>> 
>> Cheers,
>> 
>> Dane
>> 
>> ----------------
>> 
>> As I understand from previous meetings, it was decided that OSM data for
>> Colorado would be a good test candidate. This is great.
>> 
>> I also understand that the approach would be "best effort" - meaning that
>> the method of processing the OSM data into a format suitable for rendering
>> would be up to the desires of each team.
>> 
>> This makes a lot of sense from the perspective of the data and users. OSM's
>> native format and postgis schema are not designed for rendering (nor is
>> its XML/PBF dump format, aka the "planet file") and there is a wide
>> variety of conversion and import tools for filtering it, turning
>> nodes/ways into OGC geometries, and otherwise prepping parts of it for
>> display. Advances made by benchmarking teams to think of great ways to
>> utilize and optimize OSM data for rendering will benefit all the many
>> consumers of OSM data.
>> 
>> But my concern with this plan is that we all need to recognize the time and
>> effort of this approach.
>> 
>> I would assume that the various teams that seek to participate in this
>> exercise did not sign up to write an OSM -> {some format} conversion script,
>> and this part of the exercise could end up taking a large proportion of
>> the effort if teams see that gains can be had by
>> filtering/simplifying/partitioning or otherwise optimizing during import
>> rather than during rendering. I saw in the notes that no "simplification"
>> would be allowed, but this is unrealistic because even the osm2pgsql tool
>> used by openstreetmap.org to import into postgis simplifies some geometries
>> and puts them in a low-zoom table called "planet_osm_roads". This is a good
>> thing of course and osm2pgsql should be doing more of it. The problem,
>> however, is how we keep our results comparable if some tools simplify more
>> than others or otherwise throw out data that other tools do not.
>> 
>> So, I worry the plan for only doing "best effort" (vs all of us deciding on
>> a shared way of processing and storing OSM data to be used for rendering)
>> is dodging a key decision of how to plan a meaningful baseline test. So, I
>> think we should do both, as I have mentioned above, and realistically only
>> once a baseline test is in place will best effort tests seem
>> reasonable.
>> _______________________________________________
>> Benchmarking mailing list
>> Benchmarking at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/benchmarking
> 
> 
> -- 
> Pirmin Kalberer
> Sourcepole  -  Linux & Open Source Solutions
> http://www.sourcepole.com


