[Benchmarking] Vector (OSM) data prep/plan

Pirmin Kalberer pi_ml at sourcepole.com
Wed Feb 23 06:04:43 EST 2011


Hi Dane,
Since I can't attend the meeting today, some thoughts by mail. I have some 
concerns that with your "best effort" proposal we would end up testing the data 
preparation and not the WMS performance. For me the "baseline" test is good 
enough and I support it. We already have a style for Geofabrik shapefiles 
imported with shp2pgsql, which looks similar to the Cloudmade shapefiles. As I 
mentioned in the IRC session, my favorite OSM-to-PostGIS mapping is OSM-in-a-box 
(http://dev.ifs.hsr.ch/redmine/projects/osminabox/wiki). Its advantages are 
customizable mappings and incremental imports. It would also make it possible 
to produce an import for a defined reference date, which is important for a 
reproducible benchmark.
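
For the baseline, loading one of the Geofabrik shapefiles with shp2pgsql could 
look roughly like the sketch below (untested; the shapefile, table and database 
names and the SRID are only placeholders):

    import subprocess

    # Placeholders -- adjust to the actual Geofabrik extract and target DB.
    SHAPEFILE = "colorado/roads.shp"
    TABLE = "osm_roads"
    DATABASE = "osm_benchmark"
    SRID = "4326"  # Geofabrik shapefiles are usually delivered in WGS84

    # shp2pgsql writes SQL to stdout; pipe it into psql to load the table.
    loader = subprocess.Popen(
        ["shp2pgsql", "-s", SRID, "-I", SHAPEFILE, TABLE],
        stdout=subprocess.PIPE,
    )
    subprocess.check_call(["psql", "-d", DATABASE], stdin=loader.stdout)
    loader.stdout.close()
    loader.wait()
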
Pirmin


On Wednesday, 23 February 2011 at 07:27:34, Dane Springmeyer wrote:
> For the meeting today
> (http://wiki.osgeo.org/wiki/Benchmarking_2011#Next_IRC_Meeting) I'd like
> to discuss the Vector data plan.
> 
> Short version:
> 
> My rough proposal is to plan for *both* 1) a "baseline" test using OSM data
> in a PostGIS database imported from Cloudmade shapefiles using shp2pgsql
> (or some other commonly shared approach) and 2) a "best effort" test in
> which teams are encouraged to come up with better import and storage
> mechanisms, the only limitation being that the styles result in rendered
> tiles that are visually identical to the baseline tiles. Both the baseline
> and the best effort would be presented in Denver, but teams would only be
> required to provide results for the baseline (the reason being that smaller
> teams may not have the time or resources to complete a best-effort
> approach).
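> 
> To make "visually identical" concrete, the check could be as simple as a
> pixel diff of each rendered tile against the corresponding baseline tile.
> A rough sketch (the tile paths are placeholders, and an exact pixel match
> may be stricter than we ultimately want):
> 
>     from PIL import Image, ImageChops
> 
>     def tiles_identical(baseline_path, candidate_path):
>         """Return True if two rendered tiles match pixel for pixel."""
>         baseline = Image.open(baseline_path).convert("RGBA")
>         candidate = Image.open(candidate_path).convert("RGBA")
>         if baseline.size != candidate.size:
>             return False
>         # getbbox() returns None when the difference image is all black,
>         # i.e. the two tiles are identical.
>         return ImageChops.difference(baseline, candidate).getbbox() is None
> 
>     # Compare one "best effort" tile against its baseline counterpart.
>     print(tiles_identical("baseline/12/849/1552.png",
>                           "best_effort/12/849/1552.png"))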
> 
> I volunteer to help process OSM data for the baseline test, depending on
> what people want to see.
> 
> Also, below I provide additional thoughts (the long version) on why I think
> including a baseline test is important.
> 
> Cheers,
> 
> Dane
> 
> ----------------
> 
> As I understand from previous meetings, it was decided that OSM data for
> Colorado would be a good test candidate. This is great.
> 
> I also understand that the approach would be "best effort" - meaning that
> the method of processing the OSM data into a format suitable for rendering
> would be left up to each team.
> 
> This makes a lot of sense from the perspective of the data and its users.
> OSM's native format and PostGIS schema are not designed for rendering (nor
> is its XML/PBF dump format, aka the "planet file"), and there is a wide
> variety of conversion and import tools for filtering it, turning nodes/ways
> into OGC geometries, and otherwise prepping parts of it for display.
> Advances made by benchmarking teams in finding great ways to utilize and
> optimize OSM data for rendering will benefit the many consumers of OSM
> data.
> 
> But my concern with this plan is that we all need to recognize the time and
> effort this approach requires.
> 
> I would assume that the various teams seeking to participate in this
> exercise did not sign up to write OSM -> {some format} conversion scripts,
> and this part of the exercise could end up taking a large proportion of the
> effort if teams see that gains can be had by
> filtering/simplifying/partitioning or otherwise optimizing during import
> rather than during rendering. I saw in the notes that no "simplification"
> would be allowed, but this is unrealistic because even the osm2pgsql tool
> used by openstreetmap.org to import into PostGIS simplifies some geometries
> and puts them in a low-zoom table called "planet_osm_roads". This is a good
> thing, of course, and osm2pgsql should be doing more of it. The problem,
> however, is how we keep our results comparable if some tools simplify more
> than others, or otherwise throw out data that other tools do not.
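> 
> For example, deriving a separate low-zoom roads table in the spirit of
> osm2pgsql's "planet_osm_roads" could look roughly like this (a sketch only,
> not what osm2pgsql actually does; the database name, road classes and
> simplification tolerance are placeholders):
> 
>     import psycopg2
> 
>     # Placeholder connection -- adjust to the benchmark database.
>     conn = psycopg2.connect(dbname="osm_benchmark")
>     cur = conn.cursor()
> 
>     # Copy the major roads into a simplified table for low-zoom rendering.
>     # The tolerance is in the units of the table's projection.
>     cur.execute("""
>         CREATE TABLE osm_roads_lowzoom AS
>         SELECT osm_id, highway,
>                ST_SimplifyPreserveTopology(way, 20.0) AS way
>         FROM planet_osm_line
>         WHERE highway IN ('motorway', 'trunk', 'primary', 'secondary')
>     """)
>     conn.commit()
>     cur.close()
>     conn.close()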
> 
> So, I worry that the plan of doing only "best effort" (vs. all of us
> deciding on a shared way of processing and storing OSM data to be used for
> rendering) dodges a key decision: how to plan a meaningful baseline test.
> So, I think we should do both, as I have mentioned above; realistically,
> only once a baseline test is in place will best-effort tests seem
> reasonable.


-- 
Pirmin Kalberer
Sourcepole  -  Linux & Open Source Solutions
http://www.sourcepole.com

