[Benchmarking] Vector (OSM) data prep/plan

Jeff McKenna jmckenna at gatewaygeomatics.com
Wed Feb 23 09:47:42 EST 2011


Hi Dane,

Thanks for bringing this to the mailing list so all can follow along.

I think you hit the nail on the head: realistically, teams won't have 
time to spend on a custom OSM conversion script or on worrying about 
simplification (we've seen in past exercises that teams have no time 
for this).  Teams barely have time to take pre-processed data and style 
it according to the guidelines.

Couldn't we then use the CloudMade OSM shapefiles as the baseline 
test?  (Or maybe you mentioned that and I missed it.)

But that brings us to a bigger question: are we doing both a baseline 
test and a best-run test this year?

-jeff



On 11-02-23 2:27 AM, Dane Springmeyer wrote:
> For the meeting today (http://wiki.osgeo.org/wiki/Benchmarking_2011#Next_IRC_Meeting) I'd like to discuss the Vector data plan.
>
> Short version:
>
> My rough proposal is to plan for *both* 1) a "baseline" test using OSM data in a postgis database imported from the CloudMade shapefiles using shp2pgsql (or some other commonly shared approach), and 2) a "best effort" test in which teams are encouraged to come up with better import and storage mechanisms, the only constraint being that their styles produce rendered tiles visually identical to the baseline tiles. Both the baseline and the best effort would be presented in Denver, but teams would only be required to provide results for the baseline (the reason being that smaller teams may not have the time or resources to complete a best-effort approach).
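>
> For concreteness, here is a minimal sketch of what the baseline import might look like (the database name, SRID, and shapefile name are placeholders, not a final decision):
>
>     # assumes an existing postgis-enabled database named "osm_baseline";
>     # -s sets the SRID, -I builds a GiST index, -D uses the faster dump format
>     shp2pgsql -s 4326 -I -D colorado_highway.shp highway | psql -d osm_baseline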
>
> I volunteer to help process OSM data for the baseline test, depending on what people want to see.
>
> Also, below I provide additional thoughts (the long version) on why I think including a baseline test is important.
>
> Cheers,
>
> Dane
>
> ----------------
>
> As I understand from previous meetings, it was decided that OSM data for Colorado would be a good test candidate. This is great.
>
> I also understand that the approach would be "best effort" - meaning that the method of processing the OSM data into a format suitable for rendering would be up to the desires of each team.
>
> This makes a lot of sense from the perspective of the data and users. OSM's native format and postgis schema are not designed for rendering (nor is its XML/PBF dump format, aka the "planet file"), and there is a wide variety of conversion and import tools for filtering it, turning nodes/ways into OGC geometries, and otherwise prepping parts of it for display. Advances made by benchmarking teams in finding great ways to utilize and optimize OSM data for rendering will benefit the many consumers of OSM data.
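>
> As one concrete example of such a tool, an osm2pgsql import might look roughly like this (the extract file, database name, and cache size are placeholders):
>
>     # --slim keeps intermediate tables on disk, -C sets the node cache in MB,
>     # -S points at the tag-to-column style file
>     osm2pgsql --slim -d gis -C 1024 -S default.style colorado.osm.bz2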
>
> But my concern with this plan is that we all need to recognize the time and effort this approach requires.
>
> I would assume that the various teams that seek to participate in this exercise did not sign up to write an OSM -> {some format} conversion script, and this part of the exercise could end up consuming a large proportion of the effort if teams see that gains can be had by filtering/simplifying/partitioning or otherwise optimizing during import rather than during rendering. I saw in the notes that no "simplification" would be allowed, but this is unrealistic, because even the osm2pgsql tool used by openstreetmap.org to import into postgis simplifies some geometries and puts them in a low-zoom table called "planet_osm_roads". This is a good thing, of course, and osm2pgsql should be doing more of it. The problem, however, is how we keep our results comparable if some tools simplify more than others or otherwise throw out data that other tools do not.
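>
> To see what I mean, compare the full line table against the reduced low-zoom table after an osm2pgsql import (table names are the osm2pgsql defaults; the counts themselves will vary by extract):
>
>     # planet_osm_roads holds a reduced subset of planet_osm_line for low zooms
>     psql -d gis -c "SELECT count(*) FROM planet_osm_line;"
>     psql -d gis -c "SELECT count(*) FROM planet_osm_roads;"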
>
> So, I worry that the plan of doing only "best effort" (versus all of us deciding on a shared way of processing and storing OSM data for rendering) dodges a key decision: how to plan a meaningful baseline test. I think we should do both, as I have mentioned above, and realistically, only once a baseline test is in place will best-effort tests seem reasonable.

