[Benchmarking] Vector (OSM) data prep/plan

Dane Springmeyer dane at dbsgeo.com
Wed Feb 23 10:11:47 EST 2011


Great thought, Mike,

I would say that if every practical effort is made to keep the data identical despite the database change, then I would call that baseline.

Dane

On Feb 23, 2011, at 5:03 AM, Smith, Michael ERDC-CRREL-NH wrote:

> A further thought. Some participants in this (Oracle MapViewer leaps to
> mind) won't be pulling from PostGIS but rather from Oracle. But the actual
> geometries being rendered should be the same (e.g., export the data from
> PostGIS to Oracle) to keep an apples-to-apples comparison. I think Best
> Effort could be the storage format that your team feels is optimal for
> its use, but the data (geometry) itself should be identical (or as close
> as possible). And for those teams that don't use PostGIS, would Oracle be
> considered Baseline or Best Effort? I guess I'm still a little unclear (as
> this email demonstrates) about where one begins and the other ends.
> 
> Mike
> 
> 
> -- 
> Michael Smith
> Remote Sensing/GIS Center
> US Army Corps of Engineers
> 
> 
> On 2/23/11 6:04 AM, "Pirmin Kalberer" <pi_ml at sourcepole.com> wrote:
> 
>> Hi Dane,
>> Since I can't attend the meeting today, here are some thoughts by mail. I
>> am concerned that with your "best effort" proposal we will end up testing
>> the data preparation and not the WMS performance. For me the "baseline"
>> test is good enough, and I support it. We already have a style for
>> Geofabrik shapefiles imported with shp2pgsql, which look similar to the
>> CloudMade shapefiles. As I mentioned in the IRC session, my favorite
>> OSM-to-PostGIS mapping is OSM-in-a-box
>> (http://dev.ifs.hsr.ch/redmine/projects/osminabox/wiki). Its advantages are
>> customizable mappings and incremental imports. It would also allow
>> producing an import for a defined reference date, which is important for a
>> reproducible benchmark.
>> Pirmin
>> 
>> 
>> On Wednesday, 23 February 2011, at 07:27:34, Dane Springmeyer wrote:
>>> For the meeting today
>>> (http://wiki.osgeo.org/wiki/Benchmarking_2011#Next_IRC_Meeting) I'd like
>>> to discuss the Vector data plan.
>>> 
>>> Short version:
>>> 
>>> My rough proposal is to plan for *both* a 1) "baseline" test using OSM data
>>> in a postgis database imported from cloudmade shapefiles using shp2pgsql
>>> (or some other commonly shared approach) and a 2) "best effort" in which
>>> teams are encouraged to come up with better import and storage mechanisms
>>> and the only limitation is that the styles result in visually identical
>>> rendered tiles to the baseline tiles. Both the baseline and best effort
>>> would be presented in Denver, but teams would only be mandated to provide
>>> results for the baseline (the reason being that smaller teams may not have
>>> the time or resources to complete a best effort approach).
>>> 
>>> I volunteer to help process OSM data for the baseline test, depending on
>>> what people want to see.
>>> 
>>> Also, below I provide additional thoughts (the long version) on why I think
>>> including a baseline test is important.
>>> 
>>> Cheers,
>>> 
>>> Dane
>>> 
>>> ----------------
>>> 
>>> As I understand from previous meetings, it was decided that OSM data for
>>> Colorado would be a good test candidate. This is great.
>>> 
>>> I also understand that the approach would be "best effort" - meaning that
>>> the method of processing the OSM data into a format suitable for rendering
>>> would be up to the desires of each team.
>>> 
>>> This makes a lot of sense from the perspective of the data and users. OSM's
>>> native format and postgis schema are not designed for rendering (nor is
>>> its XML/PBF dump format, aka the "planet file"), and there is a wide
>>> variety of conversion and import tools for filtering it, turning
>>> nodes/ways into OGC geometries, and otherwise prepping parts of it for
>>> display. Any good ideas that benchmarking teams come up with for using
>>> and optimizing OSM data for rendering will benefit the many consumers of
>>> OSM data.
>>> 
>>> My concern with this plan, however, is that we all need to recognize how
>>> much time and effort this approach will take.
>>> 
>>> I would assume that the various teams participating in this exercise did
>>> not sign up to write OSM -> {some format} conversion scripts, and this part
>>> of the exercise could end up taking a large proportion of the effort if
>>> teams see that gains can be had by filtering, simplifying, partitioning, or
>>> otherwise optimizing during import rather than during rendering. I saw in
>>> the notes that no "simplification" would be allowed, but this is
>>> unrealistic, because even the osm2pgsql tool used by openstreetmap.org to
>>> import into postgis simplifies some geometries and puts them in a low-zoom
>>> table called "planet_osm_roads". This is a good thing, of course, and
>>> osm2pgsql should be doing more of it. The problem, however, is how to keep
>>> our results comparable if some tools simplify more than others or
>>> otherwise throw out data that other tools keep.
>>> 
>>> So I worry that the plan of only doing "best effort" (versus all of us
>>> agreeing on a shared way of processing and storing OSM data for rendering)
>>> dodges the key decision of how to design a meaningful baseline test. I
>>> think we should do both, as I proposed above, and realistically, best
>>> effort tests will only seem reasonable once a baseline test is in place.
>>> _______________________________________________
>>> Benchmarking mailing list
>>> Benchmarking at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/benchmarking
>> 
> 


