[gdal-dev] Fastest vector format for combining shapefiles

Simon Greener simon at spatialdbadvisor.com
Thu Oct 15 01:33:13 EDT 2009


Before I would advise anything, I would want to see the open source GIS community clearly define what it is that it wants in a new physical file format.

Here is a discussion on the question of a new open vector format that occurred two years ago (note that most of the discussion is technical, about possible solutions, without defining the problem): http://www.mail-archive.com/discuss@lists.osgeo.org/msg01285.html

There is, of course, the desire to correct the problems in the old shapefile format (multiple files, no official spatial index, spatial reference external to the main spatial file, column name and data type limitations in the DBase file format, limits to physical file size, etc.), but should the new format include render information? Also, elucidating user needs could expose conflicting requirements that are difficult to resolve in a single physical file format. For example, the need to store a very large dataset may conflict with the desire for high-performance access. What about clean interop with the IT world? (IMHO we have failed as a spatial community to embrace IT data access APIs such as ODBC/JDBC/OLEDB for the spatial data formats we have created, promoted and used.) From my perspective, I can't see that the community needs new bespoke formats: there is already so much work out there in general computing that we can re-use or extend, so any new format can be brought to market quickly.
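
As a rough illustration of the "re-use" argument (a sketch only: the post names no specific technology, so SQLite, the table and column names, and the sample coordinates below are all assumptions), a general-purpose single-file database already covers several of the shapefile complaints listed above: one file, the spatial reference stored with the data, long column names, and access through a standard data access API.

```python
import sqlite3

def build_layer(path=":memory:"):
    """Create a one-file vector 'layer' (hypothetical schema, not a standard)."""
    con = sqlite3.connect(path)
    con.executescript("""
        -- Spatial reference lives in the same file as the data.
        CREATE TABLE layer_metadata (layer_name TEXT PRIMARY KEY, srs TEXT NOT NULL);
        -- Column names are not limited to 10 characters as in DBase.
        CREATE TABLE roads (
            fid INTEGER PRIMARY KEY,
            road_name_in_full TEXT,
            geom_wkt TEXT NOT NULL,          -- geometry kept as WKT text
            minx REAL, miny REAL, maxx REAL, maxy REAL
        );
        -- Ordinary B-tree index on the bounding box: a crude spatial filter,
        -- not a true spatial index.
        CREATE INDEX roads_bbox ON roads (minx, maxx, miny, maxy);
    """)
    con.execute("INSERT INTO layer_metadata VALUES (?, ?)", ("roads", "EPSG:4326"))
    rows = [  # made-up sample features
        ("Road A", "LINESTRING(147.20 -43.02, 147.21 -43.01)", 147.20, -43.02, 147.21, -43.01),
        ("Road B", "LINESTRING(-135.08 60.70, -135.07 60.71)", -135.08, 60.70, -135.07, 60.71),
    ]
    con.executemany(
        "INSERT INTO roads (road_name_in_full, geom_wkt, minx, miny, maxx, maxy) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    con.commit()
    return con

def names_in_bbox(con, minx, miny, maxx, maxy):
    """Return names of features whose bounding box overlaps the query box."""
    cur = con.execute(
        "SELECT road_name_in_full FROM roads "
        "WHERE maxx >= ? AND minx <= ? AND maxy >= ? AND miny <= ? ORDER BY fid",
        (minx, maxx, miny, maxy),
    )
    return [row[0] for row in cur]

if __name__ == "__main__":
    con = build_layer()
    print(names_in_bbox(con, 147.0, -44.0, 148.0, -42.0))
```

This says nothing about whether such a container meets the performance or very-large-dataset requirements, which is exactly the kind of conflict that needs to be defined first.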

Not the answer you wanted, Matt, but I have so little involvement in the open source community that I am not the one to "drive" this issue to completion.

regards
Simon
On Thu, 15 Oct 2009 07:10:35 +1100, Matt Wilkie <matt.wilkie at gov.yk.ca> wrote:

> Simon, it's clear you have a great deal of experience in this area. If
> it were up to you to chart out a course for an open source spatial
> vector format that transcends the current limitations of shape, gml, etc.
> what would you advise? Perhaps all that is missing is an architectural
> plan.
>> The only reason I ever hear for physical file formats is "we need to do this for performance reasons"
>
> Personally I like physical files because they are accessible and portable. I
> don't have to install, configure and run an application just to get to
> the point of reading it. Neither do I have to export to an intermediate
> state to transfer to another machine or medium and then import on the
> other end. Maybe this is more properly a limitation of the readily
> available tools than the storage format though.
>
> best regards,
>
> matt wilkie
> --------------------------------------------
> Geomatics Analyst
> Information Management and Technology
> Yukon Department of Environment
> 10 Burns Road * Whitehorse, Yukon * Y1A 4Y9
> 867-667-8133 Tel * 867-393-7003 Fax
> http://environmentyukon.gov.yk.ca/geomatics/
> --------------------------------------------
>
>
>
> Simon Greener wrote:
>> The need for a new vector file format has been discussed many times with no action initiated by the open source community on what to do.
>>
>>
>>> ESRI has said they will do so, but it's been several
>>> years since they first announced it and when it finally is released
>>> there is no guarantee it will be under license terms open source
>>> projects can use. This isn't to say it will be unusable either, we
>>> just don't know.
>>>
>>
>> Ahhh, isn't it wonderful waiting for the "crumbs that will fall from the master's table"!
>>
>> And, "to wait" is part and parcel of being an ESRI-centric customer or user: strange that open source people are willing to do the same.
>> (Well, at least you aren't paying for the privilege of using the stuff.)
>>
>> Why won't ESRI release an FDO (an Open Source open access API) provider for FGDBs rather than their own API? (I can find no reference to ESRI offering to do so for any of its formats.) Sounds like API lock-in is a design goal for the FGDB API!
>>
>> Don't forget that a FGDB is full of ESRI concepts (not OGC or SQL/MM or those promoted by any other standards body) - more lock-in if it becomes the much hoped for replacement for shapefiles. And, what's more, we know nothing about what will be in the API. Where is the community engagement? Will we end up with an API via which we cannot (four examples will suffice):
>>
>> 1. Properly design (cf CASE tool) an FGDB (cf ESRI $$ extensions to Visio);
>> 2. Create an FGDB from scratch;
>> 3. Write data or create important objects (ie versions);
>> 4. Create FGDB spatial and attribute indexes or even use them via the API (cf shapefile indexing).
>>
>> These are points which have grounding in past ESRI practices. All done deliberately so you have to have a copy of ArcGIS to construct, design and get the best out of a fully specified FGDB?
>>
>> And then, when there are serious bugs, you have to wait for 18 months for a fix while in the Open Source community you could get one in a matter of days or weeks?
>>
>> Seriously, though, isn't open source about taking control of one's destiny and being a part of a truly open, collaborative, process and not waiting for the bully in the playground to tell you what you can and can't do, or who really isn't interested in your deadlines and real needs? Many times in my long GIS career I've had conversations with the 'true believers' over in Redlands. One was like this: "When will you support an Oracle Sdo_Geometry circle object in ArcSDE?". Reply: "Circles in GIS? We don't think you need them....".
>>
>> The fixated concentration of the GIS community on physical file formats feels very much like a 1960s form of data management and computing. Logical separation from physical implementation, normalise for edit/denormalise for performance, logical separation from physical implementation, normalise for edit/denormalise for performance, logical..... oops the record is broken ....
>>
>> The only reason I ever hear for physical file formats is "we need to do this for performance reasons"..... but this usually masks a lot of other reasons and assumptions (like it is "quicker and easier" that soon morphs into a management nightmare).....
>>
>> cynically
>> Simon
>>
>


-- 
SpatialDB Advice and Design, Solutions Architecture and Programming,
Oracle Database 10g Administrator Certified Associate; Oracle Database 10g SQL Certified Professional
Oracle Spatial, SQL Server, PostGIS, MySQL, ArcSDE, Manifold GIS, FME, Radius Topology and Studio Specialist.
39 Cliff View Drive, Allens Rivulet, 7150, Tasmania, Australia.
Website: www.spatialdbadvisor.com
   Email: simon at spatialdbadvisor.com
   Voice: +61 362 396397
Mobile: +61 418 396391
Skype: sggreener
Longitude: 147.20515 (147° 12' 18" E)
Latitude: -43.01530 (43° 00' 55" S)
GeoHash: r22em9r98wg
NAC:W80CK 7SWP3

