[OSGeo-Discuss] idea for an OSGeo project -- a new, open data format

P Kishor punk.kish at gmail.com
Tue Nov 13 10:06:39 PST 2007

On 11/13/07, Frank Warmerdam <warmerdam at pobox.com> wrote:
> P Kishor wrote:
> > So, I am thinking, Shapefile is the de facto data standard for GIS
> > data. Because it is open (albeit not Free), and because of the deep and wide
> > presence of ESRI's products since the beginning of the epoch, it has
> > been widely adopted. The existence of shapelib, various language bindings,
> > and ready use by products such as MapServer has continued to cement
> > Shapefile as the format to use. All this is in spite of Shapefile's
> > inherent drawbacks, particularly in the area of attribute data
> > management.
> >
> > What if we came up with a new and improved data format -- call it
> > "Open Shapefile" (extension .osh) -- that would be completely Free,
> > single-file based (instead of the multiple .shp, .dbf, .shx, etc.),
> > and based on SQLite, giving the .osh format complete relational data
> > handling capabilities. We would need a new version of Shapelib and
> > improved language bindings, would make it the default and preferred format
> > for MapServer, and would provide seamless and painless import of regular
> > .shp data into .osh for native rendering. Its adoption would be quick
> > in the open source community. The non-open source community might
> > not give a rat's behind about it, but it wouldn't affect them...
> > they would still work with their preferred .shp until they learned
> > better. As a completely open and Free, single-file, SQLite-based
> > spatial data format with full relational DBMS capabilities, it would
> > be well positioned for continued improvement and development.
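A minimal sketch of what a single-file, SQLite-backed layout could look like, using Python's standard sqlite3 module. The table and column names (layers, features), and the choice to store geometry as WKT text alongside a bounding box, are hypothetical and only illustrate the idea; no such .osh schema has been defined. Passing a path such as "example.osh" instead of ":memory:" would put everything in one file on disk.

```python
import sqlite3

def create_osh(path: str) -> sqlite3.Connection:
    """Create a hypothetical single-file .osh container (illustrative layout only)."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        -- One row per layer, so many layers can live in one file.
        CREATE TABLE IF NOT EXISTS layers (
            layer_id  INTEGER PRIMARY KEY,
            name      TEXT UNIQUE NOT NULL,
            geom_type TEXT NOT NULL
        );
        -- Features keep geometry (WKT text, an assumption) plus a bounding box.
        CREATE TABLE IF NOT EXISTS features (
            fid      INTEGER PRIMARY KEY,
            layer_id INTEGER NOT NULL REFERENCES layers(layer_id),
            name     TEXT,
            minx REAL, miny REAL, maxx REAL, maxy REAL,
            wkt      TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS features_layer ON features(layer_id);
    """)
    return conn

conn = create_osh(":memory:")
conn.execute("INSERT INTO layers (name, geom_type) VALUES ('parks', 'POLYGON')")
conn.execute("INSERT INTO layers (name, geom_type) VALUES ('wells', 'POINT')")
conn.execute(
    "INSERT INTO features (layer_id, name, minx, miny, maxx, maxy, wkt) "
    "VALUES (1, 'Central', 0, 0, 10, 10, 'POLYGON((0 0,10 0,10 10,0 10,0 0))')")
conn.execute(
    "INSERT INTO features (layer_id, name, minx, miny, maxx, maxy, wkt) "
    "VALUES (2, 'Well 7', 3, 4, 3, 4, 'POINT(3 4)')")

# Plain SQL (a join plus aggregation) across two layers stored in the same file.
rows = conn.execute("""
    SELECT l.name, COUNT(f.fid)
    FROM layers AS l LEFT JOIN features AS f ON f.layer_id = l.layer_id
    GROUP BY l.layer_id
    ORDER BY l.name
""").fetchall()
print(rows)  # [('parks', 1), ('wells', 1)]
```

The point of the sketch is only that layers, attributes, and geometry can share one file and be queried relationally; spatial predicates on the WKT would still need code outside SQLite.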
> Puneet,
> I've had a similar idea kicking around in my head for a while, but I think
> of it as "open geodatabase".  I see the goals as providing a similar role
> to the "personal geodatabase", including:

I should mention that over on the SQLite list every once in an eon someone
asks the question about spatializing SQLite a la PostGIS. That
definitely could be one way to go as I have a little experience in

Last year I had the opportunity to tackle a point-in-polygon overlay
problem that seemed intractable for ArcGIS. So, I dumped
the data into a Shapefile, unpacked the coordinates from the Shapefile
and stuffed them into SQLite, then used the bounding box method to
narrow my searches. With the clever use of indexes, and a bunch of
optimizations, I basically wrote a fast and functional overlay program
with Perl/DBI. The overlay task now takes 2 days, which is a huge
improvement over the 7-8 days it used to take.
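A rough sketch of the bounding-box approach, written in Python with sqlite3 rather than Perl/DBI. Candidate polygons are first narrowed by a query on indexed bounding-box columns, and each candidate is then confirmed with an even-odd ray-casting test. The table layout, the "x y,x y,..." text encoding of coordinates, and the helper names are assumptions for illustration, not the program described above.

```python
import sqlite3

def point_in_polygon(x, y, ring):
    """Even-odd ray casting test; ring is a list of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def add_polygon(conn, pid, ring):
    """Store a polygon with its bounding box (hypothetical schema)."""
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    coords = ",".join(f"{px} {py}" for px, py in ring)
    conn.execute("INSERT INTO polys VALUES (?, ?, ?, ?, ?, ?)",
                 (pid, min(xs), min(ys), max(xs), max(ys), coords))

def polygons_containing(conn, x, y):
    """Bounding-box prefilter in SQL, then an exact test in Python."""
    candidates = conn.execute(
        "SELECT pid, coords FROM polys "
        "WHERE minx <= ? AND maxx >= ? AND miny <= ? AND maxy >= ? "
        "ORDER BY pid", (x, x, y, y))
    hits = []
    for pid, coords in candidates:
        ring = [tuple(map(float, p.split())) for p in coords.split(",")]
        if point_in_polygon(x, y, ring):
            hits.append(pid)
    return hits

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE polys (pid INTEGER PRIMARY KEY, "
             "minx REAL, miny REAL, maxx REAL, maxy REAL, coords TEXT)")
# A plain B-tree index on the box columns; one of many possible indexing choices.
conn.execute("CREATE INDEX polys_bbox ON polys (minx, maxx)")
add_polygon(conn, 1, [(0, 0), (10, 0), (10, 10), (0, 10)])  # square
add_polygon(conn, 2, [(20, 0), (30, 0), (20, 10)])          # triangle

print(polygons_containing(conn, 5, 5))   # [1]
print(polygons_containing(conn, 22, 2))  # [2]
print(polygons_containing(conn, 29, 9))  # [] (inside the box, outside the triangle)
```

The last query shows why the exact test is still needed: the bounding box only rules polygons out cheaply, it never proves containment.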

One thing this new "geo database" should share with SQLite is that it
should not require a server. As wonderful as PostGIS is, installing
and managing PostgreSQL is a major obstacle to its use. I use SQLite for
pretty much everything because of the ease with which I can get
started with it. It takes me longer to download it than to start
working with it on a new machine.

As Larry Wall likes to say, "you can write faster programs in C, but
you can program faster in Perl." It is kinda like that... on the one
hand, highlighting the "database-ness" of the new format would be a
good and powerful thing, but on the other hand, it might lead folks to
think that a db server is required.

Having a server-less, self-contained, rdbms-capable format would be the key.

>   o RDBMS style operations like SQL filtering, joins, etc.
>   o Get past all the shapefile limitations related to the .dbf format (very
>     restricted data types, short attribute names, lots of other limits)
>   o Allow storing many layers in one file.
>   o Built in spatial indexing and attribute indexing.
>   o OGC style coordinate system and geometry support.
> I have had some hope that the existing SDF format supported by FDO would
> be this new format; however, SDF is quite a complicated format, and the
> only available open source implementation is quite heavily tied to FDO.
> Once you carry along FDO the whole thing becomes fairly heavy in terms of
> the amount of code required, and the interface complexity.  But (I think)
> it satisfies most of my goals and already exists.
> I do feel that we need to be cautious before launching "yet another format".
> I'm also a bit dubious about some aspects of sqlite as a native data store.
> In particular, its typeless "everything is a string" approach strikes me
> as potentially being a problem.  It also remains to be seen whether we could
> build fast spatial indexing directly in, though I suppose with a fat enough
> middleware layer it could be done.
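For reference on the typing question: SQLite 3 uses dynamic ("manifest") typing, where each stored value carries a storage class (NULL, INTEGER, REAL, TEXT, or BLOB) and a column's declared type acts only as a type affinity. The small check below, with a hypothetical attrs table, shows that numbers are not simply stored as strings, but also that a column without a declared type will happily hold mixed types, which is the part of the concern a format specification would still have to constrain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'raw' has no declared type, so it gets no type conversion on insert.
conn.execute("CREATE TABLE attrs (pop INTEGER, area REAL, name TEXT, raw)")
conn.execute("INSERT INTO attrs VALUES (?, ?, ?, ?)", (1200, 3.5, "Springfield", "42"))
# Text that looks numeric is converted by INTEGER/REAL affinity; 7 becomes '7' under TEXT.
conn.execute("INSERT INTO attrs VALUES (?, ?, ?, ?)", ("1300", "4.25", 7, 42))

rows = conn.execute(
    "SELECT typeof(pop), typeof(area), typeof(name), typeof(raw) "
    "FROM attrs ORDER BY rowid").fetchall()
print(rows)
# [('integer', 'real', 'text', 'text'), ('integer', 'real', 'text', 'integer')]
```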
> PS. I'm still doubtful it would be faster than shapefiles+qix for most web
> mapping needs.

We will never know until it is done. And, isn't premature optimization
the root of all evil?

> Best regards,
> --
> ---------------------------------------+--------------------------------------
> I set the clouds in motion - turn up   | Frank Warmerdam, warmerdam at pobox.com
> light and sound - activate the windows | http://pobox.com/~warmerdam
> and watch the world go round - Rush    | President OSGeo, http://osgeo.org
