[OSGeo-Discuss] Re: idea for an OSGeo project -- a new, open data format

David William Bitner david.bitner at gmail.com
Tue Nov 13 10:19:53 PST 2007


Part of the mission of the OSGeo Geodata committee
(http://www.osgeo.org/geodata) is to "promote the use of open geospatial
formats".  If there is a group that wants to keep pursuing the creation
of a new open geodata format, I would encourage it to use the
geodata mailing list. That said, I think part of the discussion that
needs to happen is whether OSGeo should be creating standards in the
first place.

A couple of comments on some of the discussion that has taken place
in this thread:

Regarding the suggestion that MapServer take on this new format as its
primary format:  I think this is way beyond the scope of what OSGeo should
be doing.  Even if we spec a new standard, we (OSGeo) have no teeth to
make any of our projects implement that standard.  The choice of formats
used by any of our projects is driven by the needs of the users and
developers and by the resources (time, money) that have been dedicated to
implementing them.  If someone takes OpenShape or whatever and decides they
have a business need that justifies the time or money to get it
implemented, then it will be implemented.  Shapefile has been and will
continue to be an important format for many projects, as it is one of the
most widely distributed formats in the GIS world, if not the most widely
distributed.

Regarding the comments on standards wanking:  Standards can get in the way
of progress along a straight line, but they can also encourage
interoperability that creates better progress for everyone.  For getting a
single task done, standards can often slow things down, but there *are*
gains to be had from playing well with everyone else.

David Bitner
OSGeo, Public Geospatial Data Project Chair

On Nov 13, 2007 11:40 AM, Allan Doyle <afdoyle at mit.edu> wrote:

>
> On Nov 13, 2007, at 12:24 , Steve Coast wrote:
>
> > OSM: $0
> > CCBYSA: $0
> > Donation of entire Netherlands: Priceless
> >
> > Real artists ship. For everyone else there's standards wanking.
>
> Perhaps there's an art to wanking standards as well.
>
> >
> >
> >
> >
> > Seriously though, this is so kafka-esque. When OSM started it was
> > like this: We should have got a committee to design a standard, then
> > we could think about a committee to design an ontology... and choose
> > a name... and on some sunny distant day make a map.
> >
> >
> >
> > On 13 Nov 2007, at 17:09, P Kishor wrote:
> >
> >> On 11/13/07, Landon Blake <lblake at ksninc.com> wrote:
> >>> Puneet,
> >>>
> >>> You wrote: "Should be easy to transition to. By building the new
> >>> format
> >>> on the
> >>> structure of the Shapefile format, and *in fact*, calling it "open
> >>> shapefiles" or some such thing, we indicate from its name that the
> >>> transition is not that revolutionary but is evolutionary. This,
> >>> hopefully, will bring some name-familiarity, and make the transition
> >>> less scary."
> >>>
> >>> I really think you are going to run into problems using the
> >>> "Shapefile"
> >>> as part of the trademark or name for any product not sold by ESRI. I
> >>> strongly recommend against this move. Let people adopt the
> >>> implementation of your idea for its merits, not for name recognition
> >>> that comes from another product line.
> >>
> >> A good enough point to keep in mind, but not one to get so hung up
> >> over that it entangles us. Choosing a name for the data format can be a
> >> project in itself. "open spatial data format" or its variations could
> >> be chosen. Still, point taken.
> >>
> >>>
> >>> You wrote: "ANSI standard C is still
> >>> that magic common denominator that compiles and works predictably on
> >>> the largest number of systems. I have a lot against Java, but those who
> >>> love
> >>> Java should definitely work on tools for accessing and working with
> >>> this new format as it would only make the format more widely used
> >>> and
> >>> adopted."
> >>>
> >>> It sounds to me like you are really describing a tool. File
> >>> formats are
> >>> written in a binary encoding or text, not in a programming
> >>> language. If
> >>> you are designing a tool you can choose the programming language
> >>> of your
> >>> choice, but be aware that this will limit the developers that
> >>> adopt the
> >>> tool. This will be the case no matter what language you choose to
> >>> use,
> >>> whether it is C, Java, or something else.
> >>>
> >>> If, in contrast, you are creating a file format, then programming
> >>> languages shouldn't really matter. Binary and text data can be
> >>> accessed
> >>> by almost all programming languages.
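
The point that a format is a byte layout rather than a programming language can be illustrated with the existing Shapefile main-file header. The following is only a sketch in Python: the function names build_header and parse_header are hypothetical, and the field offsets (a big-endian file code of 9994, a big-endian length in 16-bit words, then little-endian version, shape type, and bounding box) are taken from the published ESRI Shapefile technical description as I understand it. The same 100 bytes could be read just as easily from C or Java.

```python
import struct

HEADER_SIZE = 100   # bytes in the main-file (.shp) header
FILE_CODE = 9994    # magic number, stored big-endian


def build_header(shape_type, bbox, file_length_words=50):
    """Build a 100-byte header; the file length is counted in 16-bit words."""
    # Big-endian part: file code, 20 unused bytes, file length.
    big = struct.pack(">i20xi", FILE_CODE, file_length_words)
    # Little-endian part: version, shape type, XY bbox, then Z and M ranges.
    little = struct.pack("<ii4d4d", 1000, shape_type, *bbox, 0.0, 0.0, 0.0, 0.0)
    return big + little


def parse_header(data):
    """Decode the fields of a main-file header from raw bytes."""
    if len(data) < HEADER_SIZE:
        raise ValueError("header must be at least 100 bytes")
    file_code, length_words = struct.unpack(">i20xi", data[:28])
    if file_code != FILE_CODE:
        raise ValueError("not a Shapefile main-file header")
    version, shape_type, xmin, ymin, xmax, ymax = struct.unpack(
        "<ii4d", data[28:68]
    )
    return {
        "version": version,
        "shape_type": shape_type,
        "bbox": (xmin, ymin, xmax, ymax),
        "file_length_bytes": length_words * 2,
    }


# Round trip on made-up sample values (shape type 1 is Point).
header = build_header(1, (0.0, 0.0, 10.0, 5.0))
info = parse_header(header)
print(info)
```

Nothing in the layout depends on the language used to read it; only the byte order and offsets matter.
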
> >>>
> >>> I think you need to decide if you want a tool or a data format. It
> >>> sounds like you are shooting more for a spatial database written
> >>> in the
> >>> C programming language that uses some form of the ESRI Shapefile
> >>> as its
> >>> underlying data storage mechanism. To me that is a tool or piece of
> >>> software, not a format. But maybe I don't completely understand your
> >>> goal.
> >>>
> >>
> >> Well, I am, frankly, confused.
> >>
> >> I was quite convinced I wasn't describing a "tool" but was describing
> >> a "format." Of course, to describe the format, I positioned it on the
> >> "format" (the SQLite-compatible format) used and popularized by a
> >> "tool" (SQLite, the library, which happens to be written in C). In my
> >> mind, having the data format based on SQLite *format* for its
> >> relational attribute handling was the real winner. In that sense,
> >> perhaps I conflated the format and the tool. I am not well versed in
> >> these things, so I am probably already walking on thin ice, but that
> >> shouldn't stop others.
> >>
> >> So, forget that I mentioned C and Java... let's just concentrate on a
> >> way of laying out data on a disk that is not too dissimilar from how
> >> Shapefile data are laid out, except that we utilize the
> >> SQLite-compatible binary format for relational data handling, so that
> >> SQLite-enabled spatial tools can access this new format.
> >>
> >> And, put this format into public domain.
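
One way to picture the layout described above is a single SQLite file holding both geometry and attributes. This is a minimal sketch in Python and not a proposed specification: the table name features, its columns, and the geometry encoding (a point packed as two little-endian doubles) are all assumptions made for illustration, and the sample values are made up.

```python
import sqlite3
import struct


def pack_point(x, y):
    # Hypothetical geometry encoding: two little-endian doubles.
    return struct.pack("<dd", x, y)


def unpack_point(blob):
    return struct.unpack("<dd", blob)


def create_layer(conn):
    # One table carries the geometry BLOB next to the relational attributes.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT,"
        " population INTEGER,"
        " geom BLOB)"
    )
    # An index lets attribute lookups avoid a sequential scan.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_features_name ON features(name)"
    )


def add_feature(conn, name, population, x, y):
    conn.execute(
        "INSERT INTO features (name, population, geom) VALUES (?, ?, ?)",
        (name, population, pack_point(x, y)),
    )


def find_by_name(conn, name):
    row = conn.execute(
        "SELECT name, population, geom FROM features WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        return None
    return row[0], row[1], unpack_point(row[2])


# ":memory:" keeps the sketch self-contained; a real layer would be a file path.
conn = sqlite3.connect(":memory:")
create_layer(conn)
add_feature(conn, "Sampletown", 1000, 1.5, 2.25)
conn.commit()
print(find_by_name(conn, "Sampletown"))
```

Any tool that can open an SQLite database could read such a file, which is the interoperability argument being made in the message.
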
> >>
> >>
> >>>
> >>> -----Original Message-----
> >>> From: discuss-bounces at lists.osgeo.org
> >>> [mailto:discuss-bounces at lists.osgeo.org] On Behalf Of P Kishor
> >>> Sent: Tuesday, November 13, 2007 8:35 AM
> >>> To: OSGeo Discussions
> >>> Subject: [OSGeo-Discuss] Re: idea for an OSGeo project -- a new,open
> >>> data format
> >>>
> >>> Thanks everyone, for responding. Here is my "groundwork."
> >>>
> >>> The new format --
> >>>
> >>> - Should be fast. SQLite is plenty fast, and anything that simply
> >>> "extends" the Shapefile format to inject relational capabilities
> >>> should be pretty fast. It should definitely be faster than a
> >>> geodatabase format (such as PostGIS/ArcSDE) and perhaps even faster
> >>> than Shapefiles especially while accessing attribute data. DBF is
> >>> sequential, and searching for textual information is particularly
> >>> expensive. SQLite has been tuned to excellence. I have been working
> >>> with it for a few years now, and it really is amazing: the product,
> >>> the development community, the support, and the capabilities. That it is in
> >>> public
> >>> domain makes for a transfat-free icing on the cake.
> >>>
> >>> - Should be unencumbered by licenses and copyrights. Ideally, the
> >>> new
> >>> format could also be put back into public domain. We want to remove
> >>> all encumbrances to encourage rapid and wide adoption.
> >>>
> >>> - Should be a single file. Well, some like multiple files and some
> >>> like single files. We can achieve both objectives by using a
> >>> tar-gzipped packaging such as Apple tends to use for much of its
> >>> stuff
> >>> (for example, its Pages wordprocessor uses a tgzipped xml file along
> >>> with other resources for icons and pictures and stuff). Or, if speed
> >>> is going to be affected because of gzipping and gunzipping, just a
> >>> package format (I have no idea if this is a Unix thing or a Mac OS
> >>> thing -- we, in the Mac world, call them packages... they appear
> >>> like
> >>> files in the Finder, and like directories in the shell).
> >>>
> >>> - Should be easy to transition to. By building the new format on the
> >>> structure of the Shapefile format, and *in fact*, calling it "open
> >>> shapefiles" or some such thing, we indicate from its name that the
> >>> transition is not that revolutionary but is evolutionary. This,
> >>> hopefully, will bring some name-familiarity, and make the transition
> >>> less scary.
> >>>
> >>> - Frank mentions SQLite's lack of datatypes as an issue -- I guess
> >>> that is a matter of preference. I personally quite like that freedom
> >>> as it gives me, the application developer, complete control over
> >>> what
> >>> goes where. SQLite actually does now have a few datatypes that it
> >>> respects, but it doesn't complain when a value doesn't match. Since all users will be
> >>> accessing the data via an application, as long as the application is
> >>> well defined, it should be fine.
> >>>
> >>> - SQLite excels at one thing that it has been entrusted to do --
> >>> retrieve data that it has been entrusted with at extremely fast
> >>> speeds, and maintain ACID data integrity in case of a programmatic
> >>> catastrophe. The transactions themselves are worth their price of
> >>> admission, which, happily, happens to be zero.
> >>>
> >>> - Landon mentions Java support -- well, yes, use/work on SQLite
> >>> JDBC.
> >>> I have been using it for a few days now and find it to be a pretty
> >>> competent conduit. Extend it, spatialize it. ANSI standard C is
> >>> still
> >>> that magic common denominator that compiles and works predictably on
> >>> the largest number of systems. I have a lot against Java, but those who
> >>> love
> >>> Java should definitely work on tools for accessing and working with
> >>> this new format as it would only make the format more widely used
> >>> and
> >>> adopted.
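
Two of the SQLite behaviours mentioned in the list above, the relaxed datatypes and the transactions, can be seen in a few lines. This is a sketch in Python using the standard sqlite3 module; the table attrs, the helper bulk_insert, and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attrs (id INTEGER PRIMARY KEY, height REAL)")

# Relaxed typing: a REAL column keeps a number as a number, but it also
# accepts text that cannot be converted and stores it as text.
conn.execute("INSERT INTO attrs (height) VALUES (?)", (12.5,))
conn.execute("INSERT INTO attrs (height) VALUES (?)", ("unknown",))
conn.commit()
stored_types = [
    row[0] for row in conn.execute("SELECT typeof(height) FROM attrs ORDER BY id")
]


def bulk_insert(conn, heights):
    """Insert all heights or none of them."""
    try:
        # As a context manager the connection commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            for h in heights:
                if h is None:
                    raise ValueError("missing height")
                conn.execute("INSERT INTO attrs (height) VALUES (?)", (h,))
    except ValueError:
        return False
    return True


# The third value fails, so the first two inserts are rolled back as well.
ok = bulk_insert(conn, [1.0, 2.0, None])
count = conn.execute("SELECT COUNT(*) FROM attrs").fetchone()[0]
print(stored_types, ok, count)
```

Whether the relaxed typing is a benefit or a hazard is the matter of preference noted above; the application has to validate whatever the storage layer does not.
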
> >>>
> >>> Ok, enough for now.
> >>>
> >>>
> >>>
> >>> On Nov 13, 2007 8:52 AM, P Kishor <punk.kish at gmail.com> wrote:
> >>>> So, I am thinking, Shapefile is the de facto data standard for GIS
> >>>> data. Because it is open (albeit not Free), and because of the deep
> >>>> and wide presence of ESRI's products from the beginning of the epoch,
> >>>> it has been widely adopted. The existence of shapelib, various language
> >>>> bindings,
> >>>> and ready use by products such as MapServer has continued to cement
> >>>> Shapefile as the format to use. All this is in spite of Shapefile's
> >>>> inherent drawbacks, particularly in the area of attribute data
> >>>> management.
> >>>>
> >>>> What if we came up with a new and improved data format -- call it
> >>>> "Open Shapefile" (extension .osh) -- that would be completely Free,
> >>>> single-file based (instead of the multiple .shp, .dbf, .shx, etc.),
> >>>> and based on SQLite, giving the .osh format complete relational
> >>>> data
> >>>> handling capabilities. We would require a new version of Shapelib,
> >>>> improved language bindings, make it the default and preferred
> >>>> format
> >>>> for MapServer, and provide seamless and painless import of regular
> >>>> .shp data into .osh for native rendering. Its adoption would be
> >>>> quick
> >>>> in the open source community. The non-opensource community would
> >>>> probably not give a rat's behind about it, but it wouldn't affect
> >>>> them...
> >>>> they would still work with their preferred .shp until they learned
> >>>> better. By having a completely open and Free single-file based,
> >>>> built
> >>>> on SQLite, fully relational dbms capable spatial data format, it
> >>>> would
> >>>> be positioned for continued improvement and development.
> >>>>
> >>>> Is this too crazy?
> >>>>
> >>>> --
> >>>> Puneet Kishor
> >>>>
> >>> _______________________________________________
> >>> Discuss mailing list
> >>> Discuss at lists.osgeo.org
> >>> http://lists.osgeo.org/mailman/listinfo/discuss
> >
> > have fun,
> >
> > SteveC | steve at asklater.com | http://www.asklater.com/steve/
> >
> >
>
> --
> Allan Doyle
> Director of Technology
> MIT Museum
> +1.617.452.2111
>
>



-- 
************************************
David William Bitner