[OSGeo-Discuss] Re: idea for an OSGeo project -- a new, open data format

P Kishor punk.kish at gmail.com
Tue Nov 13 13:55:03 EST 2007


David,


On 11/13/07, David William Bitner <david.bitner at gmail.com> wrote:
> Part of the mission of the OSGeo Geodata committee
> (http://www.osgeo.org/geodata) is to "promote the use of open geospatial
> formats".  If there is a group that wants to continue pursuing the creation
> of a new open geodata format, I would like to encourage the use of the
> geodata mailing list. That being said, I think part of the discussion that
> needs to be had is whether or not OSGeo should be creating standards in the
> first place.
>
> A couple comments that I have on some of the discussion that has taken place
> in this thread:
>
> Regarding the suggestion that MapServer takes on this new format as the
> primary format:  I think this is way beyond the scope of what OSGeo should
> be doing.  Even if we spec a new standard, we (OSGeo) have no teeth to be
> able to make any of our projects do any kind of implementation of that
> standard.  The choice of formats that are used by any of our projects is
> driven by the needs of the users and developers and the resources (time,
> money) that have been dedicated towards implementing them.  If someone takes
> OpenShape or whatever and decides they have a business need that they can
> spend the time or money to get it implemented then it will be implemented.
> Shapefile has been and will continue to be an important format for many
> projects as it is one of the most widely distributed formats in the GIS
> world, if not the most.
>

I respectfully disagree. I think OSGeo has plenty of teeth for those who
want to believe in it. In the end, yes, just like any real project, it
needs a core of committed developers and plenty of time (or money --
usually they are synonymous). This is not something that can happen
overnight, but if the idea is good, it deserves a start and support. To me
it is a reasonable assumption that the long-term effects of a solid,
relational, transactional geodata format would be very good.
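
To make "relational, transactional" concrete, here is a minimal sketch
using Python's standard sqlite3 module. Everything named in it is an
assumption for illustration only: the "features" table, its columns, and
the 16-byte point encoding are hypothetical and not part of any proposed
spec. It shows an indexed attribute lookup (in contrast to a sequential
DBF scan) and a batch insert that is rolled back as a whole when one
record is bad.

```python
import sqlite3
import struct


def create_layer(conn):
    # Hypothetical layout: one table holding attributes and geometry together.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS features ("
        "id INTEGER PRIMARY KEY, name TEXT, geom BLOB)"
    )
    # An index lets attribute searches avoid a full sequential scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_features_name ON features (name)")


def encode_point(x, y):
    # Hypothetical encoding: two little-endian doubles (not WKB, not Shapefile).
    return struct.pack("<dd", x, y)


def decode_point(blob):
    return struct.unpack("<dd", blob)


def add_points(conn, rows):
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so a batch is stored completely or not at all.
    with conn:
        for name, x, y in rows:
            conn.execute(
                "INSERT INTO features (name, geom) VALUES (?, ?)",
                (name, encode_point(x, y)),
            )


def find_by_name(conn, name):
    cur = conn.execute("SELECT id, geom FROM features WHERE name = ?", (name,))
    return [(fid, decode_point(geom)) for fid, geom in cur]


# ":memory:" keeps the sketch self-contained; a file path would give one file on disk.
conn = sqlite3.connect(":memory:")
create_layer(conn)
add_points(conn, [("Madison", -89.4, 43.07), ("Boston", -71.06, 42.36)])

try:
    # The second record cannot be encoded, so the first one is rolled back too.
    add_points(conn, [("Denver", -104.99, 39.74), ("bad", "not a number", 0.0)])
except struct.error:
    pass

print(conn.execute("SELECT COUNT(*) FROM features").fetchone()[0])
print(find_by_name(conn, "Madison"))
```

Passing a file name instead of ":memory:" would keep the whole layer in a
single file, which is the property argued for elsewhere in this thread.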

> Regarding the comments on standards wanking:  Standards can get in the way
> of progress along a straight line, but they can also encourage
> interoperability that can create better progress for everyone.  To get a
> singular task done, standards often can slow things down, but there *are*
> gains to be had from playing well with everyone else.

Here I totally agree. I am not sure how to interpret the "standards
wanking" statement. On the one hand it is a reasonably accurate
assessment of a lot of public hand-wringing and open alliances (for a
really funny take on this, read Fake Steve's tirade on the open
handset alliance at
<http://fakesteve.blogspot.com/2007/11/its-not-phone-its-alliance.html>).
But, on the other hand, it is a pretty damning judgment on any attempt
to do things via collaboration, and thus on OSGeo and similar efforts
themselves.

My take is that if I can't do it alone, I will lay it out in the open
hoping someone better than me will work on it as well. If I can do it
alone, I will do it until I think it is ready to benefit from extra
eyeballs. Sometimes getting started is the biggest hurdle.


>
> David Bitner
> OSGeo, Public Geospatial Data Project Chair
>
> On Nov 13, 2007 11:40 AM, Allan Doyle <afdoyle at mit.edu> wrote:
> >
> >
> > On Nov 13, 2007, at 12:24 , Steve Coast wrote:
> >
> > > OSM: $0
> > > CCBYSA: $0
> > > Donation of entire Netherlands: Priceless
> > >
> > > Real artists ship. For everyone else there's standards wanking.
> >
> > Perhaps there's an art to wanking standards as well.
> >
> > > Seriously though, this is so kafka-esque. When OSM started it was
> > > like this: We should have got a committee to design a standard, then
> > > we could think about a committee to design an ontology... and choose
> > > a name... and on some sunny distant day make a map.
> > >
> > >
> > >
> > > On 13 Nov 2007, at 17:09, P Kishor wrote:
> > >
> > >> On 11/13/07, Landon Blake <lblake at ksninc.com> wrote:
> > >>> Puneet,
> > >>>
> > >>> You wrote: "Should be easy to transition to. By building the new
> > >>> format on the structure of the Shapefile format, and *in fact*,
> > >>> calling it "open shapefiles" or some such thing, we indicate from its
> > >>> name that the transition is not that revolutionary but is
> > >>> evolutionary. This, hopefully, will bring some name-familiarity, and
> > >>> make the transition less scary."
> > >>>
> > >>> I really think you are going to run into problems using "Shapefile"
> > >>> as part of the trademark or name for any product not sold by ESRI. I
> > >>> strongly recommend against this move. Let people adopt the
> > >>> implementation of your idea for its merits, not for name recognition
> > >>> that comes from another product line.
> > >>
> > >> A good point to keep in mind, but not one to get so hung up over that
> > >> it entangles us. Choosing a name for the data format can be a project
> > >> in itself. "Open spatial data format" or one of its variations could
> > >> be chosen. Still, point taken.
> > >>
> > >>>
> > >>> You wrote: "ANSI standard C is still that magic common denominator
> > >>> that compiles and works predictably on the greatest number of
> > >>> systems. I have a lot against Java, but those who love Java should
> > >>> definitely work on tools for accessing and working with this new
> > >>> format as it would only make the format more widely used and
> > >>> adopted."
> > >>>
> > >>> It sounds to me like you are really describing a tool. File formats
> > >>> are written in a binary encoding or text, not in a programming
> > >>> language. If you are designing a tool you can choose the programming
> > >>> language of your choice, but be aware that this will limit the
> > >>> developers that adopt the tool. This will be the case no matter what
> > >>> language you choose to use, whether it is C, Java, or something else.
> > >>>
> > >>> If, in contrast, you are creating a file format, then programming
> > >>> languages shouldn't really matter. Binary and text data can be
> > >>> accessed by almost all programming languages.
> > >>>
> > >>> I think you need to decide if you want a tool or a data format. It
> > >>> sounds like you are shooting more for a spatial database written in
> > >>> the C programming language that uses some form of the ESRI Shapefile
> > >>> as its underlying data storage mechanism. To me that is a tool or
> > >>> piece of software, not a format. But maybe I don't completely
> > >>> understand your goal.
> > >>>
> > >>
> > >> Well, I am, frankly, confused.
> > >>
> > >> I was quite convinced I wasn't describing a "tool" but was describing
> > >> a "format." Of course, to describe the format, I positioned it on the
> > >> "format" (the SQLite-compatible format) used and popularized by a
> > >> "tool" (SQLite, the library, which happens to be written in C). In my
> > >> mind, having the data format based on SQLite *format* for its
> > >> relational attribute handling was the real winner. In that sense,
> > >> perhaps I conflated the format and the tool. I am not well versed in
> > >> these things, so I am probably already walking on thin ice, but that
> > >> shouldn't stop others.
> > >>
> > >> So, forget that I mentioned C and Java... let's just concentrate on a
> > >> way of laying out data on a disk that is not too dissimilar from how
> > >> Shapefile data are laid out, except that we utilize the
> > >> SQLite-compatible binary format for relational data handling, so that
> > >> SQLite-enabled spatial tools can access this new format.
> > >>
> > >> And put this format into the public domain.
> > >>
> > >>
> > >>>
> > >>> -----Original Message-----
> > >>> From: discuss-bounces at lists.osgeo.org
> > >>> [mailto:discuss-bounces at lists.osgeo.org] On Behalf Of P Kishor
> > >>> Sent: Tuesday, November 13, 2007 8:35 AM
> > >>> To: OSGeo Discussions
> > >>> Subject: [OSGeo-Discuss] Re: idea for an OSGeo project -- a new, open
> > >>> data format
> > >>>
> > >>> Thanks everyone, for responding. Here is my "groundwork."
> > >>>
> > >>> The new format --
> > >>>
> > >>> - Should be fast. SQLite is plenty fast, and anything that simply
> > >>> "extends" the Shapefile format to inject relational capabilities
> > >>> should be pretty fast. It should definitely be faster than a
> > >>> geodatabase format (such as PostGIS/ArcSDE) and perhaps even faster
> > >>> than Shapefiles, especially when accessing attribute data. DBF is
> > >>> sequential, and searching for textual information is particularly
> > >>> expensive. SQLite has been tuned to excellence. I have been working
> > >>> with it for a few years now, and it really is amazing: the product,
> > >>> the development community, the support, and the capabilities. That it
> > >>> is in the public domain makes for a transfat-free icing on the cake.
> > >>>
> > >>> - Should be unencumbered by licenses and copyrights. Ideally, the new
> > >>> format could also be put back into the public domain. We want to
> > >>> remove all encumbrances to encourage rapid and wide adoption.
> > >>>
> > >>> - Should be a single file. Well, some like multiple files and some
> > >>> like single files. We can achieve both objectives by using a
> > >>> tar-gzipped packaging such as Apple tends to use for much of its
> > >>> stuff (for example, its Pages word processor uses a tgzipped XML file
> > >>> along with other resources for icons and pictures and stuff). Or, if
> > >>> speed is going to be affected because of gzipping and gunzipping,
> > >>> just a package format (I have no idea if this is a Unix thing or a
> > >>> Mac OS thing -- we, in the Mac world, call them packages... they
> > >>> appear like files in the Finder, and like directories in the shell).
> > >>>
> > >>> - Should be easy to transition to. By building the new format on the
> > >>> structure of the Shapefile format, and *in fact*, calling it "open
> > >>> shapefiles" or some such thing, we indicate from its name that the
> > >>> transition is not that revolutionary but is evolutionary. This,
> > >>> hopefully, will bring some name-familiarity, and make the transition
> > >>> less scary.
> > >>>
> > >>> - Frank mentions SQLite's lack of datatypes as an issue -- I guess
> > >>> that is a matter of preference. I personally quite like that freedom
> > >>> as it gives me, the application developer, complete control over what
> > >>> goes where. SQLite actually does now have a few datatypes that it
> > >>> recognizes, but it doesn't complain when they are not followed. Since
> > >>> all users will be accessing the data via an application, as long as
> > >>> the application is well defined, it should be fine.
> > >>>
> > >>> - SQLite excels at the one thing it is meant to do -- retrieve the
> > >>> data it has been entrusted with at extremely fast speeds, and
> > >>> maintain ACID data integrity in case of a programmatic catastrophe.
> > >>> The transactions themselves are worth the price of admission, which,
> > >>> happily, happens to be zero.
> > >>>
> > >>> - Landon mentions Java support -- well, yes, use/work on SQLite JDBC.
> > >>> I have been using it for a few days now and find it to be a pretty
> > >>> competent conduit. Extend it, spatialize it. ANSI standard C is still
> > >>> that magic common denominator that compiles and works predictably on
> > >>> the greatest number of systems. I have a lot against Java, but those
> > >>> who love Java should definitely work on tools for accessing and
> > >>> working with this new format as it would only make the format more
> > >>> widely used and adopted.
> > >>>
> > >>> Ok, enough for now.
> > >>>
> > >>>
> > >>>
> > >>> On Nov 13, 2007 8:52 AM, P Kishor <punk.kish at gmail.com> wrote:
> > >>>> So, I am thinking, Shapefile is the de facto data standard for GIS
> > >>>> data. Because it is open (albeit not Free), and because of the deep
> > >>>> and wide presence of ESRI's products from the beginning of the
> > >>>> epoch, it has been widely adopted. The existence of shapelib,
> > >>>> various language bindings, and ready use by products such as
> > >>>> MapServer has continued to cement Shapefile as the format to use.
> > >>>> All this is in spite of Shapefile's inherent drawbacks,
> > >>>> particularly in the area of attribute data management.
> > >>>>
> > >>>> What if we came up with a new and improved data format -- call it
> > >>>> "Open Shapefile" (extension .osh) -- that would be completely Free,
> > >>>> single-file based (instead of the multiple .shp, .dbf, .shx, etc.),
> > >>>> and based on SQLite, giving the .osh format complete relational data
> > >>>> handling capabilities? We would require a new version of Shapelib
> > >>>> and improved language bindings, make it the default and preferred
> > >>>> format for MapServer, and provide seamless and painless import of
> > >>>> regular .shp data into .osh for native rendering. Its adoption would
> > >>>> be quick in the open source community. The non-opensource community
> > >>>> would not give a rat's behind for it, but it wouldn't affect them...
> > >>>> they would still work with their preferred .shp until they learned
> > >>>> better. By being a completely open and Free, single-file,
> > >>>> SQLite-based spatial data format with full relational DBMS
> > >>>> capabilities, it would be positioned for continued improvement and
> > >>>> development.
> > >>>>
> > >>>> Is this too crazy?
> > >>>>
> > >>>> --
> > >>>> Puneet Kishor
> > >>>>
> > >>> _______________________________________________
> > >>> Discuss mailing list
> > >>> Discuss at lists.osgeo.org
> > >>> http://lists.osgeo.org/mailman/listinfo/discuss
> > >
> > > have fun,
> > >
> > > SteveC | steve at asklater.com | http://www.asklater.com/steve/
> > >
> > >
> >
> > --
> > Allan Doyle
> > Director of Technology
> > MIT Museum
> > +1.617.452.2111
> >
> >
>
>
>
> --
> ************************************
> David William Bitner

