[OSGeo-Discuss] Re: idea for an OSGeo project -- a new, open data format

David William Bitner david.bitner at gmail.com
Tue Nov 13 11:08:05 PST 2007


I have created a (currently empty) page on the OSGeo wiki to collect the
concrete details that come out of this discussion:
http://wiki.osgeo.org/index.php/Geodata_formats.  Please use it to post
your wishlists for a new open data format, lists of existing data formats
with links to their specifications, etc.  Please also join the Geodata
mailing list (http://www.osgeo.org/geodata) and continue the debate and
discussion about a new format there, as I believe it is a more appropriate
venue.

David

On Nov 13, 2007 12:55 PM, P Kishor <punk.kish at gmail.com> wrote:

> David,
>
>
> On 11/13/07, David William Bitner <david.bitner at gmail.com> wrote:
> > Part of the mission of the OSGeo Geodata committee
> > (http://www.osgeo.org/geodata) is to "promote the use of open geospatial
> > formats".  If there is a group that wants to continue pursuing the
> > creation of a new open geodata format, I would like to encourage the use
> > of the geodata mailing list. That being said, I think part of the
> > discussion that needs to be had is whether or not OSGeo should be
> > creating standards in the first place.
> >
> > A couple of comments on some of the discussion that has taken place
> > in this thread:
> >
> > Regarding the suggestion that MapServer take on this new format as its
> > primary format:  I think this is way beyond the scope of what OSGeo should
> > be doing.  Even if we spec a new standard, we (OSGeo) have no teeth to
> > make any of our projects implement that standard.  The choice of formats
> > used by any of our projects is driven by the needs of the users and
> > developers and by the resources (time, money) that have been dedicated to
> > implementing them.  If someone takes OpenShape or whatever and decides
> > they have a business need that justifies the time or money to get it
> > implemented, then it will be implemented.  Shapefile has been and will
> > continue to be an important format for many projects, as it is one of
> > the most widely distributed formats in the GIS world, if not the most.
> >
>
> I respectfully disagree. I think OSGeo has plenty of teeth for those who
> want to believe in it. In the end, yes, just like any real project, it
> needs a core of committed developers and plenty of time (or money --
> usually they are synonymous). This is not something that can happen
> overnight, but if good, it deserves a start and support. That the
> long, long-term effects of a solid, relational, transactional, geodata
> format would be very good is a reasonable assumption for me.
>
> > Regarding the comments on standards wanking:  Standards can get in the
> > way of progress along a straight line, but they can also encourage
> > interoperability that can create better progress for everyone.  To get a
> > singular task done, standards often can slow things down, but there
> > *are* gains to be had from playing well with everyone else.
>
> Here I totally agree. I am not sure how to interpret the "standards
> wanking" statement. On the one hand it is a reasonably accurate
> assessment of a lot of public hand-wringing and open alliances (for a
> really funny take on this, read Fake Steve's tirade on the open
> handset alliance at
> <http://fakesteve.blogspot.com/2007/11/its-not-phone-its-alliance.html>).
> But, on the other hand, it is a pretty damning judgment on any attempt
> to do things via collaboration, and thus on OSGeo and similar efforts
> themselves.
>
> My take is that if I can't do it alone, I will lay it out in the open
> hoping someone better than me will work on it as well. If I can do it
> alone, I will do it until I think it is ready to benefit from extra
> eyeballs. Sometimes getting started is the biggest hurdle.
>
>
> >
> > David Bitner
> > OSGeo, Public Geospatial Data Project Chair
> >
> > On Nov 13, 2007 11:40 AM, Allan Doyle <afdoyle at mit.edu> wrote:
> > >
> > >
> > > On Nov 13, 2007, at 12:24 , Steve Coast wrote:
> > >
> > > > OSM: $0
> > > > CCBYSA: $0
> > > > Donation of entire Netherlands: Priceless
> > > >
> > > > Real artists ship. For everyone else there's standards wanking.
> > >
> > > Perhaps there's an art to wanking standards as well.
> > >
> > >
> > >
> > >
> > > >
> > > >
> > > >
> > > >
> > > > Seriously though, this is so kafka-esque. When OSM started it was
> > > > like this: We should have got a committee to design a standard, then
> > > > we could think about a committee to design an ontology... and choose
> > > > a name... and on some sunny distant day make a map.
> > > >
> > > >
> > > >
> > > > On 13 Nov 2007, at 17:09, P Kishor wrote:
> > > >
> > > >> On 11/13/07, Landon Blake <lblake at ksninc.com> wrote:
> > > >>> Puneet,
> > > >>>
> > > > >>> You wrote: "Should be easy to transition to. By building the new
> > > > >>> format on the structure of the Shapefile format, and *in fact*,
> > > > >>> calling it "open shapefiles" or some such thing, we indicate from
> > > > >>> its name that the transition is not that revolutionary but is
> > > > >>> evolutionary. This, hopefully, will bring some name-familiarity,
> > > > >>> and make the transition less scary."
> > > > >>>
> > > > >>> I really think you are going to run into problems using
> > > > >>> "Shapefile" as part of the trademark or name for any product not
> > > > >>> sold by ESRI. I strongly recommend against this move. Let people
> > > > >>> adopt the implementation of your idea for its merits, not for
> > > > >>> name recognition that comes from another product line.
> > > >>
> > > > >> Good enough point to keep in mind, but not to get hung up over
> > > > >> enough to entangle us. Suggestions for names of the data format can
> > > > >> be a project in itself. "open spatial data format" or its
> > > > >> variations could be chosen. Still, point taken.
> > > >>
> > > >>>
> > > > >>> You wrote: "ANSI standard C is still that magic common
> > > > >>> denominator that compiles and works predictably on the greatest
> > > > >>> number of systems. I have a lot against Java, but those who love
> > > > >>> Java should definitely work on tools for accessing and working
> > > > >>> with this new format as it would only make the format more widely
> > > > >>> used and adopted."
> > > > >>>
> > > > >>> It sounds to me like you are really describing a tool. File
> > > > >>> formats are written in a binary encoding or text, not in a
> > > > >>> programming language. If you are designing a tool you can choose
> > > > >>> the programming language of your choice, but be aware that this
> > > > >>> will limit the developers that adopt the tool. This will be the
> > > > >>> case no matter what language you choose to use, whether it is C,
> > > > >>> Java, or something else.
> > > > >>>
> > > > >>> If, in contrast, you are creating a file format, then programming
> > > > >>> languages shouldn't really matter. Binary and text data can be
> > > > >>> accessed by almost all programming languages.
> > > > >>>
> > > > >>> I think you need to decide if you want a tool or a data format. It
> > > > >>> sounds like you are shooting more for a spatial database written
> > > > >>> in the C programming language that uses some form of the ESRI
> > > > >>> Shapefile as its underlying data storage mechanism. To me that is
> > > > >>> a tool or piece of software, not a format. But maybe I don't
> > > > >>> completely understand your goal.
> > > >>>
> > > >>
> > > > >> Well, I am, frankly, confused.
> > > > >>
> > > > >> I was quite convinced I wasn't describing a "tool" but was
> > > > >> describing a "format." Of course, to describe the format, I
> > > > >> positioned it on the "format" (the SQLite-compatible format) used
> > > > >> and popularized by a "tool" (SQLite, the library, which happens to
> > > > >> be written in C). In my mind, having the data format based on the
> > > > >> SQLite *format* for its relational attribute handling was the real
> > > > >> winner. In that sense, perhaps I conflated the format and the
> > > > >> tool. I am not well versed in these things, so I am probably
> > > > >> already walking on thin ice, but that shouldn't stop others.
> > > >>
> > > > >> So, forget that I mentioned C and Java... let's just concentrate on
> > > > >> a way of laying out data on a disk that is not too dissimilar from
> > > > >> how Shapefile data are laid out, except that we utilize the
> > > > >> SQLite-compatible binary format for relational data handling, so
> > > > >> that SQLite-enabled spatial tools can access this new format.
> > > >>
> > > >> And, put this format into public domain.
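
As an illustration only, the idea described above (geometry and relational
attributes side by side in one SQLite-compatible file) could be sketched with
Python's standard sqlite3 module roughly as follows. The table name
"features", its columns, and the two-doubles point encoding are assumptions
invented for this example; none of them come from any proposal in this
thread, and a real format would need a properly specified geometry encoding.

```python
import sqlite3
import struct


def encode_point(x, y):
    """Pack a 2D point as two little-endian doubles (hypothetical encoding)."""
    return struct.pack("<dd", x, y)


def decode_point(blob):
    """Inverse of encode_point: returns an (x, y) tuple."""
    return struct.unpack("<dd", blob)


def create_layer(conn):
    """Create one hypothetical layer table holding attributes and geometry."""
    conn.execute(
        "CREATE TABLE features ("
        " fid INTEGER PRIMARY KEY,"
        " name TEXT,"
        " population INTEGER,"
        " geom BLOB)"
    )
    # An index lets attribute lookups avoid scanning every record in turn.
    conn.execute("CREATE INDEX idx_features_name ON features(name)")


def add_feature(conn, name, population, x, y):
    """Insert one point feature and return its row id."""
    cur = conn.execute(
        "INSERT INTO features (name, population, geom) VALUES (?, ?, ?)",
        (name, population, encode_point(x, y)),
    )
    return cur.lastrowid


def find_by_name(conn, name):
    """Look a feature up by attribute; returns (fid, population, (x, y)) or None."""
    row = conn.execute(
        "SELECT fid, population, geom FROM features WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    fid, population, geom = row
    return fid, population, decode_point(geom)


if __name__ == "__main__":
    # ":memory:" keeps the sketch self-contained; a file path would give
    # the single-file layout discussed above.
    conn = sqlite3.connect(":memory:")
    create_layer(conn)
    add_feature(conn, "Town A", 1000, 1.5, -2.25)
    print(find_by_name(conn, "Town A"))
```

Any language with SQLite bindings could read the same file, which is the
sense in which the layout, rather than the library, would be the format.
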
> > > >>
> > > >>
> > > >>>
> > > >>> -----Original Message-----
> > > >>> From: discuss-bounces at lists.osgeo.org
> > > >>> [mailto:discuss-bounces at lists.osgeo.org] On Behalf Of P Kishor
> > > >>> Sent: Tuesday, November 13, 2007 8:35 AM
> > > >>> To: OSGeo Discussions
> > > > >>> Subject: [OSGeo-Discuss] Re: idea for an OSGeo project -- a new,
> > > > >>> open data format
> > > >>>
> > > >>> Thanks everyone, for responding. Here is my "groundwork."
> > > >>>
> > > >>> The new format --
> > > >>>
> > > > >>> - Should be fast. SQLite is plenty fast, and anything that simply
> > > > >>> "extends" the Shapefile format to inject relational capabilities
> > > > >>> should be pretty fast. It should definitely be faster than a
> > > > >>> geodatabase format (such as PostGIS/ArcSDE) and perhaps even
> > > > >>> faster than Shapefiles, especially while accessing attribute data.
> > > > >>> DBF is sequential, and searching for textual information is
> > > > >>> particularly expensive. SQLite has been tuned to excellence. I
> > > > >>> have been working with it for a few years now, and it really is
> > > > >>> amazing: the product, the development community, the support, and
> > > > >>> the capabilities. That it is in the public domain makes for a
> > > > >>> transfat-free icing on the cake.
> > > >>>
> > > > >>> - Should be unencumbered by licenses and copyrights. Ideally, the
> > > > >>> new format could also be put back into public domain. We want to
> > > > >>> remove all encumbrances to encourage rapid and wide adoption.
> > > >>>
> > > > >>> - Should be a single file. Well, some like multiple files and some
> > > > >>> like single files. We can achieve both objectives by using a
> > > > >>> tar-gzipped packaging such as Apple tends to use for much of its
> > > > >>> stuff (for example, its Pages wordprocessor uses a tgzipped xml
> > > > >>> file along with other resources for icons and pictures and stuff).
> > > > >>> Or, if speed is going to be affected because of gzipping and
> > > > >>> gunzipping, just a package format (I have no idea if this is a
> > > > >>> Unix thing or a Mac OS thing -- we, in the Mac world, call them
> > > > >>> packages... they appear like files in the Finder, and like
> > > > >>> directories in the shell).
> > > >>>
> > > > >>> - Should be easy to transition to. By building the new format on
> > > > >>> the structure of the Shapefile format, and *in fact*, calling it
> > > > >>> "open shapefiles" or some such thing, we indicate from its name
> > > > >>> that the transition is not that revolutionary but is evolutionary.
> > > > >>> This, hopefully, will bring some name-familiarity, and make the
> > > > >>> transition less scary.
> > > >>>
> > > > >>> - Frank mentions SQLite's lack of datatypes as an issue -- I guess
> > > > >>> that is a matter of preference. I personally quite like that
> > > > >>> freedom as it gives me, the application developer, complete
> > > > >>> control over what goes where. SQLite actually does now have a few
> > > > >>> datatypes that it respects, but it doesn't complain about values
> > > > >>> that don't match them. Since all users will be accessing the data
> > > > >>> via an application, as long as the application is well defined,
> > > > >>> it should be fine.
> > > >>>
> > > >>> - SQLite excels at one thing that it has been entrusted to do --
> > > >>> retrieve data that it has been entrusted with at extremely fast
> > > >>> speeds, and maintain ACID data integrity in case of a programmatic
> > > >>> catastrophe. The transactions themselves are worth their price of
> > > >>> admission, which, happily, happens to be zero.
> > > >>>
> > > > >>> - Landon mentions Java support -- well, yes, use/work on SQLite
> > > > >>> JDBC. I have been using it for a few days now and find it to be a
> > > > >>> pretty competent conduit. Extend it, spatialize it. ANSI standard
> > > > >>> C is still that magic common denominator that compiles and works
> > > > >>> predictably on the greatest number of systems. I have a lot
> > > > >>> against Java, but those who love Java should definitely work on
> > > > >>> tools for accessing and working with this new format as it would
> > > > >>> only make the format more widely used and adopted.
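
Two of the points in the list above, the loosely enforced datatypes and the
all-or-nothing transactions, can be tried out with a short sketch using
Python's standard sqlite3 module. The table "attrs" and the helper names are
hypothetical and exist only for this illustration; this is a rough
demonstration of SQLite's behaviour, not a proposed design for the format.

```python
import sqlite3


def make_table(conn):
    """Create a hypothetical attribute table with an INTEGER-declared column."""
    with conn:
        conn.execute(
            "CREATE TABLE attrs (fid INTEGER PRIMARY KEY, value INTEGER)"
        )


def insert_all_or_nothing(conn, values):
    """Insert every value in one transaction.

    The 'with conn' block commits if the loop finishes and rolls back if an
    exception escapes, so a failure part-way leaves no partial rows behind.
    """
    with conn:
        for v in values:
            if v is None:
                raise ValueError("refusing to store a missing value")
            conn.execute("INSERT INTO attrs (value) VALUES (?)", (v,))


def stored_types(conn):
    """Report the storage type SQLite actually used for each row."""
    rows = conn.execute("SELECT typeof(value) FROM attrs ORDER BY fid")
    return [row[0] for row in rows]


def count_rows(conn):
    """Return the number of rows currently in the table."""
    return conn.execute("SELECT COUNT(*) FROM attrs").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    make_table(conn)

    # The column is declared INTEGER, yet a non-numeric string is accepted
    # and stored as text: the declared type is a preference, not a constraint.
    insert_all_or_nothing(conn, [42, "not a number"])
    print(stored_types(conn))

    # The third value fails, so the first two inserts of this batch are
    # rolled back together with it.
    try:
        insert_all_or_nothing(conn, [1, 2, None])
    except ValueError:
        pass
    print(count_rows(conn))
```

Whether the relaxed typing is a feature or a problem is exactly the matter
of preference mentioned above; an application that wants stricter checks
would have to validate values itself before inserting them.
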
> > > >>>
> > > >>> Ok, enough for now.
> > > >>>
> > > >>>
> > > >>>
> > > >>> On Nov 13, 2007 8:52 AM, P Kishor <punk.kish at gmail.com> wrote:
> > > > >>>> So, I am thinking, Shapefile is the de facto data standard for
> > > > >>>> GIS data. Because it is open (albeit not Free), and because of
> > > > >>>> the deep and wide presence of ESRI's products from the beginning
> > > > >>>> of the epoch, it has been widely adopted. The existence of
> > > > >>>> shapelib, various language bindings, and ready use by products
> > > > >>>> such as MapServer have continued to cement Shapefile as the
> > > > >>>> format to use. All this is in spite of Shapefile's inherent
> > > > >>>> drawbacks, particularly in the area of attribute data management.
> > > >>>>
> > > > >>>> What if we came up with a new and improved data format -- call it
> > > > >>>> "Open Shapefile" (extension .osh) -- that would be completely
> > > > >>>> Free, single-file based (instead of the multiple .shp, .dbf,
> > > > >>>> .shx, etc.), and based on SQLite, giving the .osh format complete
> > > > >>>> relational data handling capabilities? We would require a new
> > > > >>>> version of Shapelib, improved language bindings, make it the
> > > > >>>> default and preferred format for MapServer, and provide seamless
> > > > >>>> and painless import of regular .shp data into .osh for native
> > > > >>>> rendering. Its adoption would be quick in the open source
> > > > >>>> community. The non-opensource community would probably not give a
> > > > >>>> rat's behind for it, but it wouldn't affect them... they would
> > > > >>>> still work with their preferred .shp until they learned better.
> > > > >>>> By being a completely open and Free, single-file based, built on
> > > > >>>> SQLite, fully relational dbms capable spatial data format, it
> > > > >>>> would be positioned for continued improvement and development.
> > > >>>>
> > > >>>> Is this too crazy?
> > > >>>>
> > > >>>> --
> > > >>>> Puneet Kishor
> > > >>>>
> > > >>> _______________________________________________
> > > >>> Discuss mailing list
> > > >>> Discuss at lists.osgeo.org
> > > >>> http://lists.osgeo.org/mailman/listinfo/discuss
> > > >
> > > > have fun,
> > > >
> > > > SteveC | steve at asklater.com | http://www.asklater.com/steve/
> > > >
> > > >
> > > > _______________________________________________
> > > > Discuss mailing list
> > > > Discuss at lists.osgeo.org
> > > >
> > http://lists.osgeo.org/mailman/listinfo/discuss
> > >
> > > --
> > > Allan Doyle
> > > Director of Technology
> > > MIT Museum
> > > +1.617.452.2111
> >
> >
> >
> > --
> > ************************************
> > David William Bitner
>



-- 
************************************
David William Bitner