[OSGeo-Discuss] Re: idea for an OSGeo project -- a new, open data format

Steve Coast steve at asklater.com
Tue Nov 13 09:24:55 PST 2007


OSM: $0
CCBYSA: $0
Donation of entire Netherlands: Priceless

Real artists ship. For everyone else there's standards wanking.



Seriously though, this is so Kafkaesque. When OSM started, it was like
this: we should have got a committee to design a standard, then we
could think about a committee to design an ontology... and choose a
name... and on some sunny distant day make a map.


On 13 Nov 2007, at 17:09, P Kishor wrote:

> On 11/13/07, Landon Blake <lblake at ksninc.com> wrote:
>> Puneet,
>>
>> You wrote: "Should be easy to transition to. By building the new
>> format on the structure of the Shapefile format, and *in fact*,
>> calling it "open
>> shapefiles" or some such thing, we indicate from its name that the
>> transition is not that revolutionary but is evolutionary. This,
>> hopefully, will bring some name-familiarity, and make the transition
>> less scary."
>>
>> I really think you are going to run into problems using "Shapefile"
>> as part of the trademark or name for any product not sold by ESRI. I
>> strongly recommend against this move. Let people adopt the
>> implementation of your idea for its merits, not for name recognition
>> that comes from another product line.
>
> Good point to keep in mind, but not one to get so hung up on that it
> entangles us. Choosing a name for the data format can be a project in
> itself. "Open spatial data format" or one of its variations could be
> chosen. Still, point taken.
>
>>
>> You wrote: "ANSI standard C is still
>> that magic common denominator that compiles and works predictably on
>> the largest number of systems. I have a lot against Java, but those who
>> love Java should definitely work on tools for accessing and working
>> with this new format, as it would only make the format more widely
>> used and adopted."
>>
>> It sounds to me like you are really describing a tool. File formats are
>> written in a binary encoding or text, not in a programming language. If
>> you are designing a tool you can choose the programming language of your
>> choice, but be aware that this will limit the developers that adopt the
>> tool. This will be the case no matter what language you choose to use,
>> whether it is C, Java, or something else.
>>
>> If, in contrast, you are creating a file format, then programming
>> languages shouldn't really matter. Binary and text data can be accessed
>> by almost all programming languages.
>>
>> I think you need to decide if you want a tool or a data format. It
>> sounds like you are shooting more for a spatial database written in the
>> C programming language that uses some form of the ESRI Shapefile as its
>> underlying data storage mechanism. To me that is a tool or piece of
>> software, not a format. But maybe I don't completely understand your
>> goal.
>>
>
> Well, I am, frankly, confused.
>
> I was quite convinced I wasn't describing a "tool" but was describing
> a "format." Of course, to describe the format, I positioned it on the
> "format" (the SQLite-compatible format) used and popularized by a
> "tool" (SQLite, the library, which happens to be written in C). In my
> mind, having the data format based on the SQLite *format* for its
> relational attribute handling was the real winner. In that sense,
> perhaps I conflated the format and the tool. I am not well versed in
> these things, so I am probably already walking on thin ice, but that
> shouldn't stop others.
>
> So, forget that I mentioned C and Java... let's just concentrate on a
> way of laying out data on a disk that is not too dissimilar from how
> Shapefile data are laid out, except that we utilize the
> SQLite-compatible binary format for relational data handling, so that
> SQLite-enabled spatial tools can access this new format.
>
> And put this format into the public domain.
>
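
To make the layout idea above a bit more concrete, here is a minimal
sketch, assuming Python's standard sqlite3 module. The table name
"features", its column names, the index name, and the 16-byte point
encoding are all made up for illustration; nothing in this thread
specifies them, and a real format would need a proper geometry encoding.
It only shows geometry and attributes living in one SQLite database,
with an index so that an attribute lookup does not have to scan every
record.

```python
import sqlite3
import struct


def pack_point(x, y):
    """Encode a 2D point as 16 bytes (two little-endian doubles).

    Hypothetical encoding, used here only to have something to store.
    """
    return struct.pack("<dd", x, y)


def unpack_point(blob):
    """Decode the 16-byte blob written by pack_point."""
    return struct.unpack("<dd", blob)


def build_layer(conn, rows):
    """Create one table holding geometry and attributes side by side."""
    conn.execute(
        "CREATE TABLE features ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT,"
        " population INTEGER,"
        " geom BLOB)"
    )
    # Index the attribute we expect to search on.
    conn.execute("CREATE INDEX features_name ON features(name)")
    conn.executemany(
        "INSERT INTO features (name, population, geom) VALUES (?, ?, ?)",
        [(n, p, pack_point(x, y)) for n, p, x, y in rows],
    )
    conn.commit()


def find_by_name(conn, name):
    """Return (name, population, (x, y)) for every feature with this name."""
    cur = conn.execute(
        "SELECT name, population, geom FROM features WHERE name = ?", (name,)
    )
    return [(n, p, unpack_point(g)) for n, p, g in cur]


def query_plan(conn, name):
    """Return SQLite's description of how it would run the lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT name, population, geom FROM features WHERE name = ?",
        (name,),
    )
    return [row[-1] for row in rows]


# Made-up sample data; ":memory:" could be replaced by a file path.
conn = sqlite3.connect(":memory:")
build_layer(conn, [("alpha", 100, 4.5, 52.25), ("beta", 200, 5.0, 52.0)])
print(find_by_name(conn, "beta"))
print(query_plan(conn, "beta"))
```

Swapping ":memory:" for a path on disk gives the single-file behaviour
discussed here; whether a plain BLOB column is the right home for
geometry is a separate design question this sketch does not settle.
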
>
>>
>> -----Original Message-----
>> From: discuss-bounces at lists.osgeo.org
>> [mailto:discuss-bounces at lists.osgeo.org] On Behalf Of P Kishor
>> Sent: Tuesday, November 13, 2007 8:35 AM
>> To: OSGeo Discussions
>> Subject: [OSGeo-Discuss] Re: idea for an OSGeo project -- a new,open
>> data format
>>
>> Thanks everyone, for responding. Here is my "groundwork."
>>
>> The new format --
>>
>> - Should be fast. SQLite is plenty fast, and anything that simply
>> "extends" the Shapefile format to inject relational capabilities
>> should be pretty fast. It should definitely be faster than a
>> geodatabase format (such as PostGIS/ArcSDE) and perhaps even faster
>> than Shapefiles especially while accessing attribute data. DBF is
>> sequential, and searching for textual information is particularly
>> expensive. SQLite has been tuned to excellence. I have been working
>> with it for a few years now, and it really is an amazing product,
>> with a development community, support, and capabilities to match.
>> That it is in the public domain makes for a transfat-free icing on
>> the cake.
>>
>> - Should be unencumbered by licenses and copyrights. Ideally, the new
>> format could also be put back into public domain. We want to remove
>> all encumbrances to encourage rapid and wide adoption.
>>
>> - Should be a single file. Well, some like multiple files and some
>> like single files. We can achieve both objectives by using a
>> tar-gzipped packaging such as Apple tends to use for much of its stuff
>> (for example, its Pages word processor uses a tgzipped XML file along
>> with other resources for icons and pictures and stuff). Or, if speed
>> is going to be affected because of gzipping and gunzipping, just a
>> package format (I have no idea if this is a Unix thing or a Mac OS
>> thing -- we, in the Mac world, call them packages... they appear like
>> files in the Finder, and like directories in the shell).
>>
>> - Should be easy to transition to. By building the new format on the
>> structure of the Shapefile format, and *in fact*, calling it "open
>> shapefiles" or some such thing, we indicate from its name that the
>> transition is not that revolutionary but is evolutionary. This,
>> hopefully, will bring some name-familiarity, and make the transition
>> less scary.
>>
>> - Frank mentions SQLite's lack of datatypes as an issue -- I guess
>> that is a matter of preference. I personally quite like that freedom
>> as it gives me, the application developer, complete control over what
>> goes where. SQLite actually does now have a few datatypes that it
>> respects, but it doesn't complain when a value doesn't fit them. Since
>> all users will be accessing the data via an application, as long as the
>> application is well defined, it should be fine.
>>
>> - SQLite excels at what it has been entrusted to do -- retrieve the
>> data it holds at extremely fast speeds, and maintain ACID data
>> integrity in case of a programmatic catastrophe. The transactions
>> themselves are worth the price of admission, which, happily, happens
>> to be zero.
>>
>> - Landon mentions Java support -- well, yes, use/work on SQLite JDBC.
>> I have been using it for a few days now and find it to be a pretty
>> competent conduit. Extend it, spatialize it. ANSI standard C is still
>> that magic common denominator that compiles and works predictably on
>> the largest number of systems. I have a lot against Java, but those who
>> love Java should definitely work on tools for accessing and working
>> with this new format, as it would only make the format more widely
>> used and adopted.
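
Two of the points in the list above, the loose datatypes and the
transactions, are easy to see in a few lines. This is a minimal sketch,
assuming Python's standard sqlite3 module; the table "attrs", its
"height" column, and the add_batch helper are hypothetical names picked
for the example. It shows a column declared INTEGER accepting a value
that is not a number without complaint, and a failed batch being rolled
back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attrs (id INTEGER PRIMARY KEY, height INTEGER)")

# The declared type is a preference, not a constraint: the text '42' is
# converted to an integer, while 'tall' cannot be and is kept as text.
conn.execute("INSERT INTO attrs (height) VALUES ('42')")
conn.execute("INSERT INTO attrs (height) VALUES ('tall')")
conn.commit()
stored_types = [
    row[0] for row in conn.execute("SELECT typeof(height) FROM attrs ORDER BY id")
]


def add_batch(conn, heights):
    """Insert all heights atomically; any failure undoes the whole batch."""
    with conn:  # commits on success, rolls back if an exception escapes
        for h in heights:
            if h is None:
                raise ValueError("missing height")
            conn.execute("INSERT INTO attrs (height) VALUES (?)", (h,))


try:
    # The first two inserts succeed, the third raises, so none are kept.
    add_batch(conn, [10, 20, None])
except ValueError:
    pass

row_count = conn.execute("SELECT COUNT(*) FROM attrs").fetchone()[0]
print(stored_types, row_count)
```

The flip side of that freedom is that the application has to do the
validation the database declines to do, which is the trade-off being
argued about in this thread.
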
>>
>> Ok, enough for now.
>>
>>
>>
>> On Nov 13, 2007 8:52 AM, P Kishor <punk.kish at gmail.com> wrote:
>>> So, I am thinking, Shapefile is the de facto data standard for GIS
>>> data. Because it is open (albeit not Free), and given the deep and
>>> wide presence of ESRI's products from the beginning of the epoch, it
>>> has been widely adopted. The existence of shapelib, various language
>>> bindings, and ready use by products such as MapServer has continued
>>> to cement Shapefile as the format to use. All this is in spite of
>>> Shapefile's inherent drawbacks, particularly in the area of attribute
>>> data management.
>>>
>>> What if we came up with a new and improved data format -- call it
>>> "Open Shapefile" (extension .osh) -- that would be completely Free,
>>> single-file based (instead of the multiple .shp, .dbf, .shx, etc.),
>>> and based on SQLite, giving the .osh format complete relational data
>>> handling capabilities? We would need a new version of Shapelib and
>>> improved language bindings, would make it the default and preferred
>>> format for MapServer, and would provide seamless and painless import
>>> of regular .shp data into .osh for native rendering. Its adoption
>>> would be quick in the open source community. The non-opensource
>>> community would probably not give a rat's behind for it, but it
>>> wouldn't affect them... they would still work with their preferred
>>> .shp until they learned better. By being a completely open and Free,
>>> single-file based, SQLite-built, fully relational-DBMS-capable
>>> spatial data format, it would be positioned for continued
>>> improvement and development.
>>>
>>> Is this too crazy?
>>>
>>> --
>>> Puneet Kishor
>>>
>> _______________________________________________
>> Discuss mailing list
>> Discuss at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/discuss
>>
>>

have fun,

SteveC | steve at asklater.com | http://www.asklater.com/steve/




