[OSGeo-Discuss] Re: idea for an OSGeo project -- a new, open data format

Allan Doyle afdoyle at MIT.EDU
Tue Nov 13 09:40:02 PST 2007


On Nov 13, 2007, at 12:24 , Steve Coast wrote:

> OSM: $0
> CCBYSA: $0
> Donation of entire Netherlands: Priceless
>
> Real artists ship. For everyone else there's standards wanking.

Perhaps there's an art to wanking standards as well.

>
> Seriously though, this is so Kafkaesque. When OSM started it was
> like this: We should have got a committee to design a standard, then
> we could think about a committee to design an ontology... and choose
> a name... and on some sunny distant day make a map.
>
>
>
> On 13 Nov 2007, at 17:09, P Kishor wrote:
>
>> On 11/13/07, Landon Blake <lblake at ksninc.com> wrote:
>>> Puneet,
>>>
>>> You wrote: "Should be easy to transition to. By building the new
>>> format on the structure of the Shapefile format, and *in fact*,
>>> calling it "open shapefiles" or some such thing, we indicate from
>>> its name that the transition is not that revolutionary but is
>>> evolutionary. This, hopefully, will bring some name-familiarity,
>>> and make the transition less scary."
>>>
>>> I really think you are going to run into problems using "Shapefile"
>>> as part of the trademark or name for any product not sold by ESRI. I
>>> strongly recommend against this move. Let people adopt the
>>> implementation of your idea for its merits, not for name recognition
>>> that comes from another product line.
>>
>> Good enough point to keep in mind, but not one to get so hung up
>> over that it entangles us. Suggesting names for the data format can
>> be a project in itself. "open spatial data format" or one of its
>> variations could be chosen. Still, point taken.
>>
>>>
>>> You wrote: "ANSI standard C is still that magic common denominator
>>> that compiles and works predictably on most number of systems. I
>>> have a lot against Java, but those who love Java should definitely
>>> work on tools for accessing and working with this new format as it
>>> would only make the format more widely used and adopted."
>>>
>>> It sounds to me like you are really describing a tool. File formats
>>> are written in a binary encoding or text, not in a programming
>>> language. If you are designing a tool you can choose the programming
>>> language of your choice, but be aware that this will limit the
>>> developers that adopt the tool. This will be the case no matter what
>>> language you choose to use, whether it is C, Java, or something else.
>>>
>>> If, in contrast, you are creating a file format, then programming
>>> languages shouldn't really matter. Binary and text data can be
>>> accessed by almost all programming languages.
>>>
>>> I think you need to decide if you want a tool or a data format. It
>>> sounds like you are shooting more for a spatial database written in
>>> the C programming language that uses some form of the ESRI Shapefile
>>> as its underlying data storage mechanism. To me that is a tool or
>>> piece of software, not a format. But maybe I don't completely
>>> understand your goal.
>>>
>>
>> Well, I am, frankly, confused.
>>
>> I was quite convinced I wasn't describing a "tool" but was describing
>> a "format." Of course, to describe the format, I positioned it on the
>> "format" (the SQLite-compatible format) used and popularized by a
>> "tool" (SQLite, the library, which happens to be written in C). In my
>> mind, having the data format based on SQLite *format* for its
>> relational attribute handling was the real winner. In that sense,
>> perhaps I conflated the format and the tool. I am not well versed in
>> these things, so I am probably already walking on thin ice, but that
>> shouldn't stop others.
>>
>> So, forget that I mentioned C and Java... let's just concentrate on a
>> way of laying out data on a disk that is not too dissimilar from how
>> Shapefile data are laid out, except that we utilize the
>> SQLite-compatible binary format for relational data handling, so that
>> SQLite-enabled spatial tools can access this new format.
>>
>> And, put this format into the public domain.
>>
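One way to picture the layout described above, as a minimal sketch and not a proposed specification: a single SQLite file holding one table per layer, where each row carries the attribute columns plus the geometry as an opaque BLOB. The table name `features`, the column names, and the two-double point encoding in `pack_point` are all hypothetical, chosen only for illustration, and the sample rows are made-up values. Python's standard `sqlite3` module is used purely for brevity; any language with SQLite bindings could read the same file.

```python
import sqlite3
import struct


def pack_point(x, y):
    # Hypothetical geometry encoding: two little-endian doubles.
    # A real format would need a documented encoding for every shape type.
    return struct.pack("<dd", x, y)


def unpack_point(blob):
    # Inverse of pack_point: returns an (x, y) tuple of floats.
    return struct.unpack("<dd", blob)


def build_layer(path=":memory:"):
    # One SQLite database file would hold both geometry and attributes.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE features ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT,"
        " population INTEGER,"
        " geom BLOB)"
    )
    # An index on an attribute column avoids scanning every row on lookup.
    conn.execute("CREATE INDEX idx_features_name ON features(name)")
    rows = [  # made-up sample data
        ("alpha", 10, pack_point(1.0, 2.0)),
        ("beta", 20, pack_point(3.0, 4.0)),
    ]
    conn.executemany(
        "INSERT INTO features (name, population, geom) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def find_by_name(conn, name):
    # Attribute query and geometry retrieval in a single SQL statement.
    row = conn.execute(
        "SELECT name, population, geom FROM features WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return None
    return row[0], row[1], unpack_point(row[2])
```

Calling `find_by_name(build_layer(), "beta")` fetches the attributes and the decoded point together, which is the kind of relational access the paragraph above is after.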
>>
>>>
>>> -----Original Message-----
>>> From: discuss-bounces at lists.osgeo.org
>>> [mailto:discuss-bounces at lists.osgeo.org] On Behalf Of P Kishor
>>> Sent: Tuesday, November 13, 2007 8:35 AM
>>> To: OSGeo Discussions
>>> Subject: [OSGeo-Discuss] Re: idea for an OSGeo project -- a new, open
>>> data format
>>>
>>> Thanks everyone, for responding. Here is my "groundwork."
>>>
>>> The new format --
>>>
>>> - Should be fast. SQLite is plenty fast, and anything that simply
>>> "extends" the Shapefile format to inject relational capabilities
>>> should be pretty fast. It should definitely be faster than a
>>> geodatabase format (such as PostGIS/ArcSDE) and perhaps even faster
>>> than Shapefiles, especially when accessing attribute data. DBF is
>>> sequential, and searching for textual information is particularly
>>> expensive. SQLite has been tuned to excellence. I have been working
>>> with it for a few years now, and it really is an amazing product,
>>> with an equally good development community, support, and
>>> capabilities. That it is in the public domain makes for a
>>> transfat-free icing on the cake.
>>>
>>> - Should be unencumbered by licenses and copyrights. Ideally, the
>>> new format could also be put back into the public domain. We want to
>>> remove all encumbrances to encourage rapid and wide adoption.
>>>
>>> - Should be a single file. Well, some like multiple files and some
>>> like single files. We can achieve both objectives by using a
>>> tar-gzipped packaging such as Apple tends to use for much of its
>>> stuff (for example, its Pages word processor uses a tgzipped XML
>>> file along with other resources for icons and pictures and stuff).
>>> Or, if speed is going to be affected because of gzipping and
>>> gunzipping, just a package format (I have no idea if this is a Unix
>>> thing or a Mac OS thing -- we, in the Mac world, call them
>>> packages... they appear like files in the Finder, and like
>>> directories in the shell).
>>>
>>> - Should be easy to transition to. By building the new format on the
>>> structure of the Shapefile format, and *in fact*, calling it "open
>>> shapefiles" or some such thing, we indicate from its name that the
>>> transition is not that revolutionary but is evolutionary. This,
>>> hopefully, will bring some name-familiarity, and make the transition
>>> less scary.
>>>
>>> - Frank mentions SQLite's lack of datatypes as an issue -- I guess
>>> that is a matter of preference. I personally quite like that freedom
>>> as it gives me, the application developer, complete control over
>>> what goes where. SQLite actually does now have a few datatypes that
>>> it respects, but it doesn't complain when a value doesn't match
>>> them. Since all users will be accessing the data via an application,
>>> as long as the application is well defined, it should be fine.
>>>
>>> - SQLite excels at one thing that it has been entrusted to do --
>>> retrieve data that it has been entrusted with at extremely fast
>>> speeds, and maintain ACID data integrity in case of a programmatic
>>> catastrophe. The transactions themselves are worth their price of
>>> admission, which, happily, happens to be zero.
>>>
>>> - Landon mentions Java support -- well, yes, use/work on SQLite
>>> JDBC. I have been using it for a few days now and find it to be a
>>> pretty competent conduit. Extend it, spatialize it. ANSI standard C
>>> is still that magic common denominator that compiles and works
>>> predictably on most number of systems. I have a lot against Java,
>>> but those who love Java should definitely work on tools for
>>> accessing and working with this new format as it would only make the
>>> format more widely used and adopted.
>>>
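Two of the SQLite behaviours mentioned in the list above, the loosely enforced datatypes and the transactional integrity, can be sketched with Python's standard `sqlite3` module. This is an illustrative sketch only: the table names `attrs` and `features` and the function names are hypothetical, and the tables are ordinary (non-STRICT) SQLite tables.

```python
import sqlite3


def demo_flexible_typing():
    # A column declared INTEGER gets integer affinity: text that looks like
    # an integer is converted, but other text is stored as-is, without error.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attrs (val INTEGER)")
    conn.executemany(
        "INSERT INTO attrs (val) VALUES (?)",
        [(42,), ("42",), ("forty-two",)],
    )
    # typeof() reports the storage class actually used for each value.
    return [r[0] for r in conn.execute(
        "SELECT typeof(val) FROM attrs ORDER BY rowid")]


def demo_rollback():
    # If any statement in a transaction fails, none of its changes persist.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE features (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.commit()
    try:
        # As a context manager, the connection commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute("INSERT INTO features (name) VALUES ('first')")
            # Violates NOT NULL, so sqlite3.IntegrityError is raised.
            conn.execute("INSERT INTO features (name) VALUES (NULL)")
    except sqlite3.IntegrityError:
        pass
    # The first insert was rolled back along with the failed one.
    return conn.execute("SELECT COUNT(*) FROM features").fetchone()[0]
```

Here `demo_flexible_typing()` returns `['integer', 'integer', 'text']` and `demo_rollback()` returns `0`. Because the database itself accepts mismatched values silently, any validation has to live in the application, which is the trade-off the datatype bullet describes.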
>>> Ok, enough for now.
>>>
>>>
>>>
>>> On Nov 13, 2007 8:52 AM, P Kishor <punk.kish at gmail.com> wrote:
>>>> So, I am thinking, Shapefile is the de facto data standard for GIS
>>>> data. Because it is open (albeit not Free), and because ESRI's
>>>> products have had a deep and wide presence from the beginning of the
>>>> epoch, it has been widely adopted. The existence of shapelib, various
>>>> language bindings, and ready use by products such as MapServer has
>>>> continued to cement Shapefile as the format to use. All this is in
>>>> spite of Shapefile's inherent drawbacks, particularly in the area of
>>>> attribute data management.
>>>>
>>>> What if we came up with a new and improved data format -- call it
>>>> "Open Shapefile" (extension .osh) -- that would be completely Free,
>>>> single-file based (instead of the multiple .shp, .dbf, .shx, etc.),
>>>> and based on SQLite, giving the .osh format complete relational data
>>>> handling capabilities? We would need a new version of Shapelib and
>>>> improved language bindings, would make it the default and preferred
>>>> format for MapServer, and would provide seamless and painless import
>>>> of regular .shp data into .osh for native rendering. Its adoption
>>>> would be quick in the open source community. The non-opensource
>>>> community would probably not give a rat's behind for it, but it
>>>> wouldn't affect them... they would still work with their preferred
>>>> .shp until they learned better. As a completely open and Free,
>>>> single-file based, built on SQLite, fully relational dbms capable
>>>> spatial data format, it would be positioned for continued
>>>> improvement and development.
>>>>
>>>> Is this too crazy?
>>>>
>>>> --
>>>> Puneet Kishor
>>>>
>>> _______________________________________________
>>> Discuss mailing list
>>> Discuss at lists.osgeo.org
>>> http://lists.osgeo.org/mailman/listinfo/discuss
>>>
>>>
>
> have fun,
>
> SteveC | steve at asklater.com | http://www.asklater.com/steve/
>

-- 
Allan Doyle
Director of Technology
MIT Museum
+1.617.452.2111