[Java-collab] Simple Text Exchange Format
Paul Austin
mail-lists at revolsys.com
Wed May 28 11:14:07 EDT 2008
Fran,
I have thought about all these issues:
1. UTF-8 is the mandatory encoding. For most documents it doesn't add
a lot of overhead, so I decided to skip the complexity of multiple
character sets and the detection mechanisms you'd need.
2. There will be a standard set of data types (applications can add
custom ones). The standard set will use the XML data types, and
dates will use a standard format.
3. As for the index file, this would not be part of the main
specification, but we could add an ECSV profile for an index file.
This would just be an ECSV file with one attribute, which is the
offset in another file. I personally won't use this kind of file
much, as most of the time I'll be generating ECSV files on the fly
from my RESTful web service that I'm also developing (more on that
when I have the ECSV spec finished). This will only work for
random access; random update won't work too well, as it is unlikely
that the new record would have the same length as the old one. The
format is primarily for exchange of data. Tools which read/write
these files will need to do a complete replace of the file if they
update it, much the same way Excel would deal with CSV files.
4. I'm putting the file-level metadata in the ECSV file itself,
because then the document is self-describing, so you only need the
one file to know all you need to know. This makes it easy to work
with web services.
5. Individual ECSV files can be compressed for transport using gzip
compression, either over an HTTP connection (using the
Accept-Encoding HTTP header) or as the file itself (.ecsv.gz).
6. Multiple ECSV files can also be distributed as a ZIP file; here there
are no rules relating to the files. A recommendation would be to
use a folder for each namespace, and the file name should be the
type name with a .ecsv extension. The real type name is loaded
from the file, so the file names can differ so as to work within
the characters supported by file names.
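
To make the index idea in point 3 a bit more concrete, here is a rough
sketch in Java of how the offsets could be collected. It is only an
illustration under some assumptions: the class and method names
(EcsvOffsetIndex, lineOffsets, lineAt) are made up, the sample rows are
invented, and it assumes one record per line with no line breaks inside
quoted fields. Offsets are counted in bytes rather than characters,
since the file is UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of any ECSV specification. */
final class EcsvOffsetIndex {

    /**
     * Returns the byte offset at which each line of a UTF-8 document starts.
     * Assumes one record per line (no newlines inside quoted fields).
     */
    static List<Long> lineOffsets(byte[] utf8) {
        List<Long> offsets = new ArrayList<>();
        if (utf8.length > 0) {
            offsets.add(0L);
        }
        for (int i = 0; i < utf8.length; i++) {
            // The byte 0x0A never occurs inside a multi-byte UTF-8 sequence,
            // so scanning the raw bytes for '\n' is safe.
            if (utf8[i] == '\n' && i + 1 < utf8.length) {
                offsets.add((long) (i + 1));
            }
        }
        return offsets;
    }

    /** Decodes the line starting at the given byte offset, without its newline. */
    static String lineAt(byte[] utf8, long offset) {
        int start = (int) offset;
        int end = start;
        while (end < utf8.length && utf8[end] != '\n') {
            end++;
        }
        return new String(utf8, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Invented rows; the second one contains a two-byte UTF-8 character.
        String data = "1,Photogrammetric\n2,Diff\u00e9rentiel\n3,Tablet Digitizing\n";
        byte[] bytes = data.getBytes(StandardCharsets.UTF_8);
        List<Long> offsets = lineOffsets(bytes);
        System.out.println(offsets);
        System.out.println(lineAt(bytes, offsets.get(2)));
    }
}
```

In an index profile, each of these offsets would presumably become one
row of the single-attribute index file. Because the offsets are byte
positions, any edit that changes a record's length invalidates every
later entry, which is why the index only helps for reading.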
My intent is to make the specification very strict in terms of what is
allowed for the format of the file. At the same time it will allow
extensions of new data types, new file properties and new attribute headers.
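
As an illustration of what "strict" could mean for a reader, here is a
small Java sketch that splits one line into fields and rejects anything
irregular. The quoting rules are an assumption on my part (fields are
either unquoted, or wrapped in double quotes with "" standing for a
literal quote); the final specification may differ. The class name
StrictCsvLine is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical strict splitter; the rules here are assumed, not taken from a spec. */
final class StrictCsvLine {

    /** Splits one line into fields, throwing on any quoting irregularity. */
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (true) {
            StringBuilder field = new StringBuilder();
            if (i < n && line.charAt(i) == '"') {
                i++; // skip the opening quote
                boolean closed = false;
                while (i < n) {
                    char c = line.charAt(i++);
                    if (c != '"') {
                        field.append(c);
                    } else if (i < n && line.charAt(i) == '"') {
                        field.append('"'); // "" is an escaped quote
                        i++;
                    } else {
                        closed = true; // a lone quote ends the field
                        break;
                    }
                }
                if (!closed) {
                    throw new IllegalArgumentException("Unterminated quoted field");
                }
                if (i < n && line.charAt(i) != ',') {
                    throw new IllegalArgumentException("Text after closing quote at column " + i);
                }
            } else {
                while (i < n && line.charAt(i) != ',') {
                    char c = line.charAt(i++);
                    if (c == '"') {
                        throw new IllegalArgumentException("Quote inside unquoted field at column " + (i - 1));
                    }
                    field.append(c);
                }
            }
            fields.add(field.toString());
            if (i >= n) {
                return fields;
            }
            i++; // skip the comma separating this field from the next
        }
    }

    public static void main(String[] args) {
        System.out.println(split("1,Photogrammetric,PROXY_GFT,2008-05-26T00:00:00"));
        System.out.println(split("name,list,\"a,b,c\""));
    }
}
```

A lenient reader would try to guess what a stray quote means; rejecting
the line instead keeps every reader and writer in agreement about the
content.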
Paul
Francisco José Peñarrubia wrote:
> Hi.
>
> About this format, I have some suggestions:
> 1.- Take care with encoding.
> 2.- At least set a default date / time format, or specify it in the header.
> 3.- If you include an index file (for example, longs with the position
> in the file where each record starts), it may allow you to spatially index
> the file, use random access and edit each record without rewriting the
> whole file, and even to set an order (to put one feature below another,
> for example).
>
> About metadata, I think it would be better to use an already existing
> format, but I'm not sure. And maybe a good idea would be to allow
> distribution of those files in a .zip file.
>
>
> Thanks for your formats, Paul and Landon. Good to see people with this
> kind of interests here.
>
> PS: Has anyone compared SQLite and H2 (spatial)?
>
> Fran.
> gvSIG team.
>
> Paul Austin wrote:
>> All,
>>
>> I saw in one of the other posts there was a discussion of a binary
>> format to replace shape files for quick random access to data. Someone
>> suggested using an embedded database such as H2 with a spatial
>> extension. I think that using a database is a much better way to go
>> for this kind of access. Otherwise if we come up with our own binary
>> format we'll need to deal with all the issues such as storage
>> management and indexing that databases already do for us.
>>
>> I do however think that we need a simple format for exchange of data.
>> Exchanging data may be via files or via a web service. GML in my view
>> is very verbose and complex to read and write and does not include an
>> embedded schema.
>>
>> I have been working on a CSV derivative which I'm calling
>> Enhanced-CSV. Basically it's a CSV file where the format is strict
>> about placement of commas and use of "". It also has two header
>> sections. The first section is a list of properties about the file,
>> such as type name, projection, author and a list of which attribute
>> headers will follow. The next header is the attribute header
>> (schema). There can be multiple attribute headers including the
>> name, type, length, precision and required flag of the attribute. There
>> is one entry for each data column (attribute). Finally there is the
>> data section which is just all your rows of data encoded as CSV.
>> Geometries are encoded as WKT.
>>
>> Below is a sample of an ECSV file with the three sections.
>>
>> {http://ns.ecsv.org/ecsv}typeName,QName,{GFT}GFT_CAPTURE_METHOD_CODE
>> {http://ns.ecsv.org/ecsv}srid,QName,{http://epsg.org}3005
>> {http://ns.ecsv.org/ecsv}attributeHeaderTypes,list,"{http://ns.ecsv.org/ecsv}attributeName,{http://ns.ecsv.org/ecsv}attributeType,{http://ns.ecsv.org/ecsv}attributeLength,{http://ns.ecsv.org/ecsv}attributeScale,{http://ns.ecsv.org/ecsv}attributeRequired"
>>
>>
>> CAPTURE_METHOD_CODE_ID,CODE_VALUE,WHO_CREATED,WHEN_CREATED
>> integer,string,string,dateTime
>> 3,255,255,2147483647
>> 0,0,0,0
>> false,false,false,false
>>
>> 1,Photogrammetric,PROXY_GFT,2008-05-26T00:00:00
>> 2,Differential Gps,PROXY_GFT,2008-05-26T00:00:00
>> 3,Tablet Digitizing,PROXY_GFT,2008-05-26T00:00:00
>>
>>
>> I'm working on a specification for this format and hopefully should
>> have a draft up in the next month or so. I have developed a reader
>> and writer and a JUMP plug-in which I'll make available when I've
>> finalized the specification.
>>
>> Is this something that would interest anyone else?
>>
>> Paul
>> _______________________________________________
>> Java-collab mailing list
>> Java-collab at lists.osgeo.org
>> http://lists.osgeo.org/mailman/listinfo/java-collab