<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>

<head>

  <meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">

  <title></title>

</head>

<body bgcolor="#ffffff" text="#000000">

Fran,<br>

<br>

I have thought about all of these issues:<br>

<ol>

  <li>UTF-8 is the mandatory encoding. For most documents it doesn't
add much overhead, so I decided to skip the complexity of multiple
character sets and the detection mechanisms you'd need.</li>

  <li>There will be a standard set of data types (applications can add
custom ones). The standard set will use the XML data types, and dates
will use a standard format.</li>

  <li>As for the index file, this would not be part of the main
specification, but we could add an ECSV profile for one. It would just
be an ECSV file with a single attribute: the offset of each record in
another file. I personally won't use this kind of file much, as most of
the time I'll be generating ECSV files on the fly from the RESTful web
service that I'm also developing (more on that when I have the ECSV
spec finished). An index like this only helps with random access.
Random update won't work too well, as it is unlikely that a new record
would have the same length as the old one. The format is primarily for
exchange of data, so tools which read and write these files will need
to replace the whole file when they update it, much the same way Excel
deals with CSV files.<br>

  </li>

  <li>I'm putting the file-level metadata in the ECSV file itself so
that the document is self-describing: you only need the one file to
know all you need to know. This makes it easy to work with web
services.</li>

  <li>Individual ECSV files can be compressed for transport using gzip
compression, either over an HTTP connection (using the Accept-Encoding
HTTP header) or on the file itself (.ecsv.gz).</li>

  <li>Multiple ECSV files can also be distributed as a ZIP file. There
are no rules relating to the files here, but a recommendation would be
to use a folder for each namespace and to name each file after its type
name with a .ecsv extension. The real type name is loaded from the file
itself, so file names can differ from it in order to stay within the
characters supported by file names.<br>

  </li>

</ol>
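As a rough illustration of the index idea in point 3, here is a minimal
Python sketch that records the byte offset at which each line of a file
starts and then seeks straight to one record. It is not part of any
ECSV specification: the names build_offsets and read_record_at are
hypothetical, and it assumes one record per line, which would not hold
for quoted fields that contain line breaks.<br>
<pre>
```python
import io


def build_offsets(stream):
    """Return the byte offset at which each line of a binary stream starts."""
    offsets = []
    position = stream.tell()
    line = stream.readline()
    while line:
        offsets.append(position)
        position = stream.tell()
        line = stream.readline()
    return offsets


def read_record_at(stream, offset):
    """Seek to a byte offset and decode the single UTF-8 line found there."""
    stream.seek(offset)
    return stream.readline().decode("utf-8").rstrip("\r\n")


# Data rows taken from the sample in the original post.
data = (
    "1,Photogrammetric,PROXY_GFT,2008-05-26T00:00:00\n"
    "2,Differential Gps,PROXY_GFT,2008-05-26T00:00:00\n"
    "3,Tablet Digitizing,PROXY_GFT,2008-05-26T00:00:00\n"
).encode("utf-8")

stream = io.BytesIO(data)
offsets = build_offsets(stream)             # one offset per record
third = read_record_at(stream, offsets[2])  # jump directly to record 3
```
</pre>
The offsets are counted in bytes rather than characters, which matters
once UTF-8 text contains non-ASCII characters. Changing the length of
any record shifts every later offset, which is why this only suits
random reads.<br>
<br>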

My intent is to make the specification very strict in terms of what is
allowed for the format of the file. At the same time it will allow
extension with new data types, new file properties and new attribute
headers.<br>

<br>

Paul<br>

<br>

Francisco Jos&eacute; Pe&ntilde;arrubia wrote:

<blockquote cite="mid:483D10B0.5030808@scolab.es" type="cite">Hi.

  <br>

  <br>

About this format, I have a few suggestions:

  <br>

1.- Take care with encoding.

  <br>

2.- At the least, set a default date/time format, or specify it in the
header.

  <br>

3.- If you include an index file (for example, longs with the position
in the file where each record starts), it may allow you to spatially
index the file, to access records randomly, and to edit each record
without rewriting the whole file. It could even set an order (to put
one feature below another, for example).

  <br>

  <br>

About metadata, I think it would be better to use an existing format,
but I'm not sure. It might also be a good idea to allow distribution
of these files in a .zip file.

  <br>

  <br>

  <br>

Thanks for your formats, Paul and Landon. It's good to see people with
these kinds of interests here.

  <br>

  <br>

PS: Has anyone compared SQLite and H2 (spatial)?

  <br>

  <br>

Fran.

  <br>

gvSIG team.

  <br>

  <br>

Paul Austin wrote:

  <br>

  <blockquote type="cite">All,

    <br>

    <br>

I saw in one of the other posts that there was a discussion of a binary
format to replace shapefiles for quick random access to data. Someone
suggested using an embedded database such as H2 with a spatial
extension. I think that using a database is a much better way to go for
this kind of access. Otherwise, if we come up with our own binary
format, we'll need to deal with all the issues, such as storage
management and indexing, that databases already handle for us.

    <br>

    <br>

I do however think that we need a simple format for exchange of data.

Exchanging data may be via files or via a web service. GML in my view

is very verbose and complex to read and write and does not include an

embedded schema.

    <br>

    <br>

I have been working on a CSV derivative which I'm calling Enhanced-CSV.
Basically it's a CSV file where the format is strict about placement of
commas and use of "". It also has two header sections. The first
section is a list of properties about the file, such as type name,
projection, author and a list of which attribute headers will follow.
The next section is the attribute header (schema). There can be
multiple attribute header rows, including the name, type, length,
precision and required flag of each attribute. Each row has one entry
for each data column (attribute). Finally there is the data section,
which is just all your rows of data encoded as CSV. Geometries are
encoded as WKT.

    <br>

    <br>

Below is a sample of an ECSV file with the three sections.

    <br>

    <br>

{<a class="moz-txt-link-freetext" href="http://ns.ecsv.org/ecsv">http://ns.ecsv.org/ecsv</a>}typeName,QName,{GFT}GFT_CAPTURE_METHOD_CODE

    <br>

{<a class="moz-txt-link-freetext" href="http://ns.ecsv.org/ecsv">http://ns.ecsv.org/ecsv</a>}srid,QName,{<a class="moz-txt-link-freetext" href="http://epsg.org">http://epsg.org</a>}3005

    <br>

{<a class="moz-txt-link-freetext" href="http://ns.ecsv.org/ecsv">http://ns.ecsv.org/ecsv</a>}attributeHeaderTypes,list,"{<a class="moz-txt-link-freetext" href="http://ns.ecsv.org/ecsv">http://ns.ecsv.org/ecsv</a>}attributeName,{<a class="moz-txt-link-freetext" href="http://ns.ecsv.org/ecsv">http://ns.ecsv.org/ecsv</a>}attributeType,{<a class="moz-txt-link-freetext" href="http://ns.ecsv.org/ecsv">http://ns.ecsv.org/ecsv</a>}attributeLength,{<a class="moz-txt-link-freetext" href="http://ns.ecsv.org/ecsv">http://ns.ecsv.org/ecsv</a>}attributeScale,{<a class="moz-txt-link-freetext" href="http://ns.ecsv.org/ecsv">http://ns.ecsv.org/ecsv</a>}attributeRequired"

    <br>

    <br>

CAPTURE_METHOD_CODE_ID,CODE_VALUE,WHO_CREATED,WHEN_CREATED

    <br>

integer,string,string,dateTime

    <br>

3,255,255,2147483647

    <br>

0,0,0,0

    <br>

false,false,false,false

    <br>

    <br>

1,Photogrammetric,PROXY_GFT,2008-05-26T00:00:00

    <br>

2,Differential Gps,PROXY_GFT,2008-05-26T00:00:00

    <br>

3,Tablet Digitizing,PROXY_GFT,2008-05-26T00:00:00

    <br>
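To make the three-section layout concrete, here is a minimal Python
sketch of a reader for the sample above. It is only an illustration
under stated assumptions, not the reader mentioned in this post: the
function name parse_ecsv is hypothetical, and it assumes that sections
are separated by exactly one blank line and that no field contains a
line break.
    <br>
<pre>
```python
import csv

NS = "{http://ns.ecsv.org/ecsv}"
HEADER_TYPES = ["attributeName", "attributeType", "attributeLength",
                "attributeScale", "attributeRequired"]

# The sample from the post, rebuilt line by line.
SAMPLE = "\n".join([
    NS + "typeName,QName,{GFT}GFT_CAPTURE_METHOD_CODE",
    NS + "srid,QName,{http://epsg.org}3005",
    NS + 'attributeHeaderTypes,list,"'
       + ",".join(NS + t for t in HEADER_TYPES) + '"',
    "",
    "CAPTURE_METHOD_CODE_ID,CODE_VALUE,WHO_CREATED,WHEN_CREATED",
    "integer,string,string,dateTime",
    "3,255,255,2147483647",
    "0,0,0,0",
    "false,false,false,false",
    "",
    "1,Photogrammetric,PROXY_GFT,2008-05-26T00:00:00",
    "2,Differential Gps,PROXY_GFT,2008-05-26T00:00:00",
    "3,Tablet Digitizing,PROXY_GFT,2008-05-26T00:00:00",
])


def parse_ecsv(text):
    """Split the text into its three sections and parse each one as CSV.

    Assumption (not from any specification): sections are separated by
    a single blank line.
    """
    props_text, header_text, data_text = text.strip().split("\n\n")

    # File properties: one "name,type,value" row per property.
    properties = {}
    for name, type_name, value in csv.reader(props_text.splitlines()):
        properties[name] = (type_name, value)

    # attributeHeaderTypes names the header rows, in order.
    header_types = properties[NS + "attributeHeaderTypes"][1].split(",")
    header_rows = list(csv.reader(header_text.splitlines()))

    # Transpose the header rows so each column gets its own description.
    columns = [dict(zip(header_types, column)) for column in zip(*header_rows)]

    records = list(csv.reader(data_text.splitlines()))
    return properties, columns, records


properties, columns, records = parse_ecsv(SAMPLE)
```
</pre>
All values come back as strings here. A real reader would convert them
using the declared attribute types, for example parsing WHEN_CREATED as
a dateTime.
    <br>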

    <br>

    <br>

I'm working on a specification for this format and hopefully should

have a draft up in the next month or so. I have developed a reader and

writer and a JUMP plug-in which I'll make available when I've finalized

the specification.

    <br>

    <br>

Is this something that would interest anyone else?

    <br>

    <br>

Paul

    <br>

_______________________________________________

    <br>

Java-collab mailing list

    <br>

<a class="moz-txt-link-abbreviated" href="mailto:Java-collab@lists.osgeo.org">Java-collab@lists.osgeo.org</a>

    <br>

<a class="moz-txt-link-freetext" href="http://lists.osgeo.org/mailman/listinfo/java-collab">http://lists.osgeo.org/mailman/listinfo/java-collab</a>

    <br>

  </blockquote>

  <br>


</blockquote>

</body>

</html>