[Qgis-developer] Delimited text issues and ideas

Chris Crook ccrook at linz.govt.nz
Wed May 22 13:48:42 PDT 2013


Hi Régis

Interesting thoughts. I've renamed this thread and copied it into qgis-developer, as I think it is worth getting broader input on this. I hope you don't mind my putting your email there.

If I've understood correctly, the main points of your suggestion are:

1) The delimited text provider and the GUI should be able to read a VRT/CSVT file, if one is present, to determine field types etc.

2) The delimited text provider GUI should be able to save settings to a VRT/CSVT file.

3) The user should be able to explicitly set data types for each column.

This is in line with some of my thinking on this. I was planning to add some way of saving settings as a "delimited text file type" that could then be selected when adding a new text layer. At the moment (i.e. in master, not 1.8) the plugin remembers settings based on file extension, but that doesn't provide enough granularity for me. This would be analogous to saving styles in QGIS. I hadn't yet decided where to store the settings (i.e. whether in a file, in the QGIS settings, or elsewhere).

Like you, I am not very happy with the way the provider determines field types by scanning the file when it loads it. If you put different data in the file and reload it into QGIS, the data types may change.
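
To illustrate why reloading can change the types, here is a simplified Python sketch of scan-based type inference. It is not the provider's actual C++ code: `infer_column_type` is a hypothetical helper, and it assumes a column counts as numeric only if every non-empty value parses as a number. Under that rule a key column holding "09" and "31" comes out as integer (so "09" is read as 9), while a single non-numeric value flips the whole column to text.

```python
def _parses(value, convert):
    """Return True if convert(value) succeeds without raising ValueError."""
    try:
        convert(value)
        return True
    except ValueError:
        return False


def infer_column_type(values):
    """Guess a column type by scanning every value (hypothetical helper).

    Empty cells are ignored; an all-empty column falls back to text.
    """
    non_empty = [v.strip() for v in values if v.strip()]
    if not non_empty:
        return "text"
    if all(_parses(v, int) for v in non_empty):
        return "integer"
    if all(_parses(v, float) for v in non_empty):
        return "real"
    return "text"


print(infer_column_type(["09", "31"]))        # integer: "09" would be read as 9
print(infer_column_type(["09", "31", "2A"]))  # text: one non-numeric value is enough
```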

I hadn't thought about using VRT or CSVT files and I really like the idea.

As a really simple first step, which would not need major refactoring, the provider could check for a CSVT file when it loads a CSV file and use it to determine field types. This would not require any UI or API changes at all, and on its own it could be very useful. The only difficulty I can foresee is how to handle files whose names don't end in ".csv". Should it just look for a file matching the name of the input file with a "t" appended? Should it only use this option for files named ".csv"? Or should it look for a ".csvt" file whatever the name of the data file? Other than that, this really isn't much work, and I'd be keen to implement it. I guess the simplest approach would be the first option: look for a file with the same name as the data file plus a "t". If it exists and can be interpreted, it will be used to define field types. This would make it really easy to manage when creating data, and would be compatible with GDAL/OGR.
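
The first option (appending a "t" to the data file name) could look something like the Python sketch below. `find_csvt` and `read_csvt` are hypothetical names; the parser assumes the OGR convention of a single line of comma-separated, optionally quoted type names such as "Integer","Real","String", and it simply discards any width/precision suffix like "Real(10.2)".

```python
import csv
import os


def find_csvt(data_path):
    """Return the sidecar CSVT path for data_path, or None if there isn't one.

    Follows the "same name plus a t" rule, e.g. areas.csv -> areas.csvt.
    """
    candidate = data_path + "t"
    return candidate if os.path.isfile(candidate) else None


def read_csvt(csvt_path):
    """Return the field type names from the first line of a CSVT file."""
    with open(csvt_path, newline="", encoding="utf-8") as f:
        first_line = f.readline()
    row = next(csv.reader([first_line]), [])
    # Keep only the base type name: "String(40)" -> "String".
    return [t.strip().split("(")[0] for t in row]
```

Only the first line is read, so whether later lines are ignored (the open question below) does not affect this sketch.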

As you suggest, the VRT file is much more work. The mapping between VRT and the CSV options is not complete, so the options around delimiters, skipped lines, regular expressions, and so on are not available for the VRT file. Conversely, several of the VRT options don't apply within QGIS. So there is quite a lot of work in specifying how this should work. It would also entail a major rework of how the file is opened (i.e. you would select the VRT file in the GUI, which would then have to identify the CSV file, and also politely handle VRT files which did not define CSV files but defined other data source types). This seems a lot of work and would end up re-engineering a lot of what is already in OGR. So (even as I'm writing this) I'm becoming less sure that this is a good approach.

Returning to the CSVT idea: once the provider can use the CSVT, there remains the question of what GUI/API changes should be made to support it. Two thoughts come to mind immediately.

One is whether the CSVT idea should be extended to support the other metadata required to set up the file, such as the delimiter. The OGR specification for CSVT just defines the field types in the first line. I don't know whether OGR ignores subsequent lines; if it does, additional metadata could go there and still be compatible with OGR usage. Alternatively, another metadata file (e.g. .metadata, .qgs, .dlt, ...) could hold all the information specifying how the file should be used, though that sounds really messy. But it would be really nice to be able to just select a file and have all these options populated automatically if the metadata file existed.

The other thought is around your suggestion of writing the CSVT/metadata file.  The main extra work involved in this is the user interface for defining field types.

The dialog box is already quite busy, but I guess a simple approach would be to add a row to the preview box under the column headings, with a field type selector for each column offering values such as "Auto,Text,Integer,Real,Date,Time,DateTime". The field types could then be passed through to the provider in the datasource URI (i.e. with a parameter such as "fieldtypes=text,text,integer,..."). This also doesn't sound like too much work.
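
Passing the chosen types through the datasource URI might look like the Python sketch below. The `fieldtypes` parameter is only the proposal from this thread, not an existing provider option; `build_uri` is a hypothetical helper, and the `type`/`delimiter` parameters are included only to show the general shape of a delimited text URI.

```python
from pathlib import Path
from urllib.parse import urlencode

# Values offered by the proposed per-column selector ("auto" = let the provider guess).
FIELD_TYPES = ("auto", "text", "integer", "real", "date", "time", "datetime")


def build_uri(csv_path, field_types, delimiter=","):
    """Build a delimited text datasource URI carrying per-column field types."""
    for t in field_types:
        if t not in FIELD_TYPES:
            raise ValueError(f"unknown field type: {t!r}")
    params = {
        "type": "csv",
        "delimiter": delimiter,
        # Hypothetical parameter proposed in this thread.
        "fieldtypes": ",".join(field_types),
    }
    return Path(csv_path).absolute().as_uri() + "?" + urlencode(params)


print(build_uri("/tmp/areas.csv", ["text", "text", "integer"]))
```

`urlencode` percent-encodes the commas, so whatever parses the URI on the provider side would need to decode the query string before splitting the list.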

Once this is done it would be simple to add a "save settings to metadata file" type button to the GUI, which could write the CSVT/metadata file.
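
Writing the CSVT from the selector could be as small as the following Python sketch. `write_csvt` and the name mapping are assumptions: it maps the proposed GUI type names onto the OGR CSVT names (String, Integer, Real, Date, Time, DateTime), writes one quoted line next to the data file, and, as a further assumption, requires any "auto" column to be resolved to a concrete type first.

```python
import csv

# Assumed mapping from the proposed GUI type names to OGR CSVT type names.
CSVT_NAMES = {"text": "String", "integer": "Integer", "real": "Real",
              "date": "Date", "time": "Time", "datetime": "DateTime"}


def write_csvt(data_path, field_types):
    """Write data_path + "t" with one quoted CSVT type per column; return its path."""
    names = []
    for t in field_types:
        if t not in CSVT_NAMES:
            raise ValueError(f"cannot write {t!r} to a CSVT file; resolve 'auto' first")
        names.append(CSVT_NAMES[t])
    csvt_path = data_path + "t"
    with open(csvt_path, "w", newline="", encoding="utf-8") as f:
        # QUOTE_ALL produces e.g. "String","Integer" on a single line.
        csv.writer(f, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(names)
    return csvt_path
```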

This would raise one more tricky question: how to handle conflicts between metadata read from a CSVT/metadata file and that in the datasource URI.
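
One possible answer, sketched in Python under the assumption that all three sources already use the same type names: let an explicit URI setting win over the CSVT, and the CSVT win over the scan, with "auto" or a missing entry deferring to the next source. `resolve_field_types` is hypothetical and this precedence is just one option; the opposite order would be equally easy to express.

```python
def resolve_field_types(uri_types, csvt_types, scanned_types):
    """Choose one type per column: URI overrides CSVT, CSVT overrides scanning.

    uri_types and csvt_types may be None or shorter than scanned_types;
    an "auto" entry (or a missing one) defers to the lower-priority source.
    """
    resolved = []
    for i, scanned in enumerate(scanned_types):
        chosen = scanned
        # Apply sources from lowest to highest priority so later ones win.
        for source in (csvt_types, uri_types):
            if source is not None and i < len(source) and source[i] != "auto":
                chosen = source[i]
        resolved.append(chosen)
    return resolved


print(resolve_field_types(["auto", "integer"], ["text", "text"], ["integer", "integer"]))
# ['text', 'integer']
```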

Enough rambling. I expect it will be a couple of weeks before I can consider this much further (though I may tackle handling the CSVT file sooner).

Cheers
Chris


> -----Original Message-----
> From: HAUBOURG [mailto:regis.haubourg at eau-adour-garonne.fr]
> Sent: Thursday, 23 May 2013 6:45 a.m.
> To: Chris Crook
> Subject: RE : Delimited text debug
>
> Thanks for your feedback Chris.
>
> I have been thinking of it all day, and got to the following observations and
> conclusions:
>
> 1- There are two concurrent ways to open CSV in QGIS: OGR native and your
> plugin.
> OGR/GDAL offers some features like VRT (enabling geometry columns, xy, yx
> columns) and CSVT (basic types for columns) that remain unused in QGIS
> (unless you're using them in your code).
>
> 2- Users have no way to create points from attribute data, except by using your
> plugin. CSV export and field types can be a pain.
>
> 3- There is no way to change a data type on the fly, so the user has to redo
> the import, and is sometimes stuck if no ETL tool or database is available or
> understood.
>
> From a user point of view, I think we should do two things:
>
> A: unify import for data sources to avoid the two different entries
>  - merge all import tools based on your approach, with a preview GUI (choose
> encoding, skip lines...)
>  - enable other options for all text-based files (field delimiter, text delimiter,
> decimal delimiter, trim fields...)
>  - WKT or XY chooser for geometry fields (for all data sources: native
> geometry / no geometry / text fields)
>  - field type chooser with automatic guessing (GDAL does it)
>  - option to save a VRT / CSVT so that a user can easily reopen the data without
> redoing all the import steps.
> This is a big refactoring of the vector layer add dialog. Nathan added some mockups
> for this that could work.
>
>  B: add a vector tool to create geometry from any attribute data (xy, WKT...) of
> any loaded data source. Users who imported data could then spatialize it in
> a second step (like MapInfo does).
>
> In my company, users really need this, so I will probably fund it. Do you have
> any feedback on that?
> Régis
> ________________________________________
> From: Chris Crook [ccrook at linz.govt.nz]
> Sent: Wednesday, 22 May 2013 19:54
> To: HAUBOURG
> Subject: RE: Delimited text debug
>
> Hi Régis
>
> It could be a useful improvement, basically to allow setting the types of columns.
> Associated with this, I'd like to add date types, which would require some
> explicit definition by the user. This will not make 2.0, but it is certainly worth
> doing for the next release.
>
> As a workaround for the moment, could you add an extra row of dummy data
> with a non-numeric value in the key column? The provider will then treat it as
> a text column and the join should work OK.
>
> Cheers
> Chris
> ________________________________________
> From: HAUBOURG [regis.haubourg at eau-adour-garonne.fr]
> Sent: 22 May 2013 23:12
> To: Chris Crook
> Subject: RE: Delimited text debug
>
> Hi Chris,
> I'm facing a problem here. Most of our administrative areas are
> identified by a text key composed only of digits ("09", "31").
> I have no way to choose to interpret text delimiters, so the data is
> corrupted (09 becomes 9) and there is no way to join it with the geographic layer.
> Is that a possible improvement to your plugin? I will file a ticket if needed.
> Cheers,
> Régis
>
> This message contains information, which is confidential and may be subject
> to legal privilege. If you are not the intended recipient, you must not peruse,
> use, disseminate, distribute or copy this message. If you have received this
> message in error, please notify us immediately (Phone 0800 665 463 or
> info at linz.govt.nz) and destroy the original message. LINZ accepts no
> responsibility for changes to this email, or for any attachments, after its
> transmission from LINZ. Thank You.

