[gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns

Robert Hewlett rob.hewy at gmail.com
Thu May 4 06:31:17 PDT 2023


Hi.

Here is an image of your dataframe.shp converted to CSV displayed in QGIS:

https://i.imgur.com/6f6VFNf.png

The .csv file in Excel and Calc:
https://i.imgur.com/twmfIPp.png
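
For a check that does not depend on a spreadsheet or GIS application, here is a minimal Python sketch using the standard csv module, which accepts quoted fields containing line breaks. The sample text mirrors the out.csv listing quoted further down in this thread; the names SAMPLE and read_records are only illustrative, and this is not how GDAL itself reads the file.

```python
import csv
import io

# Sample text mirroring the out.csv listing quoted later in this thread.
# The second record holds a quoted value with embedded line breaks.
SAMPLE = (
    'id,descriptio\n'
    '"1",This is my third row\n'
    '"2","this is\nmy string\n"\n'
    '"3",This is my third row\n'
)


def read_records(text):
    """Parse CSV text into a list of dicts, one per record (header excluded)."""
    return list(csv.DictReader(io.StringIO(text)))


records = read_records(SAMPLE)
# Report how many records were parsed and show the multi-line value.
print(len(records))
print(repr(records[1]["descriptio"]))
```

If the record count matches the feature count reported by ogrinfo, the line breaks were quoted in a way a standard CSV reader understands. When reading from a real file rather than a string, open it with newline="" so the csv module sees the line breaks unmodified.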

On Thu, May 4, 2023 at 6:07 AM Robert Hewlett <rob.hewy at gmail.com> wrote:

> Hi,
>
> I traced back through the thread, found your sample data set, and was able
> to convert it to a CSV file that Calc, Excel and QGIS all loaded without
> issue.
>
> The command I used to produce the three files:
> ogr2ogr -f CSV df.csv dataframe.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom
>
> The redirect (> result.csv) is potentially why the output of all three
> files ends up in a single file.
>
> The column name in the CSV file is geom, not WKT.
>
> I will try to post some images.
>
> The command was run on a Windows 10 laptop.
>
> On Thu, May 4, 2023 at 4:59 AM Moises Calzado via gdal-dev <
> gdal-dev at lists.osgeo.org> wrote:
>
>> Hi Robert!
>>
>> I think we're losing sight of the main issue we reported: the problem is
>> related to line breaks in the output generated when using /vsistdout/
>> together with the CREATE_CSVT=YES option.
>>
>> Even pointed out that it works as expected without that flag, but when the
>> flag is used the generated output is not okay, because the "Fields with
>> embedded line breaks must be quoted" rule is not followed.
>> IMHO, although the generated output is not a CSV in itself, we should be
>> able to delete the first two lines (projection info and types) and treat
>> the rest of the content as a CSV.
>>
>> What we're doing is streaming the /vsistdout/ output to another process
>> that performs some steps on the resulting CSV. In all other cases it works
>> correctly, as the output of the ogr2ogr execution is a valid CSV once the
>> first two lines are deleted, but in the case reported in my first email it
>> is not.
>> The CREATE_CSVT=YES option is mandatory for us because, for the moment, it
>> is required in order to use the GEOMETRY_NAME=geom option, so we don't
>> have any workaround.
>>
>> I just wanted to confirm whether that's expected on your side (generating
>> an output that is not a valid CSV in the end)!
>>
>> On Wed, 3 May 2023 at 21:05, Robert Hewlett (<rob.hewy at gmail.com>)
>> wrote:
>>
>>> Hi,
>>>
>>> I just tested with GDAL 3.6.4, released 2023/04/17.
>>>
>>> Using ogr2ogr as follows:
>>> ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES
>>> I get three files but no geometry.
>>>
>>> ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT
>>> I get three files with the geometry as WKT, in a column named WKT:
>>>
>>> WKT,id,poi_name,poi_types
>>> "POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
>>> "POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"
>>>
>>> ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom
>>> I get three files with the geometry as WKT, but in a column named geom:
>>>
>>> geom,id,poi_name,poi_types
>>> "POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
>>> "POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"
>>>
>>> What does
>>> ogr2ogr --version
>>> report back?
>>>
>>> On Wed, May 3, 2023 at 9:38 AM Robert Hewlett <rob.hewy at gmail.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> Not to start a controversy, but it feels like the standard hints at
>>>> three files. Did the standard change?
>>>>
>>>> If it is three files, which works for me in QGIS and geopandas (i.e. the
>>>> data lands where it is supposed to), then more layer creation options
>>>> are needed to handle the SRID/CRS, for example:
>>>>
>>>> CREATE_PRJ=YES/NO
>>>> or having -t_srs and/or -s_srs trigger creation of the .prj file.
>>>>
>>>> Just saying 😊.
>>>>
>>>> In the meantime, would a short Python script help to parse the one file
>>>> into three?
>>>>
>>>>
>>>> On Wed, May 3, 2023 at 9:16 AM Moises Calzado via gdal-dev <
>>>> gdal-dev at lists.osgeo.org> wrote:
>>>>
>>>>> Hi Robert,
>>>>>
>>>>> Yes, we're getting one with all the info!
>>>>>
>>>>> On Wed, 3 May 2023 at 18:14, Robert Hewlett (<rob.hewy at gmail.com>)
>>>>> wrote:
>>>>>
>>>>>> Just to clarify, instead of getting three files you are getting one
>>>>>> with all the info: types, projection, data?
>>>>>>
>>>>>> https://giswiki.hsr.ch/GeoCSV
>>>>>>
>>>>>> On Wed, May 3, 2023 at 8:57 AM Moises Calzado via gdal-dev <
>>>>>> gdal-dev at lists.osgeo.org> wrote:
>>>>>>
>>>>>>> We're also specifying GEOM_POSSIBLE_NAMES, so it would be great if,
>>>>>>> with that option, we could use GEOMETRY_NAME without needing the
>>>>>>> CREATE_CSVT=YES option.
>>>>>>>
>>>>>>> Regarding emitting the .prj and .csvt in /vsistdout/ mode, that's why
>>>>>>> I'm saying that there is an issue in how the resulting CSV is generated.
>>>>>>> The way we see it, when using /vsistdout/ the result is a CSV file with
>>>>>>> the .prj information in the first line and the .csvt content in the
>>>>>>> second line. We handle the result by deleting the first two lines and
>>>>>>> using the rest of the content as a CSV, which should be equal to the
>>>>>>> result obtained when running ogr2ogr without the CREATE_CSVT=YES option.
>>>>>>> We're probably missing something, but as we see it, the generated CSV
>>>>>>> should be a valid one. Does that make sense?
>>>>>>>
>>>>>>> Thanks so much for your help!
>>>>>>>
>>>>>>> On Wed, 3 May 2023 at 15:10, Robert Hewlett (<rob.hewy at gmail.com>)
>>>>>>> wrote:
>>>>>>>
>>>>>>>> The .csvt and .prj files help to make a proper GeoCSV dataset, which
>>>>>>>> helps with QGIS and geopandas. The column name that I use in the CSV is
>>>>>>>> usually geom, and WKT shows up in the .csvt file, which seems to be a
>>>>>>>> one-line file that hints at the data types in the CSV file.
>>>>>>>>
>>>>>>>> I hope that makes sense.
>>>>>>>>
>>>>>>>> CSVT
>>>>>>>> Integer, Integer,WKT
>>>>>>>>
>>>>>>>> CSV
>>>>>>>> line_id,point_id,geom
>>>>>>>> 1,1,"POINT(1000 1000)"
>>>>>>>>
>>>>>>>> PRJ
>>>>>>>> EPSG:26910
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, May 3, 2023, 05:23 Moises Calzado via gdal-dev <
>>>>>>>> gdal-dev at lists.osgeo.org> wrote:
>>>>>>>>
>>>>>>>>> Hi Even,
>>>>>>>>>
>>>>>>>>> Thanks so much for taking a look into that one!
>>>>>>>>>
>>>>>>>>> I have one question regarding the CSVT content: we're not really
>>>>>>>>> using it, but it's required when using the GEOMETRY_NAME layer creation
>>>>>>>>> option, as can be checked in the CSV driver documentation:
>>>>>>>>>
>>>>>>>>>>    - GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry
>>>>>>>>>>      column. Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES.
>>>>>>>>>>      Defaults to WKT
>>>>>>>>>
>>>>>>>>> We really need this flag, as we are processing files that contain
>>>>>>>>> geometries with different column names, and we always want the same
>>>>>>>>> geometry name in the generated output. Are we missing something when
>>>>>>>>> using that flag that would let us avoid this problem?
>>>>>>>>> In my humble opinion, generating an invalid CSV when using
>>>>>>>>> -lco CREATE_CSVT=YES looks like a bug to me, as I can't see any reason
>>>>>>>>> why strings containing line breaks can't be quoted.
>>>>>>>>>
>>>>>>>>> Could you please shed some light on this?
>>>>>>>>>
>>>>>>>>> Looking forward to your reply,
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>> On Wed, 3 May 2023 at 14:00, Even Rouault (<
>>>>>>>>> even.rouault at spatialys.com>) wrote:
>>>>>>>>>
>>>>>>>>>> you didn't post to the list
>>>>>>>>>>
>>>>>>>>>> On Sat, 29 Apr 2023 at 15:44, Even Rouault (<
>>>>>>>>>> even.rouault at spatialys.com>) wrote:
>>>>>>>>>>
>>>>>>>>>>> Moises,
>>>>>>>>>>>
>>>>>>>>>>> as far as I can see with your example, the CSV driver behaves
>>>>>>>>>>> "properly" in reading and writing of field values with line breaks.
>>>>>>>>>>>
>>>>>>>>>>> It follows the "Fields with embedded line breaks must be quoted"
>>>>>>>>>>> rule of https://en.wikipedia.org/wiki/Comma-separated_values
>>>>>>>>>>>
>>>>>>>>>>> $ ogr2ogr out.csv /vsizip/dataframe.zip
>>>>>>>>>>>
>>>>>>>>>>> $ cat out.csv
>>>>>>>>>>> id,descriptio
>>>>>>>>>>> "1",This is my third row
>>>>>>>>>>> "2","this is
>>>>>>>>>>> my string
>>>>>>>>>>> "
>>>>>>>>>>> "3",This is my third row
>>>>>>>>>>>
>>>>>>>>>>> $ ogrinfo out.csv -al
>>>>>>>>>>> INFO: Open of `out.csv'
>>>>>>>>>>>       using driver `CSV' successful.
>>>>>>>>>>>
>>>>>>>>>>> Layer name: out
>>>>>>>>>>> Geometry: None
>>>>>>>>>>> Feature Count: 3
>>>>>>>>>>> Layer SRS WKT:
>>>>>>>>>>> (unknown)
>>>>>>>>>>> id: String (0.0)
>>>>>>>>>>> descriptio: String (0.0)
>>>>>>>>>>> OGRFeature(out):1
>>>>>>>>>>>   id (String) = 1
>>>>>>>>>>>   descriptio (String) = This is my third row
>>>>>>>>>>>
>>>>>>>>>>> OGRFeature(out):2
>>>>>>>>>>>   id (String) = 2
>>>>>>>>>>>   descriptio (String) = this is
>>>>>>>>>>> my string
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> OGRFeature(out):3
>>>>>>>>>>>   id (String) = 3
>>>>>>>>>>>   descriptio (String) = This is my third row
>>>>>>>>>>>
>>>>>>>>>>> But in your example, using /vsistdout/ with -lco CREATE_CSVT=YES
>>>>>>>>>>> is going to result in an invalid CSV file that mixes the .csvt and
>>>>>>>>>>> .csv content.
>>>>>>>>>>>
>>>>>>>>>>> Even
>>>>>>>>>>> On 24/04/2023 at 13:34, Moises Calzado via gdal-dev wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hello!
>>>>>>>>>>>
>>>>>>>>>>> We're trying to convert a Shapefile into a CSV using ogr2ogr, and
>>>>>>>>>>> we're having issues with columns that contain line breaks inside
>>>>>>>>>>> their values. If a value contains the following string, ogr2ogr
>>>>>>>>>>> treats the embedded line break as a new line and returns two lines.
>>>>>>>>>>>
>>>>>>>>>>> "this is my \n value"
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> This is the command that we're executing:
>>>>>>>>>>>
>>>>>>>>>>>> ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ /vsizip/shapefile.zip -simplify 0.00001 -dim XY -t_srs EPSG:4326 -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom -lco CREATE_CSVT=YES > result.csv
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Is this expected behaviour, or is there a way to avoid it?
>>>>>>>>>>> I'm sharing an example Shapefile so that you can try to reproduce
>>>>>>>>>>> the behaviour:
>>>>>>>>>>> https://drive.google.com/file/d/1gFqfTP02KTFoavJyyO-Ix05YwZB2tS24/view?usp=sharing
>>>>>>>>>>>
>>>>>>>>>>> Thanks so much in advance,
>>>>>>>>>>> Regards.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> *Moises Calzado*
>>>>>>>>>>>
>>>>>>>>>>> Support Engineer
>>>>>>>>>>>
>>>>>>>>>>> +34671264286 | mcalzado at carto.com | CARTO
>>>>>>>>>>> <https://www.carto.com/>
>>>>>>>>>>> <https://spatial-data-science-conference.com/2023/london/>
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> gdal-dev mailing list
>>>>>>>>>>> gdal-dev at lists.osgeo.org
>>>>>>>>>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>>>>>>>>>
>>>>>>>>>>> -- http://www.spatialys.com
>>>>>>>>>>> My software is free, but my time generally not.
>>>>>>>>>>>
>
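
The "delete the first two lines" handling described in the thread can be sketched in Python as below. This is only a sketch under a loud assumption: that the combined /vsistdout/ stream has the projection info on line 1 and the .csvt types on line 2, as described above; that layout is taken from the thread and is not documented behaviour. The function name split_combined and the sample input (built from the small GeoCSV example in the thread) are hypothetical.

```python
import csv
import io


def split_combined(text):
    """Split combined output into (prj_line, csvt_line, csv_body).

    Assumption: line 1 is the projection info and line 2 the .csvt types,
    as described in the thread. Raises ValueError if there are fewer than
    two line breaks in the input.
    """
    prj_line, csvt_line, csv_body = text.split("\n", 2)
    return prj_line.rstrip("\r"), csvt_line.rstrip("\r"), csv_body


# Hypothetical sample input, built from the GeoCSV example in the thread.
combined = (
    'EPSG:26910\n'
    'Integer,Integer,WKT\n'
    'line_id,point_id,geom\n'
    '1,1,"POINT(1000 1000)"\n'
)

prj, csvt, body = split_combined(combined)
# Parse what is left with a standard CSV reader.
rows = list(csv.reader(io.StringIO(body)))
print(prj, csvt, rows)
```

This only helps when the remaining content is itself well-formed CSV. If values with embedded line breaks are written unquoted, as reported in the thread, no amount of line trimming will recover the record boundaries.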