[gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns

Robert Hewlett rob.hewy at gmail.com
Wed May 3 09:13:29 PDT 2023


Just to clarify: instead of getting three files, you are getting one with
all the info (types, projection, data)?

https://giswiki.hsr.ch/GeoCSV

On Wed, May 3, 2023 at 8:57 AM Moises Calzado via gdal-dev <
gdal-dev at lists.osgeo.org> wrote:

> We're also specifying GEOM_POSSIBLE_NAMES, so it would be great if, with
> that option, we could use GEOMETRY_NAME without having to set
> CREATE_CSVT=YES.
>
> Regarding the .prj and .csvt being emitted in /vsistdout/ mode, that's why
> I'm saying there is an issue with the generated CSV.
> As we see it, when using /vsistdout/, the result is a CSV stream with the
> .prj content on the first line and the .csvt content on the second line.
> We handle this by deleting the first two lines and using the rest of the
> content as a CSV, which should be identical to the output of ogr2ogr
> without the CREATE_CSVT=YES option.
> We're probably missing something, but as we see it, the remaining CSV
> should be valid. Does that make sense?
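A minimal Python sketch of the workaround described above. It assumes, as described in this thread, that the .prj content occupies exactly the first line and the .csvt content exactly the second line of the /vsistdout/ stream. The helper name `split_vsistdout_output` is hypothetical. Only the first two physical lines are split off, and the rest goes to a CSV-aware reader, so a quoted value with embedded line breaks stays in one field:

```python
import csv
import io


def split_vsistdout_output(text):
    """Hypothetical helper: split captured /vsistdout/ output into parts.

    Assumes line 1 is the .prj content, line 2 is the .csvt content and
    everything after that is the CSV payload.
    """
    parts = text.split("\n", 2)
    if len(parts) < 3:
        raise ValueError("expected a .prj line, a .csvt line and CSV data")
    prj, csvt, payload = parts
    # csv.reader keeps a quoted field together even if it spans several lines,
    # unlike a tool that treats every "\n" as the end of a record.
    rows = list(csv.reader(io.StringIO(payload)))
    # Drop a trailing "\r" in case the output used CRLF line endings.
    return prj.rstrip("\r"), csvt.rstrip("\r"), rows
```

If either sidecar ever spans more than one line, this two-line split is no longer safe.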
>
> Thanks so much for your help!
>
> On Wed, 3 May 2023 at 15:10, Robert Hewlett (<rob.hewy at gmail.com>)
> wrote:
>
>> The .csvt and .prj files help make a proper GeoCSV dataset, which helps
>> with QGIS and GeoPandas. The geometry column name I use in the CSV is
>> usually geom, and WKT shows up in the CSVT file, which is a one-line file
>> that hints at the data types of the CSV columns.
>>
>> I hope that makes sense.
>>
>> CSVT
>> Integer,Integer,WKT
>>
>> CSV
>> line_id,point_id,geom
>> 1,1,"POINT(1000 1000)"
>>
>> PRJ
>> EPSG:26910
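To illustrate how the one-line CSVT file hints at column types, here is a minimal Python sketch that applies those hints when reading the CSV. It only converts a few type names (Integer, Integer64, Real) and leaves everything else, including WKT, as text. `parse_csvt`, `read_typed_csv` and `CONVERTERS` are hypothetical helpers, not part of GDAL, QGIS or GeoPandas:

```python
import csv
import io

# Hypothetical mapping for a small subset of CSVT type names. Anything not
# listed here (String, WKT, Date, ...) is returned as the raw string.
CONVERTERS = {"Integer": int, "Integer64": int, "Real": float}


def parse_csvt(line):
    """Return the base type name of each column in a one-line .csvt file."""
    # skipinitialspace tolerates "Integer, Integer,WKT"; quotes are optional.
    types = next(csv.reader([line], skipinitialspace=True))
    # Drop any width/precision suffix such as "Real(10.2)" or "String(15)".
    return [t.strip().split("(")[0] for t in types]


def read_typed_csv(csv_text, csvt_line):
    """Read CSV text, converting values according to the .csvt hints."""
    types = parse_csvt(csvt_line)
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        record = {}
        for (name, value), type_name in zip(row.items(), types):
            convert = CONVERTERS.get(type_name, str)
            # Keep empty or missing cells as None instead of failing to convert.
            record[name] = convert(value) if value not in ("", None) else None
        records.append(record)
    return records
```

With the example above, `line_id` and `point_id` come back as integers and `geom` stays the WKT string `POINT(1000 1000)`.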
>>
>>
>>
>>
>> On Wed, May 3, 2023, 05:23 Moises Calzado via gdal-dev <
>> gdal-dev at lists.osgeo.org> wrote:
>>
>>> Hi Even,
>>>
>>> Thanks so much for taking a look into that one!
>>>
>>> I have a question about the CSVT content: we're not really using it, but
>>> it's required when using the GEOMETRY_NAME layer creation option, as
>>> stated in the CSV driver documentation:
>>>
>>>>    - GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry
>>>>      column. Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES. Defaults to WKT
>>>
>>> We really need this option because we are processing files that contain
>>> geometries in columns with different names, and we always want the same
>>> geometry column name in the generated output. Are we losing anything by
>>> using that option to work around this problem?
>>> In my humble opinion, generating an invalid CSV when using -lco
>>> CREATE_CSVT=YES looks like a bug, as I can't see why strings containing
>>> line breaks can't be quoted.
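One possible alternative, sketched here under assumptions: leave out GEOMETRY_NAME and CREATE_CSVT=YES, let the geometry column come out under its default name (WKT, per the documentation quoted above), and rename it in the header afterwards. The Python helper `rename_geometry_column` is hypothetical. It rewrites only the header line and copies the data rows verbatim, so the quoting ogr2ogr applied to values with line breaks is left untouched. It assumes LF line endings and a header row without embedded line breaks:

```python
import csv
import io


def rename_geometry_column(csv_text, old="WKT", new="geom"):
    """Hypothetical helper: rename one column in the CSV header row only."""
    # Split off the header; the data rows (with any quoted line breaks)
    # are passed through unchanged.
    header, sep, body = csv_text.partition("\n")
    names = next(csv.reader([header]))
    names = [new if name == old else name for name in names]
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(names)
    return buf.getvalue().rstrip("\n") + sep + body
```

Rewriting only the header, rather than re-serializing every row, avoids changing which data values are quoted.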
>>>
>>> Could you please shed some light on this?
>>>
>>> Looking forward to your reply,
>>> Regards.
>>>
>>> On Wed, 3 May 2023 at 14:00, Even Rouault (<
>>> even.rouault at spatialys.com>) wrote:
>>>
>>>> you didn't post to the list
>>>> On 03/05/2023 at 13:49, Moises Calzado wrote:
>>>> On Sat, 29 Apr 2023 at 15:44, Even Rouault (<
>>>> even.rouault at spatialys.com>) wrote:
>>>>
>>>>> Moises,
>>>>>
>>>>> as far as I can see with your example, the CSV driver behaves
>>>>> "properly" when reading and writing field values that contain line breaks.
>>>>>
>>>>> It follows the "Fields with embedded line breaks must be quoted" rule
>>>>> of https://en.wikipedia.org/wiki/Comma-separated_values
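The same rule can be checked with Python's standard csv module, used here only as an independent CSV reader and writer (it is not what GDAL uses internally). With the default QUOTE_MINIMAL quoting, a value containing line breaks is written inside double quotes and reads back unchanged. The helper names `write_csv` and `read_csv` are illustrative:

```python
import csv
import io


def write_csv(rows):
    """Serialize rows using the csv module's default QUOTE_MINIMAL quoting."""
    buf = io.StringIO()
    # LF line endings, to match the output shown in this thread.
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


def read_csv(text):
    """Parse CSV text; quoted fields may contain line breaks."""
    return list(csv.reader(io.StringIO(text)))
```

Writing `[["id", "descriptio"], ["2", "this is\nmy string\n"]]` quotes only the second field of the data row. ogr2ogr also quotes the id values, which is equally valid CSV.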
>>>>>
>>>>> $ ogr2ogr out.csv /vsizip/dataframe.zip
>>>>>
>>>>> $ cat out.csv
>>>>> id,descriptio
>>>>> "1",This is my third row
>>>>> "2","this is
>>>>> my string
>>>>> "
>>>>> "3",This is my third row
>>>>>
>>>>> $ ogrinfo out.csv -al
>>>>> INFO: Open of `out.csv'
>>>>>       using driver `CSV' successful.
>>>>>
>>>>> Layer name: out
>>>>> Geometry: None
>>>>> Feature Count: 3
>>>>> Layer SRS WKT:
>>>>> (unknown)
>>>>> id: String (0.0)
>>>>> descriptio: String (0.0)
>>>>> OGRFeature(out):1
>>>>>   id (String) = 1
>>>>>   descriptio (String) = This is my third row
>>>>>
>>>>> OGRFeature(out):2
>>>>>   id (String) = 2
>>>>>   descriptio (String) = this is
>>>>> my string
>>>>>
>>>>>
>>>>> OGRFeature(out):3
>>>>>   id (String) = 3
>>>>>   descriptio (String) = This is my third row
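The difference between physical lines and CSV records explains why the output can look broken to a line-oriented consumer. Here is a minimal Python sketch using the out.csv content shown above, assuming LF line endings. Counting lines gives six, while a CSV-aware reader returns the header plus the three records that ogrinfo reports. `OUT_CSV` and the helper names are illustrative:

```python
import csv
import io

# The out.csv content shown above (LF line endings assumed).
OUT_CSV = (
    'id,descriptio\n'
    '"1",This is my third row\n'
    '"2","this is\n'
    'my string\n'
    '"\n'
    '"3",This is my third row\n'
)


def count_physical_lines(text):
    # What a tool that treats every line break as a record boundary sees.
    return len(text.splitlines())


def read_records(text):
    # A CSV-aware reader keeps quoted line breaks inside the field.
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, list(reader)
```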
>>>>>
>>>>> But in your example, using /vsistdout/ together with -lco CREATE_CSVT=YES
>>>>> results in an invalid CSV file that mixes the .csvt and .csv content.
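Given that diagnosis, one way to keep the outputs apart is to let ogr2ogr write to a real file path instead of /vsistdout/, so the .csv and its sidecar files are created separately. Here is a minimal Python sketch, assuming ogr2ogr is on the PATH. It reuses the options from the command in this thread unchanged. The helper names, the temporary directory and the output name result.csv are illustrative. Which sidecar files appear (.csvt, and .prj when the layer has an SRS) is inferred from this thread, not verified here:

```python
import os
import shutil
import subprocess
import tempfile


def build_ogr2ogr_command(src, dst_csv):
    """Hypothetical helper: the thread's ogr2ogr options with a file output."""
    return [
        "ogr2ogr", "-f", "CSV", "-skipfailures", "-makevalid",
        dst_csv, src,  # destination first, then source
        "-simplify", "0.00001", "-dim", "XY", "-t_srs", "EPSG:4326",
        "-lco", "GEOMETRY=AS_WKT", "-lco", "GEOMETRY_NAME=geom",
        "-lco", "CREATE_CSVT=YES",
    ]


def convert_to_csv_dir(src):
    """Run ogr2ogr into a fresh temp dir and list the files it created."""
    if shutil.which("ogr2ogr") is None:
        raise RuntimeError("ogr2ogr not found on PATH")
    out_dir = tempfile.mkdtemp()
    dst = os.path.join(out_dir, "result.csv")
    subprocess.run(build_ogr2ogr_command(src, dst), check=True)
    # Expected (not guaranteed): result.csv, result.csvt and possibly result.prj.
    return sorted(os.listdir(out_dir))
```

For example, `convert_to_csv_dir("/vsizip/shapefile.zip")` leaves result.csv as a standalone CSV that needs no line stripping.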
>>>>>
>>>>> Even
>>>>> On 24/04/2023 at 13:34, Moises Calzado via gdal-dev wrote:
>>>>>
>>>>> Hello!
>>>>>
>>>>> We're trying to convert a Shapefile into a CSV using ogr2ogr, and we're
>>>>> having issues with some columns whose values contain line breaks. If a
>>>>> value contains the following string, ogr2ogr treats the line break as a
>>>>> new line and returns two lines:
>>>>>
>>>>> "this is my \n value"
>>>>>
>>>>>
>>>>> This is the command we're executing:
>>>>>
>>>>> ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ \
>>>>>     /vsizip/shapefile.zip -simplify 0.00001 -dim XY -t_srs EPSG:4326 \
>>>>>     -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom -lco CREATE_CSVT=YES > result.csv
>>>>>
>>>>> Is this expected behaviour, or is there a way to avoid it?
>>>>> Here is an example Shapefile so that you can try to reproduce the
>>>>> behaviour:
>>>>> https://drive.google.com/file/d/1gFqfTP02KTFoavJyyO-Ix05YwZB2tS24/view?usp=sharing
>>>>>
>>>>> Thanks so much in advance,
>>>>> Regards.
>>>>>
>>>>> --
>>>>> *Moises Calzado*
>>>>>
>>>>> Support Engineer
>>>>>
>>>>> +34671264286 | mcalzado at carto.com | CARTO <https://www.carto.com/>
>>>>> <https://spatial-data-science-conference.com/2023/london/>
>>>>>
>>>>> _______________________________________________
>>>>> gdal-dev mailing list
>>>>> gdal-dev at lists.osgeo.org
>>>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>>>
>>>>> -- http://www.spatialys.com
>>>>> My software is free, but my time generally not.
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>
>

