[gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns

Moises Calzado mcalzado at carto.com
Wed May 3 09:15:43 PDT 2023


Hi Robert,

Yes, we're getting one with all the info!

On Wed, 3 May 2023 at 18:14, Robert Hewlett (<rob.hewy at gmail.com>)
wrote:

> Just to clarify, instead of getting three files you are getting one with
> all the info: types, projection, data?
>
> https://giswiki.hsr.ch/GeoCSV
>
> On Wed, May 3, 2023 at 8:57 AM Moises Calzado via gdal-dev <
> gdal-dev at lists.osgeo.org> wrote:
>
>> We're also specifying GEOM_POSSIBLE_NAMES, so it would be great if,
>> with that option, we could use GEOMETRY_NAME without also having to set
>> CREATE_CSVT=YES.
>>
>> Regarding emitting the .prj and .csvt in /vsistdout/ mode, that's why I'm
>> saying that there is an issue while generating the resulting CSV.
>> As we see it, when using /vsistdout/, the result is a CSV file with the
>> .prj information on the first line and the .csvt on the second line. We
>> handle the result by deleting the first two lines and using the rest of
>> the content as a CSV, which should be identical to the output obtained
>> when running ogr2ogr without the CREATE_CSVT=YES option.
>> We're probably missing something, but as we see it, the generated CSV
>> should be a valid one. Does that make sense?
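A minimal Python sketch of the post-processing described above. It assumes (as observed in this thread, not as documented GDAL behaviour) that the .prj content is the first physical line of the /vsistdout/ output and the .csvt content the second. The important part is that the remainder is handed to a CSV-aware reader as one stream instead of being split on newlines; `split_vsistdout_output` is a hypothetical helper name.

```python
import csv
import io


def split_vsistdout_output(text):
    """Split combined /vsistdout/ output into (prj, csvt, csv rows).

    ASSUMPTION: the .prj content is the first physical line and the .csvt
    content the second, as observed in this thread; GDAL does not document
    this layout.
    """
    prj_line, csvt_line, csv_text = text.split("\n", 2)
    # Parse the rest as one stream so that quoted fields containing line
    # breaks stay inside a single record.
    rows = list(csv.reader(io.StringIO(csv_text, newline="")))
    return prj_line.rstrip("\r"), csvt_line.rstrip("\r"), rows
```

Usage: pass the whole captured output, e.g. the contents of result.csv, rather than iterating over it line by line.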
>>
>> Thanks so much for your help!
>>
>> On Wed, 3 May 2023 at 15:10, Robert Hewlett (<rob.hewy at gmail.com>)
>> wrote:
>>
>>> The .csvt and .prj files help make a proper GeoCSV dataset, which helps
>>> with QGIS and geopandas. The geometry column name I use in the CSV is
>>> usually geom, and WKT shows up in the CSVT file, which seems to be a
>>> one-line file that hints at the data types of the CSV columns.
>>>
>>> I hope that makes sense.
>>>
>>> CSVT
>>> Integer, Integer,WKT
>>>
>>> CSV
>>> line_id,point_id,geom
>>> 1,1,"POINT(1000 1000)"
>>>
>>> PRJ
>>> EPSG:26910
>>>
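The one-line .csvt can be read as an ordinary CSV record and paired with the .csv header. A Python sketch using Robert's example; `column_types` is a hypothetical helper, and stripping spaces around each type (as in "Integer, Integer,WKT") is this sketch's own choice, not a statement about how GDAL parses .csvt files.

```python
import csv
import io


def column_types(csv_text, csvt_text):
    """Map each CSV header name to the type hint at the same position in the .csvt."""
    header = next(csv.reader(io.StringIO(csv_text, newline="")))
    # The .csvt is a single comma-separated line; strip stray spaces.
    types = [t.strip() for t in next(csv.reader(io.StringIO(csvt_text, newline="")))]
    if len(types) != len(header):
        raise ValueError("the .csvt must list exactly one type per CSV column")
    return dict(zip(header, types))
```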
>>>
>>>
>>>
>>> On Wed, May 3, 2023, 05:23 Moises Calzado via gdal-dev <
>>> gdal-dev at lists.osgeo.org> wrote:
>>>
>>>> Hi Even,
>>>>
>>>> Thanks so much for looking into this one!
>>>>
>>>> I have a question regarding the CSVT content: we're not really using
>>>> it, but it's required when using the GEOMETRY_NAME layer creation option,
>>>> as stated in the CSV driver documentation:
>>>>
>>>>
>>>>>    - GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry
>>>>>      column. Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES. Defaults to WKT.
>>>>>
>>>> We really need this flag because we are processing files that contain
>>>> geometries with different column names, and we always want the same
>>>> geometry name in the generated output. Are we missing something by using
>>>> that flag to avoid this problem?
>>>> In my humble opinion, generating an invalid CSV when using -lco
>>>> CREATE_CSVT=YES looks like a bug to me, as I can't see why strings
>>>> containing line breaks can't be quoted.
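If the only goal of GEOMETRY_NAME is a stable geometry column name, one possible workaround (not one proposed in the thread) is to omit CREATE_CSVT=YES, let the driver use its default geometry column name (WKT, per the documentation excerpt above), and rename that header afterwards. A Python sketch; `rename_geometry_column` is a hypothetical helper. Rewriting through csv.writer keeps multi-line values quoted, although it may drop quotes around values that do not need them.

```python
import csv
import io


def rename_geometry_column(csv_text, old="WKT", new="geom"):
    """Rename the geometry header of a CSV written without CREATE_CSVT=YES.

    ASSUMPTION: the geometry column is then named WKT (the documented default).
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = io.StringIO(newline="")
    # csv.writer quotes any field containing the delimiter, a quote or a
    # newline, so multi-line values survive the round trip.
    writer = csv.writer(out, lineterminator="\n")
    header = next(reader)
    writer.writerow([new if name == old else name for name in header])
    writer.writerows(reader)
    return out.getvalue()
```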
>>>>
>>>> Could you please shed some light on this?
>>>>
>>>> Looking forward to your reply,
>>>> Regards.
>>>>
>>>> On Wed, 3 May 2023 at 14:00, Even Rouault (<
>>>> even.rouault at spatialys.com>) wrote:
>>>>
>>>>> You didn't post to the list.
>>>>> On 03/05/2023 at 13:49, Moises Calzado wrote:
>>>>>
>>>>> On Sat, 29 Apr 2023 at 15:44, Even Rouault (<
>>>>> even.rouault at spatialys.com>) wrote:
>>>>>
>>>>>> Moises,
>>>>>>
>>>>>> as far as I can see with your example, the CSV driver behaves
>>>>>> "properly" when reading and writing field values with line breaks.
>>>>>>
>>>>>> It follows the "Fields with embedded line breaks must be quoted" rule
>>>>>> of https://en.wikipedia.org/wiki/Comma-separated_values
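The rule Even cites (the same one RFC 4180 states) amounts to: wrap a field in double quotes when it contains the delimiter, a double quote or a line break, and double any embedded quotes. A minimal Python sketch of that rule; `quote_field` and `format_row` are illustrative names, not GDAL's implementation.

```python
def quote_field(value, delimiter=","):
    """Quote one field per the "embedded line breaks must be quoted" rule."""
    # Fields containing the delimiter, a double quote or a line break must be
    # enclosed in double quotes; embedded double quotes are doubled.
    if any(ch in value for ch in (delimiter, '"', "\n", "\r")):
        return '"' + value.replace('"', '""') + '"'
    return value


def format_row(values, delimiter=","):
    """Join already-stringified values into one CSV record (no line terminator)."""
    return delimiter.join(quote_field(v, delimiter) for v in values)
```

GDAL's own output also quotes some values that do not strictly need it (e.g. "1" in the out.csv listing); this sketch only implements the mandatory part of the rule.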
>>>>>>
>>>>>> $ ogr2ogr out.csv /vsizip/dataframe.zip
>>>>>>
>>>>>> $ cat out.csv
>>>>>> id,descriptio
>>>>>> "1",This is my third row
>>>>>> "2","this is
>>>>>> my string
>>>>>> "
>>>>>> "3",This is my third row
>>>>>>
>>>>>> $ ogrinfo out.csv -al
>>>>>> INFO: Open of `out.csv'
>>>>>>       using driver `CSV' successful.
>>>>>>
>>>>>> Layer name: out
>>>>>> Geometry: None
>>>>>> Feature Count: 3
>>>>>> Layer SRS WKT:
>>>>>> (unknown)
>>>>>> id: String (0.0)
>>>>>> descriptio: String (0.0)
>>>>>> OGRFeature(out):1
>>>>>>   id (String) = 1
>>>>>>   descriptio (String) = This is my third row
>>>>>>
>>>>>> OGRFeature(out):2
>>>>>>   id (String) = 2
>>>>>>   descriptio (String) = this is
>>>>>> my string
>>>>>>
>>>>>>
>>>>>> OGRFeature(out):3
>>>>>>   id (String) = 3
>>>>>>   descriptio (String) = This is my third row
>>>>>>
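To see why the quoted out.csv above still holds three records (matching ogrinfo's Feature Count: 3), compare naive line splitting with a CSV-aware reader on the same text. A Python sketch using only the standard csv module; OUT_CSV is the out.csv content shown above, copied verbatim.

```python
import csv
import io

# The out.csv content from the `cat out.csv` listing, copied verbatim.
OUT_CSV = (
    'id,descriptio\n'
    '"1",This is my third row\n'
    '"2","this is\n'
    'my string\n'
    '"\n'
    '"3",This is my third row\n'
)

# Naive line splitting treats every embedded line break as a new record.
naive_rows = OUT_CSV.splitlines()

# A CSV-aware reader keeps the quoted multi-line value inside one record.
records = list(csv.DictReader(io.StringIO(OUT_CSV, newline="")))
```

Here `naive_rows` has 6 entries (header included), while `records` has 3 and its second record's descriptio is "this is\nmy string\n".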
>>>>>> But in your example, using /vsistdout/ together with -lco CREATE_CSVT=YES
>>>>>> is going to result in an invalid CSV file that mixes the .csvt and .csv
>>>>>> content.
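Following Even's remark, one way to keep the outputs apart is to write to a regular file path instead of /vsistdout/, so the CSV driver can write the .csvt (and .prj) as sidecar files next to the .csv. A hedged Python sketch that rebuilds the original command with only that change; `build_ogr2ogr_command`, `run` and the `result.csv` path are hypothetical, and running it requires ogr2ogr on PATH.

```python
import subprocess


def build_ogr2ogr_command(src, dst_csv):
    """Rebuild the thread's ogr2ogr command, writing to a regular file
    (hypothetical path) instead of /vsistdout/."""
    return [
        "ogr2ogr", "-f", "CSV", "-skipfailures", "-makevalid",
        dst_csv, src,
        "-simplify", "0.00001", "-dim", "XY", "-t_srs", "EPSG:4326",
        "-lco", "GEOMETRY=AS_WKT",
        "-lco", "GEOMETRY_NAME=geom",
        "-lco", "CREATE_CSVT=YES",
    ]


def run(src, dst_csv):
    # Needs ogr2ogr on PATH; check=True raises CalledProcessError on failure.
    subprocess.run(build_ogr2ogr_command(src, dst_csv), check=True)
```

With `run("/vsizip/shapefile.zip", "result.csv")`, one would expect result.csv, result.csvt and result.prj side by side, leaving result.csv a plain CSV.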
>>>>>>
>>>>>> Even
>>>>>> On 24/04/2023 at 13:34, Moises Calzado via gdal-dev wrote:
>>>>>>
>>>>>> Hello!
>>>>>>
>>>>>> We're trying to convert a Shapefile into a CSV using ogr2ogr, and
>>>>>> we're having some issues with columns whose values contain line
>>>>>> breaks. If a value contains the following string, ogr2ogr treats the
>>>>>> line break as a new line and returns two lines.
>>>>>>
>>>>>> "this is my \n value"
>>>>>>
>>>>>>
>>>>>> This is the command we're executing:
>>>>>>
>>>>>> ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ \
>>>>>>   /vsizip/shapefile.zip -simplify 0.00001 -dim XY -t_srs EPSG:4326 \
>>>>>>   -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom -lco CREATE_CSVT=YES > result.csv
>>>>>>
>>>>>> Is this expected behaviour, or is there any way to avoid it?
>>>>>> Here is an example Shapefile so that you can try to reproduce that
>>>>>> behaviour:
>>>>>> https://drive.google.com/file/d/1gFqfTP02KTFoavJyyO-Ix05YwZB2tS24/view?usp=sharing
>>>>>>
>>>>>> Thanks so much in advance,
>>>>>> Regards.
>>>>>>
>>>>>> --
>>>>>> *Moises Calzado*
>>>>>>
>>>>>> Support Engineer
>>>>>>
>>>>>> +34671264286 | mcalzado at carto.com | CARTO <https://www.carto.com/>
>>>>>> <https://spatial-data-science-conference.com/2023/london/>
>>>>>>
>>>>>> _______________________________________________
>>>>>> gdal-dev mailing list
>>>>>> gdal-dev at lists.osgeo.org
>>>>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>>>>
>>>>>> -- 
>>>>>> http://www.spatialys.com
>>>>>> My software is free, but my time generally not.
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>
>>

