[gdal-dev] ogr2ogr CSV driver not correctly handling line breaks inside columns

Robert Hewlett rob.hewy at gmail.com
Wed May 3 06:10:39 PDT 2023


The .CSVT and .PRJ files help make a proper GeoCSV dataset, which helps with QGIS
and GeoPandas. The geometry column name I usually use in the CSV is geom, and
its type, WKT, shows up in the CSVT file, which seems to be a one-line file that
hints at the data types of the columns in the CSV file.

I hope that makes sense.

CSVT
Integer, Integer,WKT

CSV
line_id,point_id,geom
1,1,"POINT(1000 1000)"

PRJ
EPSG:26910
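A minimal Python sketch of producing the three files from the example above with only the standard library. The function name `write_geocsv` and the base name `points` are hypothetical. The CSVT line and the `EPSG:26910` PRJ content are copied from the example, not taken from a GDAL specification.

```python
import csv
import tempfile
from pathlib import Path


def write_geocsv(directory: Path, base: str) -> Path:
    """Write <base>.csv, <base>.csvt and <base>.prj matching the example above."""
    directory.mkdir(parents=True, exist_ok=True)
    csv_path = directory / f"{base}.csv"

    # Data file: a header row, then one feature with its geometry as WKT in "geom".
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(["line_id", "point_id", "geom"])
        writer.writerow([1, 1, "POINT(1000 1000)"])

    # Type hints: one line, one type per column, in the same order as the header.
    (directory / f"{base}.csvt").write_text("Integer,Integer,WKT\n", encoding="utf-8")

    # Projection sidecar, with the content copied verbatim from the example.
    (directory / f"{base}.prj").write_text("EPSG:26910\n", encoding="utf-8")
    return csv_path


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())
    print(write_geocsv(out_dir, "points").read_text(encoding="utf-8"))
```

`csv.writer` only quotes fields when needed, so the geometry is written as `POINT(1000 1000)` without the quotes shown in the example. A CSV reader treats both forms the same. The CSVT line is also written without the space between types.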




On Wed, May 3, 2023, 05:23 Moises Calzado via gdal-dev <
gdal-dev at lists.osgeo.org> wrote:

> Hi Even,
>
> Thanks so much for taking a look into that one!
>
> I have a question about the CSVT content: we're not really using it,
> but it's required when using the GEOMETRY_NAME layer creation option, as
> stated in the CSV driver documentation:
>
>>    - GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry column.
>>      Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES. Defaults to WKT.
>
> We really need this option because we are processing files whose geometry
> columns have different names, and we always want the same geometry column
> name in the generated output. Are we losing anything by using that flag to
> get around this?
> In my humble opinion, generating an invalid CSV when using -lco
> CREATE_CSVT=YES looks like a bug, as I can't see why strings containing
> line breaks can't be quoted.
>
> Could you please shed some light on this?
>
> Looking forward to your reply,
> Regards.
>
> On Wed, May 3, 2023 at 14:00, Even Rouault (<even.rouault at spatialys.com>)
> wrote:
>
>> You didn't post to the list.
>> On 03/05/2023 at 13:49, Moises Calzado wrote:
>>
>> Hi Even,
>>
>> Thanks so much for taking a look into that one!
>>
>> I have one doubt regarding the CSVT content, as we're not really using
>> it, but it's required when using the GEOMETRY_NAME layer creation option,
>> as can be checked in the CSV driver documentation:
>>
>>
>>>    -
>>>
>>>    GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry
>>>    column. Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES. Defaults to WKT
>>>
>>> We really need this flag as we are processing files that contain
>> geometries with different column names, and we always want the same
>> geometry name in the generated output. Are we losing something when using
>> that flag to avoid this problem?
>> In my humble opinion, generating an invalid CSV when using the -lco
>> CREATE_CSVT=YES looks like a bug for me, as I can't see the reason why
>> strings containing line breaks can't be quoted.
>>
>> Could you please shed some light on this?
>>
>> Looking forward to your reply,
>> Regards.
>>
>> On Sat, Apr 29, 2023 at 15:44, Even Rouault (<
>> even.rouault at spatialys.com>) wrote:
>>
>>> Moises,
>>>
>>> As far as I can see with your example, the CSV driver behaves "properly"
>>> when reading and writing field values with line breaks.
>>>
>>> It follows the "Fields with embedded line breaks must be quoted" rule of
>>> https://en.wikipedia.org/wiki/Comma-separated_values
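To illustrate the quoting rule Even cites, here is a minimal Python sketch that uses only the standard `csv` module (not GDAL). The writer wraps a value containing a line break in double quotes, and a CSV-aware reader turns it back into a single field. The helper name `round_trip` is hypothetical. The sample row reuses the `descriptio` field from Even's example.

```python
import csv
import io


def round_trip(rows):
    """Serialize rows to CSV text, then parse that text back into rows."""
    buf = io.StringIO()
    # The default QUOTE_MINIMAL policy quotes a field only when it contains
    # special characters such as the delimiter, the quote character or a newline.
    csv.writer(buf, lineterminator="\n").writerows(rows)
    text = buf.getvalue()
    # csv.reader keeps line breaks that appear inside a quoted field.
    parsed = list(csv.reader(io.StringIO(text)))
    return text, parsed


if __name__ == "__main__":
    text, parsed = round_trip([["id", "descriptio"], ["2", "this is\nmy string\n"]])
    print(text)
    print(parsed)
```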
>>>
>>> $ ogr2ogr out.csv /vsizip/dataframe.zip
>>>
>>> $ cat out.csv
>>> id,descriptio
>>> "1",This is my third row
>>> "2","this is
>>> my string
>>> "
>>> "3",This is my third row
>>>
>>> $ ogrinfo out.csv -al
>>> INFO: Open of `out.csv'
>>>       using driver `CSV' successful.
>>>
>>> Layer name: out
>>> Geometry: None
>>> Feature Count: 3
>>> Layer SRS WKT:
>>> (unknown)
>>> id: String (0.0)
>>> descriptio: String (0.0)
>>> OGRFeature(out):1
>>>   id (String) = 1
>>>   descriptio (String) = This is my third row
>>>
>>> OGRFeature(out):2
>>>   id (String) = 2
>>>   descriptio (String) = this is
>>> my string
>>>
>>>
>>> OGRFeature(out):3
>>>   id (String) = 3
>>>   descriptio (String) = This is my third row
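The out.csv listing above is valid CSV even though it spans six physical lines for three features, because the line breaks in feature 2 sit inside a quoted field. The following minimal Python sketch uses only the standard library and shows why a consumer has to use a CSV-aware parser rather than splitting on newlines. The `OUT_CSV` string reproduces that listing, assuming LF line endings. The function names are hypothetical.

```python
import csv
import io

# Content of out.csv as listed above (LF line endings assumed).
OUT_CSV = (
    'id,descriptio\n'
    '"1",This is my third row\n'
    '"2","this is\n'
    'my string\n'
    '"\n'
    '"3",This is my third row\n'
)


def naive_records(text):
    """Wrong: treats every physical line after the header as a record."""
    return text.splitlines()[1:]


def csv_records(text):
    """Right: a CSV-aware parser keeps quoted line breaks inside the field."""
    rows = list(csv.reader(io.StringIO(text)))
    header, records = rows[0], rows[1:]
    return [dict(zip(header, r)) for r in records]


if __name__ == "__main__":
    print(len(naive_records(OUT_CSV)), "naive lines")
    for rec in csv_records(OUT_CSV):
        print(rec)
```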
>>>
>>> But in your example, using /vsistdout/ together with -lco CREATE_CSVT=YES
>>> will result in an invalid CSV file that mixes both the .csvt and .csv
>>> content.
>>>
>>> Even
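The following hedged Python sketch shows one way to avoid the mixed output Even describes. It passes a real output path such as `result.csv` instead of `/vsistdout/` plus shell redirection, assuming the CSV driver then writes the CREATE_CSVT=YES type hints to a sibling `result.csvt` file. The helper names `ogr2ogr_to_file` and `convert` are hypothetical. The options are copied unchanged from Moises's command. Actually running `convert` requires GDAL's `ogr2ogr` on PATH and assumes `result.csv` does not already exist.

```python
import subprocess
from pathlib import Path


def ogr2ogr_to_file(src: str, dst: str) -> list[str]:
    """Build the ogr2ogr argument list, writing to a file path instead of /vsistdout/.

    Options are copied from the command in this thread; only the destination differs.
    """
    return [
        "ogr2ogr", "-f", "CSV", "-skipfailures", "-makevalid",
        dst, src,
        "-simplify", "0.00001", "-dim", "XY", "-t_srs", "EPSG:4326",
        "-lco", "GEOMETRY=AS_WKT",
        "-lco", "GEOMETRY_NAME=geom",
        "-lco", "CREATE_CSVT=YES",
    ]


def convert(src: str, dst: str) -> Path:
    """Run the conversion (needs ogr2ogr on PATH); return the expected .csvt path."""
    subprocess.run(ogr2ogr_to_file(src, dst), check=True)
    # Assumption: the type-hint file is named after the output, e.g. result.csvt.
    return Path(dst).with_suffix(".csvt")


if __name__ == "__main__":
    print(" ".join(ogr2ogr_to_file("/vsizip/shapefile.zip", "result.csv")))
```

Using a list of arguments with `subprocess.run` avoids shell quoting issues and the redirection that merged both outputs into one stream.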
>>> On 24/04/2023 at 13:34, Moises Calzado via gdal-dev wrote:
>>>
>>> Hello!
>>>
>>> We're trying to convert a Shapefile into a CSV using ogr2ogr, and we're
>>> having issues with columns whose values contain line breaks. If a value
>>> contains the following string, ogr2ogr treats the line break as a new
>>> line and returns two lines.
>>>
>>> "this is my \n value"
>>>
>>>
>>> This is the command we're running:
>>>
>>> ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ \
>>>   /vsizip/shapefile.zip -simplify 0.00001 -dim XY -t_srs EPSG:4326 \
>>>   -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom -lco CREATE_CSVT=YES > result.csv
>>>
>>> Is this expected behaviour, or is there a way to avoid it?
>>> Here is an example Shapefile so that you can try to reproduce the
>>> behaviour:
>>> https://drive.google.com/file/d/1gFqfTP02KTFoavJyyO-Ix05YwZB2tS24/view?usp=sharing
>>>
>>> Thanks so much in advance,
>>> Regards.
>>>
>>> --
>>> *Moises Calzado*
>>>
>>> Support Engineer
>>>
>>> +34671264286 | mcalzado at carto.com | CARTO <https://www.carto.com/>
>>> <https://spatial-data-science-conference.com/2023/london/>
>>>
>>> _______________________________________________
>>> gdal-dev mailing list
>>> gdal-dev at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>
>>> -- http://www.spatialys.com
>>> My software is free, but my time generally not.
>>>
>>>
>>
>>
>>
>
>