[gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns

Moises Calzado mcalzado at carto.com
Wed May 3 05:22:48 PDT 2023


Hi Even,

Thanks so much for taking a look into that one!

I have one question regarding the CSVT content, as we're not really using it,
but it's required when using the GEOMETRY_NAME layer creation option, as
can be checked in the CSV driver documentation:


> GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry column.
> Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES. Defaults to WKT

We really need this flag, as we are processing files that contain
geometries with different column names, and we always want the same
geometry name in the generated output. Are we losing something when using
that flag to avoid this problem?

In my humble opinion, generating an invalid CSV when using -lco
CREATE_CSVT=YES looks like a bug, as I can't see any reason why
strings containing line breaks can't be quoted.
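
To illustrate the quoting rule being discussed, here is a minimal sketch using only Python's standard csv module. It does not use GDAL and says nothing about how the CSV driver is implemented. The rows are copied from the sample output quoted in this thread, and the helper names write_csv and read_csv are made up for this example. A field with embedded line breaks is written inside quotes and reads back as a single record.

```python
import csv
import io

# Rows mirroring the sample quoted in this thread; the second data row
# carries embedded line breaks inside its "descriptio" value.
rows = [
    ["id", "descriptio"],
    ["1", "This is my third row"],
    ["2", "this is\nmy string\n"],
    ["3", "This is my third row"],
]


def write_csv(records):
    """Serialize records to CSV text, quoting only the fields that need it."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(records)
    return buf.getvalue()


def read_csv(text):
    """Parse CSV text back into a list of records."""
    # newline="" leaves line endings untouched so the csv module handles them.
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    text = write_csv(rows)
    print(text)
    # The multi-line value survives the round trip as one field of one record.
    print(read_csv(text) == rows)
```

The physical line count of the file is larger than the record count, which is why a line-oriented tool sees "extra" lines while a CSV parser does not.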

Could you please shed some light on this?

Looking forward to your reply,
Regards.

On Wed, 3 May 2023 at 14:00, Even Rouault (<even.rouault at spatialys.com>)
wrote:

> you didn't post to the list
>
> On Sat, 29 Apr 2023 at 15:44, Even Rouault (<even.rouault at spatialys.com>)
> wrote:
>
>> Moises,
>>
>> as far as I can see with your example, the CSV driver behaves "properly"
>> in reading and writing of field values with line breaks.
>>
>> It follows the "Fields with embedded line breaks must be quoted" rule of
>> https://en.wikipedia.org/wiki/Comma-separated_values
>>
>> $ ogr2ogr out.csv /vsizip/dataframe.zip
>>
>> $ cat out.csv
>> id,descriptio
>> "1",This is my third row
>> "2","this is
>> my string
>> "
>> "3",This is my third row
>>
>> $ ogrinfo out.csv -al
>> INFO: Open of `out.csv'
>>       using driver `CSV' successful.
>>
>> Layer name: out
>> Geometry: None
>> Feature Count: 3
>> Layer SRS WKT:
>> (unknown)
>> id: String (0.0)
>> descriptio: String (0.0)
>> OGRFeature(out):1
>>   id (String) = 1
>>   descriptio (String) = This is my third row
>>
>> OGRFeature(out):2
>>   id (String) = 2
>>   descriptio (String) = this is
>> my string
>>
>>
>> OGRFeature(out):3
>>   id (String) = 3
>>   descriptio (String) = This is my third row
>>
>> But your example, which uses /vsistdout/ together with -lco CREATE_CSVT=YES,
>> is going to result in an invalid CSV file that mixes the .csvt and .csv
>> content
>>
>> Even
>> On 24/04/2023 at 13:34, Moises Calzado via gdal-dev wrote:
>>
>> Hello!
>>
>> We're trying to convert a Shapefile into a CSV using ogr2ogr and we're
>> having some issues with columns whose values contain line breaks. If a
>> value contains the following string, ogr2ogr treats the line break as the
>> end of the record and returns two lines.
>>
>> "this is my \n value"
>>
>>
>> That's the command that we're executing:
>>
>> ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ /vsizip/shapefile.zip \
>>   -simplify 0.00001 -dim XY -t_srs EPSG:4326 -lco GEOMETRY=AS_WKT \
>>   -lco GEOMETRY_NAME=geom -lco CREATE_CSVT=YES > result.csv
>>
>> Is this expected behaviour, or is there any way to avoid it?
>> Sharing an example Shapefile so that you can try to reproduce that
>> behaviour:
>> https://drive.google.com/file/d/1gFqfTP02KTFoavJyyO-Ix05YwZB2tS24/view?usp=sharing
>>
>> Thanks so much in advance,
>> Regards.
>>
>> --
>> *Moises Calzado*
>>
>> Support Engineer
>>
>> +34671264286 | mcalzado at carto.com | CARTO <https://www.carto.com/>
>> <https://spatial-data-science-conference.com/2023/london/>
>>
>> _______________________________________________
>> gdal-dev mailing list
>> gdal-dev at lists.osgeo.org
>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
>> -- http://www.spatialys.com
>> My software is free, but my time generally not.
>>
>>
>

-- 
*Moises Calzado*

Support Engineer

+34671264286 | mcalzado at carto.com | CARTO <https://www.carto.com/>
<https://spatial-data-science-conference.com/2023/london/>