[gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns
Even Rouault
even.rouault at spatialys.com
Wed May 3 05:33:42 PDT 2023
On 03/05/2023 at 14:22, Moises Calzado via gdal-dev wrote:
> Hi Even,
>
> Thanks so much for taking a look into that one!
>
> I have one doubt regarding the CSVT content, as we're not really using
> it, but it's required when using the GEOMETRY_NAME layer creation
> option, as can be checked in the CSV driver documentation:
>
> * GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry
>   column. Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES.
>   Defaults to WKT
>
> We really need this flag as we are processing files that contain
> geometries with different column names, and we always want the same
> geometry name in the generated output. Are we losing something when
> using that flag to avoid this problem?
The reason for requiring CREATE_CSVT=YES is that, when reading back a
.csv without a .csvt, the geometry column must be named WKT, unless you
specify the GEOM_POSSIBLE_NAMES open option (which must have been a
later addition). That said, it could be reasonable to relax that coupling
and allow GEOMETRY_NAME without CREATE_CSVT=YES, with a warning in the
doc about the consequence I just mentioned.
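
As a rough sketch of the read-back side, the following Python snippet builds an ogrinfo command line that passes GEOM_POSSIBLE_NAMES through the -oo (open option) switch. The file name result.csv, the column name geom and the helper build_ogrinfo_cmd are assumptions made for illustration; the command is only executed if ogrinfo is installed and the file exists.

```python
import os
import shutil
import subprocess


def build_ogrinfo_cmd(csv_path, geom_column):
    """Return an ogrinfo command line that opens csv_path with the CSV
    driver's GEOM_POSSIBLE_NAMES open option set to geom_column."""
    return [
        "ogrinfo", csv_path, "-al",
        "-oo", f"GEOM_POSSIBLE_NAMES={geom_column}",
    ]


# Hypothetical file and column names, chosen to match the thread.
cmd = build_ogrinfo_cmd("result.csv", "geom")
print(" ".join(cmd))

# Only run the command when ogrinfo is available and the file exists.
if shutil.which("ogrinfo") and os.path.exists("result.csv"):
    subprocess.run(cmd, check=False)
```

With this open option, a .csv whose WKT column is named geom should be recognised as having a geometry even without a .csvt next to it.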
> In my humble opinion, generating an invalid CSV when using the -lco
> CREATE_CSVT=YES looks like a bug for me,
Are you speaking about emitting the .prj and .csvt content when writing
to /vsistdout/? Yes, I'd tend to agree they should not be emitted in
that mode.
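
Until that changes, one possible workaround is to write to a regular file rather than /vsistdout/, so that the .csvt (and .prj) end up in separate side-car files instead of the same stream. The sketch below only assembles such a command from the layer creation options already used in this thread; the output name result.csv and the helper build_ogr2ogr_cmd are hypothetical, and the other switches from the original command are left out for brevity.

```python
import shlex


def build_ogr2ogr_cmd(src, dst_csv, geom_column="geom"):
    """Build an ogr2ogr command that writes CSV to a regular file.

    dst_csv is a plain file path (hypothetical name), not /vsistdout/,
    so the .csv and .csvt content are not mixed in one stream.
    """
    return [
        "ogr2ogr", "-f", "CSV", dst_csv, src,
        "-lco", "GEOMETRY=AS_WKT",
        "-lco", f"GEOMETRY_NAME={geom_column}",
        "-lco", "CREATE_CSVT=YES",
    ]


cmd = build_ogr2ogr_cmd("/vsizip/shapefile.zip", "result.csv")
# Print a shell-quoted version that can be pasted into a terminal.
print(shlex.join(cmd))
```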
> as I can't see the reason why strings containing line breaks can't be
> quoted.
I'm not following you about the issue with line breaks. In my previous
message, I showed that I could not reproduce any issue: the CSV driver
emits such fields within double quotes when they contain line breaks.
Can you be more specific about what's wrong? I don't see the connection
with GEOMETRY_NAME.
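
To illustrate what "quoted" means for a reader of the file, here is a minimal Python sketch using only the standard csv module (not GDAL). The sample text is modelled on the out.csv shown in my previous message. It is an assumption on my side that the confusion comes from counting physical lines rather than CSV records, but the sketch shows how the two differ.

```python
import csv
import io

# Sample text modelled on the out.csv from the earlier message: the
# second record has a quoted field spanning three physical lines.
data = (
    'id,descriptio\n'
    '"1",This is my third row\n'
    '"2","this is\nmy string\n"\n'
    '"3",This is my third row\n'
)

# Counting physical lines treats the embedded line breaks as row breaks.
physical_lines = data.splitlines()

# A CSV-aware reader keeps the quoted field, line breaks included,
# inside a single record. newline="" avoids any newline translation.
records = list(csv.reader(io.StringIO(data, newline="")))

print(len(physical_lines), "physical lines")
print(len(records), "CSV records (header included)")
print(repr(records[2][1]))
```

A tool that splits the file on newlines sees more lines than there are features, whereas a CSV parser returns one header row plus three records.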
>
> Could you please shed some light on this?
>
> Looking forward to your reply,
> Regards.
>
> On Wed, 3 May 2023 at 14:00, Even Rouault
> (<even.rouault at spatialys.com>) wrote:
>
> you didn't post to the list
>
> On 03/05/2023 at 13:49, Moises Calzado wrote:
>> Hi Even,
>>
>> Thanks so much for taking a look into that one!
>>
>> I have one doubt regarding the CSVT content, as we're not really
>> using it, but it's required when using the GEOMETRY_NAME layer
>> creation option, as can be checked in the CSV driver documentation:
>>
>> * GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of
>>   geometry column. Only used if GEOMETRY=AS_WKT and
>>   CREATE_CSVT=YES. Defaults to WKT
>>
>> We really need this flag as we are processing files that contain
>> geometries with different column names, and we always want the
>> same geometry name in the generated output. Are we losing
>> something when using that flag to avoid this problem?
>> In my humble opinion, generating an invalid CSV when using the
>> -lco CREATE_CSVT=YES looks like a bug for me, as I can't see the
>> reason why strings containing line breaks can't be quoted.
>>
>> Could you please shed some light on this?
>>
>> Looking forward to your reply,
>> Regards.
>>
>> On Sat, 29 Apr 2023 at 15:44, Even Rouault
>> (<even.rouault at spatialys.com>) wrote:
>>
>> Moises,
>>
>> as far as I can see with your example, the CSV driver behaves
>> "properly" in reading and writing of field values with line
>> breaks.
>>
>> It follows the "Fields with embedded line breaks must be
>> quoted" rule of
>> https://en.wikipedia.org/wiki/Comma-separated_values
>>
>> $ ogr2ogr out.csv /vsizip/dataframe.zip
>>
>> $ cat out.csv
>> id,descriptio
>> "1",This is my third row
>> "2","this is
>> my string
>> "
>> "3",This is my third row
>>
>> $ ogrinfo out.csv -al
>> INFO: Open of `out.csv'
>> using driver `CSV' successful.
>>
>> Layer name: out
>> Geometry: None
>> Feature Count: 3
>> Layer SRS WKT:
>> (unknown)
>> id: String (0.0)
>> descriptio: String (0.0)
>> OGRFeature(out):1
>> id (String) = 1
>> descriptio (String) = This is my third row
>>
>> OGRFeature(out):2
>> id (String) = 2
>> descriptio (String) = this is
>> my string
>>
>>
>> OGRFeature(out):3
>> id (String) = 3
>> descriptio (String) = This is my third row
>>
>> But in your example, using /vsistdout/ together with -lco
>> CREATE_CSVT=YES is going to result in an invalid CSV file
>> that mixes both the .csvt and .csv content.
>>
>> Even
>>
>> On 24/04/2023 at 13:34, Moises Calzado via gdal-dev wrote:
>>> Hello!
>>>
>>> We're trying to convert a Shapefile into a CSV using ogr2ogr
>>> and we're having some issues while dealing with some columns
>>> that contain line breaks inside their values. If we have a
>>> line with the following string, ogr2ogr detects that the
>>> line break is a new line and it returns two lines.
>>>
>>> "this is my \n value"
>>>
>>>
>>> That's the command that we're executing:
>>>
>>> ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/
>>> /vsizip/shapefile.zip -simplify 0.00001 -dim XY -t_srs
>>> EPSG:4326 -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom
>>> -lco CREATE_CSVT=YES > result.csv
>>>
>>>
>>> Is this an expected behaviour, or is there any way to avoid
>>> this?
>>> Sharing an example Shapefile so that you can try to
>>> reproduce that behaviour:
>>> https://drive.google.com/file/d/1gFqfTP02KTFoavJyyO-Ix05YwZB2tS24/view?usp=sharing
>>>
>>> Thanks so much in advance,
>>> Regards.
>>>
>>> --
>>> *Moises Calzado*
>>>
>>> Support Engineer
>>>
>>> +34671264286 | mcalzado at carto.com | CARTO
>>> <https://www.carto.com/>
>>>
>>> <https://spatial-data-science-conference.com/2023/london/>
>>>
>>> _______________________________________________
>>> gdal-dev mailing list
>>> gdal-dev at lists.osgeo.org
>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
>> --
>> http://www.spatialys.com
>> My software is free, but my time generally not.
>>
--
http://www.spatialys.com
My software is free, but my time generally not.