[gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns

Even Rouault even.rouault at spatialys.com
Wed May 3 05:33:42 PDT 2023


On 03/05/2023 at 14:22, Moises Calzado via gdal-dev wrote:
> Hi Even,
>
> Thanks so much for taking a look into that one!
>
> I have a question about the CSVT content: we're not really using 
> it, but it's required when using the GEOMETRY_NAME layer creation 
> option, as stated in the CSV driver documentation:
>
>     * GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry
>       column. Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES.
>       Defaults to WKT.
>
> We really need this flag as we are processing files that contain 
> geometries with different column names, and we always want the same 
> geometry name in the generated output. Are we losing something when 
> using that flag to avoid this problem?

The reason for requiring CREATE_CSVT=YES is that when reading back a 
.csv without a .csvt, the geometry column must be named WKT, unless you 
specify the GEOM_POSSIBLE_NAMES open option (which must have been a 
later addition). That said, it could be reasonable to relax that coupling 
and allow GEOMETRY_NAME without CREATE_CSVT=YES, with a warning in the 
doc about the consequence I just mentioned.
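
To make the read-back point concrete, here is a minimal Python sketch that only assembles an ogrinfo command line using the GEOM_POSSIBLE_NAMES open option, so that a CSV whose geometry column is not named WKT can still be recognised without a .csvt. The helper name, the column name geom and the file name result.csv are placeholders chosen for illustration; the sketch does not run GDAL itself.

```python
import shlex


def ogrinfo_read_args(csv_path, geometry_column):
    """Build an ogrinfo command that treats `geometry_column` as WKT geometry.

    Hypothetical helper: it only assembles arguments and does not run GDAL.
    """
    return [
        "ogrinfo",
        # Open option of the CSV driver: candidate names for geometry columns.
        "-oo", f"GEOM_POSSIBLE_NAMES={geometry_column}",
        csv_path,
        "-al",  # report all features of all layers
    ]


args = ogrinfo_read_args("result.csv", "geom")
print(shlex.join(args))
```

The resulting list could be passed to subprocess.run on a machine where GDAL is installed.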

> In my humble opinion, generating an invalid CSV when using -lco 
> CREATE_CSVT=YES looks like a bug to me,

Are you speaking about emitting the .prj and .csvt content when writing 
to /vsistdout/? Yes, I'd tend to agree they should not be emitted in 
that mode.
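
In the meantime, one possible way to sidestep the mixed output is to write to a regular file rather than /vsistdout/, so that the .csvt and .prj end up as sidecar files next to the .csv. A rough Python sketch, assuming that workaround fits the pipeline; the helper name and the output path are placeholders, and the code only builds the command without executing it:

```python
import shlex


def ogr2ogr_csv_args(src, dst_csv, geometry_name="geom"):
    """Build an ogr2ogr command that writes a CSV to a real file path.

    Hypothetical helper; the options mirror the ones used in this thread.
    """
    return [
        "ogr2ogr", "-f", "CSV",
        dst_csv,  # a file on disk instead of /vsistdout/
        src,
        "-lco", "GEOMETRY=AS_WKT",
        "-lco", f"GEOMETRY_NAME={geometry_name}",
        "-lco", "CREATE_CSVT=YES",
    ]


cmd = ogr2ogr_csv_args("/vsizip/shapefile.zip", "result.csv")
print(shlex.join(cmd))
```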

> as I can't see the reason why strings containing line breaks can't be 
> quoted.
I'm not following you about the issue with line breaks. In my previous 
message, I showed that I could not reproduce any issue: the CSV driver 
emits fields with double quotes, even when there are line breaks. Can you 
be more specific about what's wrong? I don't see the connection with 
GEOMETRY_NAME.
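
The quoting can also be checked independently of GDAL with any CSV parser that follows the "embedded line breaks must be quoted" rule. A small sketch using Python's standard csv module, with the sample text copied from the out.csv shown in the previous message, reads the multi-line value back as a single field:

```python
import csv
import io

# Content of out.csv as shown in the quoted ogr2ogr output.
SAMPLE = (
    'id,descriptio\n'
    '"1",This is my third row\n'
    '"2","this is\n'
    'my string\n'
    '"\n'
    '"3",This is my third row\n'
)

# newline="" leaves embedded line breaks untouched for the csv module.
rows = list(csv.reader(io.StringIO(SAMPLE, newline="")))

header, records = rows[0], rows[1:]
print(len(records))      # number of records, not number of physical lines
print(repr(records[1]))  # the record whose value spans several lines
```

A tool that splits the file on raw newlines instead of parsing the quotes would see more lines than records, which may be what looks like "two lines" downstream.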
>
> Could you please shed some light on this?
>
> Looking forward to your reply,
> Regards.
>
> On Wed, 3 May 2023 at 14:00, Even Rouault 
> (<even.rouault at spatialys.com>) wrote:
>
>     you didn't post to the list
>
>>     On Sat, 29 Apr 2023 at 15:44, Even Rouault
>>     (<even.rouault at spatialys.com>) wrote:
>>
>>         Moises,
>>
>>         as far as I can see with your example, the CSV driver behaves
>>         "properly" in reading and writing of field values with line
>>         breaks.
>>
>>         It follows the "Fields with embedded line breaks must be
>>         quoted" rule of
>>         https://en.wikipedia.org/wiki/Comma-separated_values
>>
>>         $ ogr2ogr out.csv /vsizip/dataframe.zip
>>
>>         $ cat out.csv
>>         id,descriptio
>>         "1",This is my third row
>>         "2","this is
>>         my string
>>         "
>>         "3",This is my third row
>>
>>         $ ogrinfo out.csv -al
>>         INFO: Open of `out.csv'
>>               using driver `CSV' successful.
>>
>>         Layer name: out
>>         Geometry: None
>>         Feature Count: 3
>>         Layer SRS WKT:
>>         (unknown)
>>         id: String (0.0)
>>         descriptio: String (0.0)
>>         OGRFeature(out):1
>>           id (String) = 1
>>           descriptio (String) = This is my third row
>>
>>         OGRFeature(out):2
>>           id (String) = 2
>>           descriptio (String) = this is
>>         my string
>>
>>
>>         OGRFeature(out):3
>>           id (String) = 3
>>           descriptio (String) = This is my third row
>>
>>         But in your example, using /vsistdout/ and -lco
>>         CREATE_CSVT=YES is going to result in an invalid CSV file
>>         that mixes the .csvt and .csv content.
>>
>>         Even
>>
>>         On 24/04/2023 at 13:34, Moises Calzado via gdal-dev wrote:
>>>         Hello!
>>>
>>>         We're trying to convert a Shapefile into a CSV using ogr2ogr
>>>         and we're having some issues while dealing with some columns
>>>         that contain line breaks inside their values. If we have a
>>>         line with the following string, ogr2ogr detects that the
>>>         line break is a new line and it returns two lines.
>>>
>>>             "this is my \n value"
>>>
>>>
>>>         That's the command that we're executing:
>>>
>>>             ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/
>>>             /vsizip/shapefile.zip -simplify 0.00001 -dim XY -t_srs
>>>             EPSG:4326 -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom
>>>             -lco CREATE_CSVT=YES > result.csv
>>>
>>>
>>>         Is this an expected behaviour, or is there any way to avoid
>>>         this?
>>>         Sharing an example Shapefile so that you can try to
>>>         reproduce that behaviour:
>>>         https://drive.google.com/file/d/1gFqfTP02KTFoavJyyO-Ix05YwZB2tS24/view?usp=sharing
>>>
>>>         Thanks so much in advance,
>>>         Regards.
>>>
>>>         -- 
>>>         *Moises Calzado*
>>>
>>>         Support Engineer
>>>
>>>         +34671264286 | mcalzado at carto.com | CARTO
>>>         <https://www.carto.com/>
>>>
>>>         <https://spatial-data-science-conference.com/2023/london/>
>>>
>>>         _______________________________________________
>>>         gdal-dev mailing list
>>>         gdal-dev at lists.osgeo.org
>>>         https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>
>>         -- 
>>         http://www.spatialys.com
>>         My software is free, but my time generally not.

-- 
http://www.spatialys.com
My software is free, but my time generally not.