[gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns

Robert Hewlett rob.hewy at gmail.com
Wed May 3 12:04:55 PDT 2023


Hi,

I just tested with GDAL 3.6.4, released 2023/04/17.

Using ogr2ogr as follows:
ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES
I get three files but no geometry.

ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT
I get three files, with the geometry as WKT in a column named WKT:

WKT,id,poi_name,poi_types
"POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
"POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"

ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom
I get three files, with the geometry as WKT but in a column called geom:
geom,id,poi_name,poi_types
"POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
"POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"

What does
ogr2ogr --version
report back?
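
To confirm which name the geometry column ended up with, the header of the generated CSV can be checked with a few lines of Python. This is only a minimal sketch: the sample text is copied from the GEOMETRY_NAME=geom output above, and the helper names geometry_column and read_points are made up for illustration, not part of GDAL.

```python
import csv
import io

# Sample rows copied from the GEOMETRY_NAME=geom output shown in this message.
SAMPLE = '''geom,id,poi_name,poi_types
"POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
"POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"
'''


def geometry_column(csv_text, candidates=("geom", "WKT")):
    """Return the first candidate name found in the CSV header, or None."""
    header = next(csv.reader(io.StringIO(csv_text, newline="")))
    for name in candidates:
        if name in header:
            return name
    return None


def read_points(csv_text, geom_col):
    """Yield (id, x, y) for rows whose geometry is a simple 'POINT (x y)' WKT."""
    for row in csv.DictReader(io.StringIO(csv_text, newline="")):
        wkt = row[geom_col].strip()
        # Skip anything that is not a plain point with coordinates.
        if not wkt.upper().startswith("POINT") or "(" not in wkt:
            continue
        coords = wkt[wkt.index("(") + 1 : wkt.rindex(")")].split()
        x, y = coords[:2]
        yield row["id"], float(x), float(y)


if __name__ == "__main__":
    col = geometry_column(SAMPLE)
    print("geometry column:", col)
    for point in read_points(SAMPLE, col):
        print(point)
```

For a real file, the same functions can be fed the text read from poi_out.csv (opened with newline="" so the csv module sees the original line endings).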



On Wed, May 3, 2023 at 9:38 AM Robert Hewlett <rob.hewy at gmail.com> wrote:

> Hi,
>
> Not to start a controversy but it feels like the standard hints at three
> files. Did the standard change?
>
> If it is three files, which works for me in QGIS and geopandas (i.e. the data
> lands where it is supposed to), then more layer creation options are needed
> to handle the SRID/CRS:
>
> CREATE_PRJ=YES/NO
> or -t_srs and/or -s_srs triggers the dot-prj file being created.
>
> Just saying 😊.
>
> In the meantime, would a short Python script help parse the one file into
> three?
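
A rough sketch of such a script follows. It assumes the layout reported further down this thread for the /vsistdout/ capture (the .prj content on the first line, the .csvt content on the second line, and the CSV itself after that); this has not been checked against GDAL's source, and the function name split_combined_csv is hypothetical.

```python
from pathlib import Path


def split_combined_csv(combined_path, out_stem):
    """Split a combined capture into <out_stem>.prj, <out_stem>.csvt, <out_stem>.csv.

    Assumption (taken from the thread, not verified against GDAL): line 1 is
    the .prj content, line 2 is the .csvt content, the rest is the CSV.
    """
    # newline="" disables newline translation, so line breaks embedded in
    # quoted CSV fields are copied through unchanged.
    with open(combined_path, "r", encoding="utf-8", newline="") as src:
        prj_line = src.readline()
        csvt_line = src.readline()
        csv_body = src.read()

    outputs = {".prj": prj_line, ".csvt": csvt_line, ".csv": csv_body}
    written = []
    for suffix, content in outputs.items():
        # Append the suffix instead of replacing one, so stems that
        # contain dots are left intact.
        target = Path(str(out_stem) + suffix)
        with open(target, "w", encoding="utf-8", newline="") as dst:
            dst.write(content)
        written.append(target)
    return written


if __name__ == "__main__":
    import sys

    # Example: python split_csv.py result.csv poi_out
    for path in split_combined_csv(sys.argv[1], sys.argv[2]):
        print("wrote", path)
```

Only the first two physical lines are peeled off; everything after them is written as-is, so quoted fields that span several lines stay intact in the resulting .csv.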
>
>
> On Wed, May 3, 2023 at 9:16 AM Moises Calzado via gdal-dev <
> gdal-dev at lists.osgeo.org> wrote:
>
>> Hi Robert,
>>
>> Yes, we're getting one with all the info!
>>
>> On Wed, 3 May 2023 at 18:14, Robert Hewlett (<rob.hewy at gmail.com>)
>> wrote:
>>
>>> Just to clarify, instead of getting three files you are getting one with
>>> all the info: types, projection, data?
>>>
>>> https://giswiki.hsr.ch/GeoCSV
>>>
>>> On Wed, May 3, 2023 at 8:57 AM Moises Calzado via gdal-dev <
>>> gdal-dev at lists.osgeo.org> wrote:
>>>
>>>> We're also specifying GEOM_POSSIBLE_NAMES, so it would be great if,
>>>> with that option, we could use GEOMETRY_NAME without also needing
>>>> CREATE_CSVT=YES.
>>>>
>>>> Regarding emitting the .prj and .csvt in /vsistdout/ mode, that's why
>>>> I'm saying that there is an issue in how the resulting CSV is generated.
>>>> What we see in /vsistdout/ mode is a CSV file with the .prj information
>>>> in the first line and the .csvt in the second line. We handle this by
>>>> deleting the first two lines and using the rest of the content as a CSV,
>>>> which should be equal to the result obtained when running ogr2ogr
>>>> without the CREATE_CSVT=YES option.
>>>> We're probably missing something, but as we see it, the generated CSV
>>>> should be a valid one. Does that make sense?
>>>>
>>>> Thanks so much for your help!
>>>>
>>>> On Wed, 3 May 2023 at 15:10, Robert Hewlett (<rob.hewy at gmail.com>)
>>>> wrote:
>>>>
>>>>> The .CSVT and .PRJ files help to make a proper GeoCSV dataset, which helps
>>>>> with QGIS and geopandas. The column name that I use in the CSV is usually
>>>>> geom, and WKT shows up in the CSVT file, which seems to be a one-line file
>>>>> that hints at the data types in the CSV file.
>>>>>
>>>>> I hope that makes sense.
>>>>>
>>>>> CSVT
>>>>> Integer,Integer,WKT
>>>>>
>>>>> CSV
>>>>> line_id,point_id,geom
>>>>> 1,1,"POINT(1000 1000)"
>>>>>
>>>>> PRJ
>>>>> EPSG:26910
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Wed, May 3, 2023, 05:23 Moises Calzado via gdal-dev <
>>>>> gdal-dev at lists.osgeo.org> wrote:
>>>>>
>>>>>> Hi Even,
>>>>>>
>>>>>> Thanks so much for taking a look into that one!
>>>>>>
>>>>>> I have one question regarding the CSVT content: we're not really
>>>>>> using it, but it's required when using the GEOMETRY_NAME layer creation
>>>>>> option, as can be checked in the CSV driver documentation:
>>>>>>
>>>>>>> - GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry
>>>>>>>   column. Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES. Defaults
>>>>>>>   to WKT
>>>>>>
>>>>>> We really need this flag as we are processing files that contain
>>>>>> geometries with different column names, and we always want the same
>>>>>> geometry name in the generated output. Are we losing something when using
>>>>>> that flag to avoid this problem?
>>>>>> In my humble opinion, generating an invalid CSV when using -lco
>>>>>> CREATE_CSVT=YES looks like a bug to me, as I can't see the reason why
>>>>>> strings containing line breaks can't be quoted.
>>>>>>
>>>>>> Could you please shed some light on this?
>>>>>>
>>>>>> Looking forward to your reply,
>>>>>> Regards.
>>>>>>
>>>>>> On Wed, 3 May 2023 at 14:00, Even Rouault
>>>>>> (<even.rouault at spatialys.com>) wrote:
>>>>>>
>>>>>>> you didn't post to the list
>>>>>>>
>>>>>>> On Sat, 29 Apr 2023 at 15:44, Even Rouault
>>>>>>> (<even.rouault at spatialys.com>) wrote:
>>>>>>>
>>>>>>>> Moises,
>>>>>>>>
>>>>>>>> as far as I can see with your example, the CSV driver behaves
>>>>>>>> "properly" in reading and writing of field values with line breaks.
>>>>>>>>
>>>>>>>> It follows the "Fields with embedded line breaks must be quoted"
>>>>>>>> rule of https://en.wikipedia.org/wiki/Comma-separated_values
>>>>>>>>
>>>>>>>> $ ogr2ogr out.csv /vsizip/dataframe.zip
>>>>>>>>
>>>>>>>> $ cat out.csv
>>>>>>>> id,descriptio
>>>>>>>> "1",This is my third row
>>>>>>>> "2","this is
>>>>>>>> my string
>>>>>>>> "
>>>>>>>> "3",This is my third row
>>>>>>>>
>>>>>>>> $ ogrinfo out.csv -al
>>>>>>>> INFO: Open of `out.csv'
>>>>>>>>       using driver `CSV' successful.
>>>>>>>>
>>>>>>>> Layer name: out
>>>>>>>> Geometry: None
>>>>>>>> Feature Count: 3
>>>>>>>> Layer SRS WKT:
>>>>>>>> (unknown)
>>>>>>>> id: String (0.0)
>>>>>>>> descriptio: String (0.0)
>>>>>>>> OGRFeature(out):1
>>>>>>>>   id (String) = 1
>>>>>>>>   descriptio (String) = This is my third row
>>>>>>>>
>>>>>>>> OGRFeature(out):2
>>>>>>>>   id (String) = 2
>>>>>>>>   descriptio (String) = this is
>>>>>>>> my string
>>>>>>>>
>>>>>>>>
>>>>>>>> OGRFeature(out):3
>>>>>>>>   id (String) = 3
>>>>>>>>   descriptio (String) = This is my third row
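
The same behaviour can be reproduced outside GDAL with any CSV reader that follows the quoting rule. The sketch below uses Python's standard csv module on the out.csv content shown above; the helper name read_records is illustrative only.

```python
import csv
import io

# out.csv exactly as printed by `cat out.csv` above; the second record
# spans three physical lines because its value contains line breaks.
OUT_CSV = '''id,descriptio
"1",This is my third row
"2","this is
my string
"
"3",This is my third row
'''


def read_records(text):
    """Parse CSV text, keeping line breaks that sit inside quoted fields."""
    # newline="" stops StringIO from translating line endings before the
    # csv module sees them; use the same argument with open() on a real file.
    return list(csv.DictReader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    records = read_records(OUT_CSV)
    physical_lines = OUT_CSV.splitlines()
    print(len(records), "records from", len(physical_lines), "physical lines")
    print(repr(records[1]["descriptio"]))
```

Counting physical lines (for example with a plain line-by-line reader) overstates the number of records, which may be why the output looks as if one row had been split in two.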
>>>>>>>>
>>>>>>>> But in your example, using /vsistdout/ together with -lco CREATE_CSVT=YES
>>>>>>>> is going to result in an invalid CSV file that mixes the .csvt and
>>>>>>>> .csv content.
>>>>>>>>
>>>>>>>> Even
>>>>>>>> On 24/04/2023 at 13:34, Moises Calzado via gdal-dev wrote:
>>>>>>>>
>>>>>>>> Hello!
>>>>>>>>
>>>>>>>> We're trying to convert a Shapefile into a CSV using ogr2ogr and
>>>>>>>> we're having some issues while dealing with some columns that contain line
>>>>>>>> breaks inside their values. If we have a line with the following string,
>>>>>>>> ogr2ogr detects that the line break is a new line and it returns two lines.
>>>>>>>>
>>>>>>>> "this is my \n value"
>>>>>>>>
>>>>>>>>
>>>>>>>> This is the command that we're executing:
>>>>>>>>
>>>>>>>> ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ /vsizip/shapefile.zip -simplify 0.00001 -dim XY -t_srs EPSG:4326 -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom -lco CREATE_CSVT=YES > result.csv
>>>>>>>>
>>>>>>>> Is this expected behaviour, or is there any way to avoid this?
>>>>>>>> Sharing an example Shapefile so that you can try to reproduce that
>>>>>>>> behaviour:
>>>>>>>> https://drive.google.com/file/d/1gFqfTP02KTFoavJyyO-Ix05YwZB2tS24/view?usp=sharing
>>>>>>>>
>>>>>>>> Thanks so much in advance,
>>>>>>>> Regards.
>>>>>>>>
>>>>>>>> --
>>>>>>>> *Moises Calzado*
>>>>>>>>
>>>>>>>> Support Engineer
>>>>>>>>
>>>>>>>> +34671264286 | mcalzado at carto.com | CARTO <https://www.carto.com/>
>>>>>>>> <https://spatial-data-science-conference.com/2023/london/>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> gdal-dev mailing list
>>>>>>>> gdal-dev at lists.osgeo.org
>>>>>>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>>>>>>
>>>>>>>> -- http://www.spatialys.com
>>>>>>>> My software is free, but my time generally not.