[gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns
Robert Hewlett
rob.hewy at gmail.com
Thu May 4 06:07:26 PDT 2023
Hi,
I traced back through the thread, found your sample data set, and was able
to convert it to a CSV file that Calc, Excel and QGIS all loaded without
issue.
The command I used to produce the three files:
ogr2ogr -f CSV df.csv dataframe.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom
The redirect (> result.csv) is potentially why you end up with a single
file: all of the output (what would be 3 files) is put into one file.
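
Following up on my earlier offer of a short Python script, here is a minimal sketch of how the one file could be parsed back into its three parts. It assumes the layout Moises described earlier in the thread (first line is the .prj content, second line is the .csvt content, the rest is the CSV); I have not verified that layout myself. The function name split_combined_output and the sample values are hypothetical.

```python
import csv
import io


def split_combined_output(text):
    """Split a combined stream into (prj, csvt, rows).

    Assumption (taken from the thread, not verified): line 1 holds the
    .prj content, line 2 holds the .csvt content, the rest is the CSV.
    """
    prj, csvt, csv_body = text.split("\n", 2)
    # csv.reader keeps a quoted field with embedded line breaks as one field.
    rows = list(csv.reader(io.StringIO(csv_body, newline="")))
    return prj.strip(), csvt.strip(), rows


if __name__ == "__main__":
    # Illustrative sample only, not real ogr2ogr output.
    sample = (
        "EPSG:26910\n"
        "WKT,Integer,String\n"
        "geom,id,descriptio\n"
        '"POINT (1 2)","1","this is\nmy string"\n'
    )
    prj, csvt, rows = split_combined_output(sample)
    print(prj)
    print(csvt)
    print(len(rows), "records including the header")
```

If the fields with line breaks are not quoted in the stream, the record count will not match the feature count, which is a quick way to confirm the problem.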
The column name in the csv file is geom and not WKT.
I will try to post some images.
The command was run on a Windows 10 laptop.
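
For reference, the "Fields with embedded line breaks must be quoted" rule discussed in this thread can be checked independently of GDAL. The following is a minimal sketch using Python's standard csv module; the function name roundtrip is hypothetical and the sample row only mimics the one in the sample data set.

```python
import csv
import io


def roundtrip(rows):
    """Write rows as CSV text, then parse that text back into rows."""
    buf = io.StringIO(newline="")
    # The default QUOTE_MINIMAL quotes any field containing a line break.
    csv.writer(buf, lineterminator="\n").writerows(rows)
    text = buf.getvalue()
    parsed = list(csv.reader(io.StringIO(text, newline="")))
    return text, parsed


if __name__ == "__main__":
    rows = [["id", "descriptio"], ["2", "this is\nmy string\n"]]
    text, parsed = roundtrip(rows)
    print(text)
    # The multi-line value comes back as a single field of a single record.
    print(parsed == rows)
```

A reader that follows the rule sees two records here even though the text spans four physical lines.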
On Thu, May 4, 2023 at 4:59 AM Moises Calzado via gdal-dev <
gdal-dev at lists.osgeo.org> wrote:
> Hi Robert!
>
> I think we're losing sight of the main issue that we reported: the
> problem is related to line breaks in the output generated when using
> /vsistdout/ together with the CREATE_CSVT=YES option.
>
> Even pointed out that without that flag it works as expected, but when
> it's used the generated output is not okay, because the "Fields with
> embedded line breaks must be quoted" rule is not followed.
> IMHO although the generated output is not a CSV itself, we should be able
> to delete the first two lines (projection info and types) and deal with the
> rest of the content as a CSV.
>
> What we're doing is streaming the output written to /vsistdout/ to
> another process that performs some steps with the resulting CSV. In all
> other cases it works correctly, as the output of the ogr2ogr execution is a
> valid CSV once the first two lines are deleted, but in the case reported in
> my first email it's not.
> The CREATE_CSVT=YES option is mandatory for us because, for the moment, it's
> required in order to use the GEOMETRY_NAME=geom one, so we don't have any
> workaround.
>
> Just wanted to confirm whether that's expected for you (generating an output
> that is not a valid CSV in the end)!
>
> On Wed, May 3, 2023 at 21:05, Robert Hewlett (<rob.hewy at gmail.com>)
> wrote:
>
>> Hi,
>>
>> I just tested with : GDAL 3.6.4, released 2023/04/17
>>
>> Using the ogr2ogr as follows:
>> ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES
>> I get three files but no geometry
>>
>> ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT
>> I get three files with the geometry as WKT and the column name WKT
>>
>> WKT,id,poi_name,poi_types
>> "POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
>> "POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"
>>
>> ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom
>> I get three files with the geometry as WKT but the column called geom
>> geom,id,poi_name,poi_types
>> "POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
>> "POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"
>>
>> What does
>> ogr2ogr --version
>> report back?
>>
>>
>>
>> On Wed, May 3, 2023 at 9:38 AM Robert Hewlett <rob.hewy at gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Not to start a controversy but it feels like the standard hints at three
>>> files. Did the standard change?
>>>
>>> If it is three files, which works for me in QGIS and geopandas (i.e. the
>>> data lands where it is supposed to), then more layer creation options are
>>> needed to handle the SRID/CRS:
>>>
>>> CREATE_PRJ=YES/NO
>>> or -t_srs and/or -s_srs triggers the dot-prj file being created.
>>>
>>> Just saying 😊.
>>>
>>> In the meantime would a short python script help parse the one file into
>>> three?
>>>
>>>
>>> On Wed, May 3, 2023 at 9:16 AM Moises Calzado via gdal-dev <
>>> gdal-dev at lists.osgeo.org> wrote:
>>>
>>>> Hi Robert,
>>>>
>>>> Yes, we're getting one with all the info!
>>>>
>>>> On Wed, May 3, 2023 at 18:14, Robert Hewlett (<rob.hewy at gmail.com>)
>>>> wrote:
>>>>
>>>>> Just to clarify, instead of getting three files you are getting one
>>>>> with all the info: types, projection, data?
>>>>>
>>>>> https://giswiki.hsr.ch/GeoCSV
>>>>>
>>>>> On Wed, May 3, 2023 at 8:57 AM Moises Calzado via gdal-dev <
>>>>> gdal-dev at lists.osgeo.org> wrote:
>>>>>
>>>>>> We're also specifying the GEOM_POSSIBLE_NAMES, so it would be great
>>>>>> if with that option we could use the GEOMETRY_NAME without using the
>>>>>> CREATE_CSVT=YES option.
>>>>>>
>>>>>> Regarding emitting the .prj and .csvt in /vsistdout mode, that's why
>>>>>> I'm saying that there is an issue while generating the resultant CSV.
>>>>>> The way we see it is that when using the /vsistdout mode, the result
>>>>>> is a CSV file with the .prj information in the first line, and the .csvt in
>>>>>> the second line. We're dealing with the result deleting the first two lines
>>>>>> and using the rest of the content as a CSV, which should be equal to the
>>>>>> result obtained when using ogr2ogr without the CREATE_CSVT=YES option.
>>>>>> Probably we're missing something, but as we see it, the generated CSV
>>>>>> should be a valid one. Does that make sense?
>>>>>>
>>>>>> Thanks so much for your help!
>>>>>>
>>>>>> On Wed, May 3, 2023 at 15:10, Robert Hewlett (<rob.hewy at gmail.com>)
>>>>>> wrote:
>>>>>>
>>>>>>> The .CSVT and .PRJ files help to make a proper GeoCSV dataset, which helps
>>>>>>> with QGIS and geopandas. The column name that I use in the CSV is usually
>>>>>>> geom, and WKT shows up in the CSVT file, which seems to be a one-line file
>>>>>>> that hints at the data types in the CSV file.
>>>>>>>
>>>>>>> I hope that makes sense.
>>>>>>>
>>>>>>> CSVT
>>>>>>> Integer, Integer,WKT
>>>>>>>
>>>>>>> CSV
>>>>>>> line_id,point_id,geom
>>>>>>> 1,1,"POINT(1000 1000)"
>>>>>>>
>>>>>>> PRJ
>>>>>>> EPSG:26910
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 3, 2023, 05:23 Moises Calzado via gdal-dev <
>>>>>>> gdal-dev at lists.osgeo.org> wrote:
>>>>>>>
>>>>>>>> Hi Even,
>>>>>>>>
>>>>>>>> Thanks so much for taking a look into that one!
>>>>>>>>
>>>>>>>> I have one question regarding the CSVT content: we're not really
>>>>>>>> using it, but it's required when using the GEOMETRY_NAME layer creation
>>>>>>>> option, as can be checked in the CSV driver documentation:
>>>>>>>>
>>>>>>>>> - GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry
>>>>>>>>>   column. Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES. Defaults to WKT
>>>>>>>>
>>>>>>>> We really need this flag as we are processing files that contain
>>>>>>>> geometries with different column names, and we always want the same
>>>>>>>> geometry name in the generated output. Are we missing something when using
>>>>>>>> that flag to avoid this problem?
>>>>>>>> In my humble opinion, generating an invalid CSV when using -lco
>>>>>>>> CREATE_CSVT=YES looks like a bug to me, as I can't see the reason why
>>>>>>>> strings containing line breaks can't be quoted.
>>>>>>>>
>>>>>>>> Could you please shed some light on this?
>>>>>>>>
>>>>>>>> Looking forward to your reply,
>>>>>>>> Regards.
>>>>>>>>
>>>>>>>> On Wed, May 3, 2023 at 14:00, Even Rouault (<
>>>>>>>> even.rouault at spatialys.com>) wrote:
>>>>>>>>
>>>>>>>>> you didn't post to the list
>>>>>>>>> On 03/05/2023 at 13:49, Moises Calzado wrote:
>>>>>>>>>
>>>>>>>>> Hi Even,
>>>>>>>>>
>>>>>>>>> Thanks so much for taking a look into that one!
>>>>>>>>>
>>>>>>>>> I have one question regarding the CSVT content: we're not really
>>>>>>>>> using it, but it's required when using the GEOMETRY_NAME layer creation
>>>>>>>>> option, as can be checked in the CSV driver documentation:
>>>>>>>>>
>>>>>>>>>> - GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry
>>>>>>>>>>   column. Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES. Defaults to WKT
>>>>>>>>>
>>>>>>>>> We really need this flag as we are processing files that contain
>>>>>>>>> geometries with different column names, and we always want the same
>>>>>>>>> geometry name in the generated output. Are we missing something when using
>>>>>>>>> that flag to avoid this problem?
>>>>>>>>> In my humble opinion, generating an invalid CSV when using -lco
>>>>>>>>> CREATE_CSVT=YES looks like a bug to me, as I can't see the reason why
>>>>>>>>> strings containing line breaks can't be quoted.
>>>>>>>>>
>>>>>>>>> Could you please shed some light on this?
>>>>>>>>>
>>>>>>>>> Looking forward to your reply,
>>>>>>>>> Regards.
>>>>>>>>>
>>>>>>>>> On Sat, Apr 29, 2023 at 15:44, Even Rouault (<
>>>>>>>>> even.rouault at spatialys.com>) wrote:
>>>>>>>>>
>>>>>>>>>> Moises,
>>>>>>>>>>
>>>>>>>>>> as far as I can see with your example, the CSV driver behaves
>>>>>>>>>> "properly" in reading and writing of field values with line breaks.
>>>>>>>>>>
>>>>>>>>>> It follows the "Fields with embedded line breaks must be quoted"
>>>>>>>>>> rule of https://en.wikipedia.org/wiki/Comma-separated_values
>>>>>>>>>>
>>>>>>>>>> $ ogr2ogr out.csv /vsizip/dataframe.zip
>>>>>>>>>>
>>>>>>>>>> $ cat out.csv
>>>>>>>>>> id,descriptio
>>>>>>>>>> "1",This is my third row
>>>>>>>>>> "2","this is
>>>>>>>>>> my string
>>>>>>>>>> "
>>>>>>>>>> "3",This is my third row
>>>>>>>>>>
>>>>>>>>>> $ ogrinfo out.csv -al
>>>>>>>>>> INFO: Open of `out.csv'
>>>>>>>>>> using driver `CSV' successful.
>>>>>>>>>>
>>>>>>>>>> Layer name: out
>>>>>>>>>> Geometry: None
>>>>>>>>>> Feature Count: 3
>>>>>>>>>> Layer SRS WKT:
>>>>>>>>>> (unknown)
>>>>>>>>>> id: String (0.0)
>>>>>>>>>> descriptio: String (0.0)
>>>>>>>>>> OGRFeature(out):1
>>>>>>>>>> id (String) = 1
>>>>>>>>>> descriptio (String) = This is my third row
>>>>>>>>>>
>>>>>>>>>> OGRFeature(out):2
>>>>>>>>>> id (String) = 2
>>>>>>>>>> descriptio (String) = this is
>>>>>>>>>> my string
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> OGRFeature(out):3
>>>>>>>>>> id (String) = 3
>>>>>>>>>> descriptio (String) = This is my third row
>>>>>>>>>>
>>>>>>>>>> But in your example using /vsistdout/ and -lco CREATE_CSVT=YES is
>>>>>>>>>> going to result in an invalid CSV file which will mix both the .csvt and
>>>>>>>>>> .csv content
>>>>>>>>>>
>>>>>>>>>> Even
>>>>>>>>>> On 24/04/2023 at 13:34, Moises Calzado via gdal-dev wrote:
>>>>>>>>>>
>>>>>>>>>> Hello!
>>>>>>>>>>
>>>>>>>>>> We're trying to convert a Shapefile into a CSV using ogr2ogr and
>>>>>>>>>> we're having some issues while dealing with some columns that contain line
>>>>>>>>>> breaks inside their values. If we have a line with the following string,
>>>>>>>>>> ogr2ogr detects that the line break is a new line and it returns two lines.
>>>>>>>>>>
>>>>>>>>>> "this is my \n value"
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> That's the command that we're executing:
>>>>>>>>>>
>>>>>>>>>> ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/
>>>>>>>>>> /vsizip/shapefile.zip -simplify 0.00001 -dim XY -t_srs EPSG:4326 -lco
>>>>>>>>>> GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom -lco CREATE_CSVT=YES > result.csv
>>>>>>>>>>
>>>>>>>>>> Is this an expected behaviour, or is there any way to avoid this?
>>>>>>>>>> Sharing an example Shapefile so that you can try to reproduce
>>>>>>>>>> that behaviour:
>>>>>>>>>> https://drive.google.com/file/d/1gFqfTP02KTFoavJyyO-Ix05YwZB2tS24/view?usp=sharing
>>>>>>>>>>
>>>>>>>>>> Thanks so much in advance,
>>>>>>>>>> Regards.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> *Moises Calzado*
>>>>>>>>>>
>>>>>>>>>> Support Engineer
>>>>>>>>>>
>>>>>>>>>> +34671264286 | mcalzado at carto.com | CARTO
>>>>>>>>>> <https://www.carto.com/>
>>>>>>>>>> <https://spatial-data-science-conference.com/2023/london/>
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> gdal-dev mailing list
>>>>>>>>>> gdal-dev at lists.osgeo.org
>>>>>>>>>> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>>>>>>>>>>
>>>>>>>>>> -- http://www.spatialys.com
>>>>>>>>>> My software is free, but my time generally not.