[GRASS-dev] [GRASS GIS] #2692: v.in.ascii does not handle text in quotes

GRASS GIS trac at osgeo.org
Thu Jun 18 00:54:29 PDT 2015


#2692: v.in.ascii does not handle text in quotes
-------------------------+-------------------------------------------------
  Reporter:  wenzeslaus  |      Owner:  grass-dev@…
      Type:  defect      |     Status:  new
  Priority:  normal      |  Milestone:  7.0.1
 Component:  Default     |    Version:  svn-trunk
Resolution:              |   Keywords:  CSV, doublequote, singlequote,
                         |              text delimiter
       CPU:  Unspecified |   Platform:  Unspecified
-------------------------+-------------------------------------------------

Comment (by mlennert):

 Replying to [comment:6 glynn]:
 > Replying to [comment:3 glynn]:
 >
 > > For that, an explicit state machine is likely to be more legible than
 > > ad-hoc logic.
 >
 > Please test attachment:tokenise.diff

 Great, thanks !

 I propose two small changes (attached tokenise_corrected.diff). The first
 seems to be just a typo: in case A_END_RECORD, "*q++ - '\0';" should
 presumably read "*q++ = '\0';". The second comes from the fact that when
 we are in state AFTER_QUOTE and reach a delimiter, we have to go back to
 state S_START. Otherwise, if the next field also starts with a quote,
 that quote is treated as a second (doubled) quote rather than as the
 opening quote of a new field.

 Using the following example:


 {{{
 echo "123|123|1|test1|'test2'|'\"test3\"'|'test''4'" | v.in.ascii in=- out=testtext text=singlequote --o
 }}}


 With your patch:


 {{{
 > v.db.select testtext
 cat|int_1|int_2|int_3|str_1|str_2|str_3|str_4
 1|123|123|1|test1|test2|'"test3"|'test'4t''4'
 }}}

 With the correction:


 {{{
 > v.db.select testtext
 cat|int_1|int_2|int_3|str_1|str_2|str_3|str_4
 1|123|123|1|test1|test2|"test3"|test'4
 }}}
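
 The transition described above (from AFTER_QUOTE back to S_START when a
 delimiter is reached) can be illustrated with a small sketch. This is a
 Python illustration only, not the C patch itself: apart from START and
 AFTER_QUOTE, the state names and the tokenize() function are assumptions
 made for this example, and end-of-record handling is simplified to "end of
 the string".

```python
from enum import Enum, auto


class State(Enum):
    START = auto()        # at the beginning of a field
    IN_TOKEN = auto()     # inside an unquoted field (assumed name)
    IN_QUOTE = auto()     # inside a quoted field (assumed name)
    AFTER_QUOTE = auto()  # just saw a quote while inside a quoted field


def tokenize(line, delim="|", quote="'"):
    """Split one record into fields, honouring quoted text (hypothetical helper)."""
    fields = []
    buf = []
    state = State.START
    for c in line:
        if state is State.START:
            if c == quote:
                state = State.IN_QUOTE
            elif c == delim:
                fields.append("")          # empty field
            else:
                buf.append(c)
                state = State.IN_TOKEN
        elif state is State.IN_TOKEN:
            if c == delim:
                fields.append("".join(buf))
                buf = []
                state = State.START
            else:
                buf.append(c)
        elif state is State.IN_QUOTE:
            if c == quote:
                state = State.AFTER_QUOTE  # closing quote or first half of ''
            else:
                buf.append(c)
        else:  # State.AFTER_QUOTE
            if c == quote:
                buf.append(quote)          # doubled quote is a literal quote
                state = State.IN_QUOTE
            elif c == delim:
                fields.append("".join(buf))
                buf = []
                # Go back to START so that a quote opening the next field
                # is not mistaken for the second half of a doubled quote.
                state = State.START
            else:
                buf.append(c)              # stray text after a closing quote
                state = State.IN_TOKEN
    fields.append("".join(buf))            # end of record closes the last field
    return fields


record = "123|123|1|test1|'test2'|'\"test3\"'|'test''4'"
print("|".join(tokenize(record)))
```

 With the record used in the test above, the sketch yields the same field
 values as the corrected output.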



 > An external library may be worth using for improved fault-tolerance (CSV
 > is a rather loose "standard", to say the least). But any such dependency
 > should be
 > a. on specific modules (e.g. v.in.ascii), not lib/gis (i.e. G_tokenize),
 > and
 > b. an optional alternative to G_tokenize(), i.e. modules should still
 > compile and work if the library isn't available.
 >
 > Python is far too heavyweight a dependency for such a task.

 Well, I was thinking of a new module, v.in.ascii2/v.in.csv, based on the
 Python csv module. Since Python is a dependency anyway, this shouldn't be
 a problem at the module level. But I think that your patch solves this
 particular bug, and that we can leave the handling of more complex CSV
 files to other tools that people can use to prepare their data for
 v.in.ascii.
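
 For reference, a minimal sketch of how the Python csv module reads the
 same kind of record. The read_rows() helper is a hypothetical name used
 only for this example; csv.reader and its delimiter, quotechar and
 doublequote parameters are part of the standard library.

```python
import csv
import io


def read_rows(text, delimiter="|", quotechar="'"):
    """Parse delimited text into lists of fields (hypothetical helper)."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar=quotechar,
        doublequote=True,  # two quote characters inside a quoted field mean one
    )
    return [row for row in reader]


sample = "123|123|1|test1|'test2'|'\"test3\"'|'test''4'\n"
rows = read_rows(sample)
print("|".join(rows[0]))
```

 Something along these lines could also be used outside GRASS to
 normalise more complex CSV files before passing them to v.in.ascii.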

--
Ticket URL: <https://trac.osgeo.org/grass/ticket/2692#comment:7>
GRASS GIS <http://grass.osgeo.org>


